Read/Write UTF-8 characters in C++, in Windows.

(Note: add <meta http-equiv='Content-Type' content='text/html; charset=utf-8' /> to your Markdown source so that UTF-8 text is rendered properly.)

About two months ago, I met a friend for a coffee, to catch up and share news. (Hi Giannis!)
At some point he mentioned an application he’d been developing and an unexpected issue he had run into. He had been using C++ to manage a database whose contents were encoded in UTF-8 (Greek characters).

The code worked just fine on Linux, but failed on the client’s Windows systems. Instead of displaying and using Greek text properly, it produced mumbo-jumbo such as Ìðïñþ íá ãñÜøù åëëçíéêïýò ÷áñáêôÞñ. As happens in the real world, you can’t just switch the client’s infrastructure to Linux or install another C++ version, so he asked me if we could hack something together fast.

Here’s the code, as a complete example, tested on MS Visual Studio 2017 Community Edition. (I hope GitHub handles UTF-8 text.) The code can correctly read, write, and print UTF-8 text, such as Greek or Cyrillic characters.

To save you some time, here are the main lines of code that make this work.

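#include "stdafx.h"   // Visual Studio precompiled header
#include <iostream>
#include <fstream>
#include <locale>     // std::locale, std::wbuffer_convert
#include <clocale>    // std::setlocale
#include <cstdio>     // _fileno, stdout

#include <codecvt>    // std::codecvt_utf8 (deprecated since C++17)
#include <fcntl.h>    // _O_WTEXT
#include <io.h>       // _setmode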

...
...
...

	// Write file in UTF-8
	std::wofstream wof;
	// std::locale::empty() is MSVC-specific; std::generate_header asks the facet to emit a UTF-8 BOM
	wof.imbue(std::locale(std::locale::empty(), new std::codecvt_utf8<wchar_t, 0x10ffff, std::generate_header>));
	wof.open(L"example.txt");
	wof << L"This is a test. \n";
	wof << L"Μπορώ να γράψω ελληνικούς χαρακτήρες \n";
	wof << L"Ακόμα μια δοκιμή;;;;!!11?1 \n";
	wof.close();

...
...
...

	// Read file in UTF-8
	std::ifstream f("utf8text.txt");
	// Decode the UTF-8 bytes coming from f into wide characters on the fly
	std::wbuffer_convert<std::codecvt_utf8<wchar_t>> conv(f.rdbuf());
	std::wistream wf(&conv);

...
...
...

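	// Print the text read above to the console
#ifdef _WIN32
	// Switch stdout to Unicode (UTF-16) mode so std::wcout can print wide characters
	_setmode(_fileno(stdout), _O_WTEXT);
#else
	// Pick up the locale from the user's environment
	std::setlocale(LC_ALL, "");
#endif
	for (wchar_t c; wf.get(c); )
		std::wcout << c;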
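
If you want to check the write and read parts without a console, here is a minimal round-trip sketch. It is not the exact code from the project: the helper names (write_utf8_file, read_utf8_file, read_bytes) are made up for illustration, it uses std::locale() instead of the MSVC-only std::locale::empty() so it should also build on other compilers, and it leaves out the BOM that std::generate_header adds. The Greek text is spelled with \u escapes so the result does not depend on how the source file itself is saved.

```cpp
#include <codecvt>   // std::codecvt_utf8 (deprecated since C++17)
#include <fstream>
#include <iterator>
#include <locale>    // std::locale, std::wbuffer_convert
#include <string>

// Hypothetical helper: write a wide string to `path` as UTF-8 (no BOM).
bool write_utf8_file(const std::string& path, const std::wstring& text) {
    std::wofstream out;
    // Assumption: the current global locale plus a UTF-8 facet is a
    // portable stand-in for std::locale::empty().
    out.imbue(std::locale(std::locale(), new std::codecvt_utf8<wchar_t>));
    // Binary mode keeps '\n' as a single byte on every platform.
    out.open(path, std::ios::binary);
    if (!out) return false;
    out << text;
    out.close();
    return !out.fail();
}

// Hypothetical helper: read a UTF-8 file back into a wide string.
std::wstring read_utf8_file(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    // Decode the UTF-8 bytes into wchar_t while reading.
    std::wbuffer_convert<std::codecvt_utf8<wchar_t>> conv(in.rdbuf());
    std::wistream win(&conv);
    return std::wstring(std::istreambuf_iterator<wchar_t>(win),
                        std::istreambuf_iterator<wchar_t>{});
}

// Hypothetical helper: read the raw bytes, to inspect what is on disk.
std::string read_bytes(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>{});
}

// Example use (the file name is arbitrary):
//   write_utf8_file("roundtrip.txt", L"\u039C\u03C0\u03BF\u03C1\u03CE\n");
//   std::wstring back = read_utf8_file("roundtrip.txt");
```

Each Greek letter should end up as two bytes on disk (for example, Μ is 0xCE 0x9C), and reading the file back should give the same wide string that was written.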

What happened then?

The next day, after he made sure this code worked, he switched the project to C#.
C# natively supported UTF-8 and also allowed him to quickly create both web and Android apps for the main code.

Well, maybe someone else will be in the same position and might find it useful!

Written on October 12, 2017