std::codecvt

From Cppreference

Defined in header <locale>

template< class internT, class externT, class stateT >
class codecvt;

Class std::codecvt encapsulates conversion of character strings, including wide and multibyte, from one encoding to another. All file I/O operations performed through std::basic_fstream<CharT> use the std::codecvt<CharT, char, std::mbstate_t> facet of the locale imbued in the stream.

Four specializations are provided by the standard library and are implemented by all locale objects created in a C++ program:

std::codecvt<char, char, std::mbstate_t> identity conversion
std::codecvt<char16_t, char, std::mbstate_t> conversion between UTF-16 and UTF-8 (C++11 feature)
std::codecvt<char32_t, char, std::mbstate_t> conversion between UTF-32 and UTF-8 (C++11 feature)
std::codecvt<wchar_t, char, std::mbstate_t> locale-specific conversion between wide string and narrow, possibly multibyte, string
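
Because these specializations are present in every locale, they can be retrieved with std::use_facet without checking std::has_facet first. The following is a minimal sketch of that lookup; the helper name is_identity_conversion is hypothetical and is not part of the standard library.

```cpp
#include <cwchar>   // std::mbstate_t
#include <locale>

// Hypothetical helper: reports whether the codecvt<Intern, char, mbstate_t>
// facet installed in `loc` performs no conversion at all.
template <class Intern>
bool is_identity_conversion(const std::locale& loc = std::locale::classic())
{
    using facet_type = std::codecvt<Intern, char, std::mbstate_t>;
    // use_facet would throw std::bad_cast if the facet were missing, which
    // cannot happen for the standard specializations listed above.
    return std::use_facet<facet_type>(loc).always_noconv();
}
```

With the classic "C" locale, is_identity_conversion<char>() is expected to return true, since that specialization is the identity conversion, while is_identity_conversion<wchar_t>() returns false.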

Member types

Member type Definition
intern_type internT
extern_type externT
state_type stateT

Member objects

Member name Type
id (static) std::locale::id

Member functions

(constructor)
constructs a new codecvt facet
(public member function)
(destructor)
destructs a codecvt facet
(protected member function)
Public member functions (public interface)
out
invokes do_out
(public member function)
in
invokes do_in
(public member function)
unshift
invokes do_unshift
(public member function)
encoding
invokes do_encoding
(public member function)
always_noconv
invokes do_always_noconv
(public member function)
length
invokes do_length
(public member function)
max_length
invokes do_max_length
(public member function)
Virtual member functions (can be overridden in a user-defined facet derived from codecvt)
do_out [virtual]
converts a string from internT to externT, such as when writing to file
(virtual protected member function)
do_in [virtual]
converts a string from externT to internT, such as when reading from file
(virtual protected member function)
do_unshift [virtual]
generates the termination character sequence of externT characters for incomplete conversion
(virtual protected member function)
do_encoding [virtual]
returns the number of externT characters necessary to produce one internT character, if constant
(virtual protected member function)
do_always_noconv [virtual]
tests if the facet encodes an identity conversion for all valid argument values
(virtual protected member function)
do_length [virtual]
calculates the length of the externT string that would be consumed by conversion into given internT buffer
(virtual protected member function)
do_max_length [virtual]
returns the maximum number of externT characters that could be converted into a single internT character
(virtual protected member function)
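
The split between the public functions and the protected do_ virtuals can be illustrated with a user-defined facet. The sketch below is an illustration under stated assumptions, not a recommended design: upper_out_codecvt and convert_out are hypothetical names, and the facet only upper-cases ASCII letters on output.

```cpp
#include <cctype>
#include <cwchar>     // std::mbstate_t
#include <locale>
#include <stdexcept>
#include <string>

// Hypothetical facet: upper-cases characters on output, leaves input alone.
class upper_out_codecvt : public std::codecvt<char, char, std::mbstate_t>
{
protected:
    result do_out(state_type&,
                  const intern_type* from, const intern_type* from_end,
                  const intern_type*& from_next,
                  extern_type* to, extern_type* to_end,
                  extern_type*& to_next) const override
    {
        for (; from != from_end && to != to_end; ++from, ++to)
            *to = static_cast<char>(std::toupper(static_cast<unsigned char>(*from)));
        from_next = from;
        to_next = to;
        return from == from_end ? ok : partial;
    }

    // The base specialization reports an identity conversion; this one does not.
    bool do_always_noconv() const noexcept override { return false; }
};

// Hypothetical helper: calls the public out(), which dispatches to do_out().
std::string convert_out(const std::codecvt<char, char, std::mbstate_t>& cvt,
                        const std::string& in)
{
    std::mbstate_t state{};
    std::string out(in.size() * cvt.max_length(), '\0');
    const char* from_next = nullptr;
    char* to_next = nullptr;
    const auto r = cvt.out(state, in.data(), in.data() + in.size(), from_next,
                           &out[0], &out[0] + out.size(), to_next);
    if (r == std::codecvt_base::noconv)
        return in;                        // nothing was written to `out`
    if (r != std::codecvt_base::ok)
        throw std::runtime_error("conversion failed");
    out.resize(static_cast<std::string::size_type>(to_next - &out[0]));
    return out;
}
```

Passing an upper_out_codecvt object to convert_out upper-cases the text, while passing the facet obtained from std::use_facet on the classic locale returns the input unchanged, because the standard char specialization reports noconv.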

Inherited from std::codecvt_base

Type Definition
result conversion status enumeration type, defining the values ok, partial, error, and noconv
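
Every conversion function returns one of these values, so callers usually branch on the result. The following sketch shows one way to inspect it; result_name and widen_status are hypothetical helpers, and the conversion uses the wchar_t facet of the classic "C" locale, which is only meaningful for plain ASCII input.

```cpp
#include <cstddef>
#include <cwchar>   // std::mbstate_t
#include <locale>
#include <string>

// Hypothetical helper: maps a conversion status to a printable name.
const char* result_name(std::codecvt_base::result r)
{
    switch (r) {
        case std::codecvt_base::ok:      return "ok";
        case std::codecvt_base::partial: return "partial";
        case std::codecvt_base::error:   return "error";
        case std::codecvt_base::noconv:  return "noconv";
    }
    return "unknown";
}

// Hypothetical helper: converts `bytes` into a buffer holding `capacity`
// wide characters and returns the status reported by in().
std::codecvt_base::result widen_status(const std::string& bytes, std::size_t capacity)
{
    using facet_type = std::codecvt<wchar_t, char, std::mbstate_t>;
    const facet_type& cvt = std::use_facet<facet_type>(std::locale::classic());
    std::mbstate_t state{};
    std::wstring buffer(capacity, L'\0');
    const char* from_next = nullptr;
    wchar_t* to_next = nullptr;
    return cvt.in(state, bytes.data(), bytes.data() + bytes.size(), from_next,
                  &buffer[0], &buffer[0] + buffer.size(), to_next);
}
```

A call such as widen_status("abc", 3) has room for every character and reports ok. A buffer that is too small for the input is the typical cause of partial, in which case the caller is expected to resume from the position stored in from_next.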


Example

The following example reads a UTF-8 file using a locale that implements UTF-8 conversion in codecvt<wchar_t, char, mbstate_t>.

#include <iostream>
#include <fstream>
#include <string>
#include <locale>
#include <iomanip>
int main()
{
    // UTF-8 narrow multibyte encoding
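    std::ofstream("text.txt") << "\x7a\xc3\x9f\xe6\xb0\xb4\xf0\x9d\x84\x8b"; // UTF-8 bytes of "zß水𝄋"
    // before C++20, u8"z\u00df\u6c34\U0001d10b" works as well; since C++20 it is a
    // char8_t array and cannot be streamed into a char-based stream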
    std::wifstream fin("text.txt");
    fin.imbue(std::locale("en_US.UTF-8")); // this locale's codecvt<wchar_t, char, mbstate_t>
                                           // converts UTF-8 to UCS4
    std::cout << "The UTF-8 file contains the following wide characters: \n";
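    for(wchar_t c; fin >> c; )
        // cast to an integer type: streaming a wchar_t into std::cout is ill-formed since C++20
        std::cout << "U+" << std::hex << std::setw(4) << std::setfill('0')
                  << static_cast<unsigned long>(c) << '\n';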
}

Output:

The UTF-8 file contains the following wide characters:
U+007a
U+00df
U+6c34
U+1d10b

See also

Character conversions     narrow multibyte (char)              UTF-8 (char)                          UTF-16 (char16_t)
UTF-16                    mbrtoc16 / c16rtomb                  codecvt<char16_t, char, mbstate_t>    N/A
                                                               codecvt_utf8_utf16<char16_t>
                                                               codecvt_utf8_utf16<char32_t>
                                                               codecvt_utf8_utf16<wchar_t>
UCS2                      No                                   codecvt_utf8<char16_t>                codecvt_utf16<char16_t>
UTF-32/UCS4 (char32_t)    mbrtoc32 / c32rtomb                  codecvt<char32_t, char, mbstate_t>    codecvt_utf16<char32_t>
                                                               codecvt_utf8<char32_t>
UCS2/UCS4 (wchar_t)       No                                   codecvt_utf8<wchar_t>                 codecvt_utf16<wchar_t>
wide (wchar_t)            codecvt<wchar_t, char, mbstate_t>    No                                    No
                          mbstowcs / wcstombs

codecvt_base
defines character conversion errors
(class)
codecvt_byname
creates a codecvt facet for the named locale
(class template)
codecvt_utf8 (C++11)
converts between UTF-8 and UCS2/UCS4
(class template)
codecvt_utf16 (C++11)
converts between UTF-16 and UCS2/UCS4
(class template)
codecvt_utf8_utf16 (C++11)
converts between UTF-8 and UTF-16
(class template)