Utf-8 Unicode character encoding/decoding.
|
u32 | Next (std::string::const_iterator &iter, std::string::const_iterator iterEnd) |
| Decodes the next Unicode codepoint from a Utf-8 sequence iterator, and increments the iterator to one byte past the end of the octet sequence for the decoded character.
|
|
u32 | Next (const char *&iter, const char *iterEnd) |
| Decodes the next Unicode codepoint from a Utf-8 sequence iterator, and increments the iterator to one byte past the end of the octet sequence for the decoded character.
|
|
size_t | GetCodepointCount (std::string::const_iterator iter, std::string::const_iterator iterEnd) |
|
size_t | GetCodepointCount (const char *iter, const char *iterEnd) |
|
size_t | Strnlen (const char *s, size_t size) |
|
u32 | PeekNext (std::string::const_iterator iter, std::string::const_iterator iterEnd) |
| Decodes the next Unicode codepoint from a Utf-8 sequence iterator.
|
|
u32 | PeekNext (const char *iter, const char *iterEnd) |
| Decodes the next Unicode codepoint from a Utf-8 sequence iterator.
|
|
template<typename TUtf32Iterator > |
TUtf32Iterator | ToUtf32 (std::string::const_iterator inputIter, std::string::const_iterator inputIterEnd, TUtf32Iterator outputIter) |
| Decodes the Utf-8 sequence starting at inputIter to a Utf-32 sequence, inserting the results on outputIter.
|
|
template<typename TUtf32Iterator > |
TUtf32Iterator | ToUtf32 (const char *inputIter, const char *inputIterEnd, TUtf32Iterator outputIter) |
| Decodes the Utf-8 sequence starting at inputIter to a Utf-32 sequence, inserting the results on outputIter.
|
|
Utf-8 Unicode character encoding/decoding.
size_t Eegeo::Unicode::Utf8::GetCodepointCount |
( |
std::string::const_iterator |
iter, |
|
|
std::string::const_iterator |
iterEnd |
|
) |
| |
Counts the number of Unicode codepoints in a Utf-8 sequence range.
- Parameters
-
iter | std::string const iterator pointing to the start of a Utf-8 sequence |
iterEnd | std::string const iterator pointing to the element one past the end of the input sequence range |
- Returns
- the number of Unicode codepoints decoded from the input sequence
size_t Eegeo::Unicode::Utf8::GetCodepointCount |
( |
const char * |
iter, |
|
|
const char * |
iterEnd |
|
) |
| |
Counts the number of Unicode codepoints in a Utf-8 sequence range.
- Parameters
-
iter | pointer to the start of a Utf-8 sequence |
iterEnd | pointer to the element one past the end of the input sequence range |
- Returns
- the number of Unicode codepoints decoded from the input sequence
u32 Eegeo::Unicode::Utf8::Next |
( |
std::string::const_iterator & |
iter, |
|
|
std::string::const_iterator |
iterEnd |
|
) |
| |
Decodes the next Unicode codepoint from a Utf-8 sequence iterator, and increments the iterator to one byte past the end of the octet sequence for the decoded character.
If an invalid octet sequence or codepoint is encountered on the input sequence, the replacement character U+FFFD is returned and iter is incremented by one element.
- Parameters
-
iter | std::string const iterator pointing to the start of a Utf-8 string |
iterEnd | std::string const iterator pointing to one element past the end of the string |
- Returns
- the Utf-32 codepoint decoded from iter
u32 Eegeo::Unicode::Utf8::Next |
( |
const char *& |
iter, |
|
|
const char * |
iterEnd |
|
) |
| |
Decodes the next Unicode codepoint from a Utf-8 sequence iterator, and increments the iterator to one byte past the end of the octet sequence for the decoded character.
If an invalid octet sequence or codepoint is encountered on the input sequence, the replacement character U+FFFD is returned and iter is incremented by one element.
- Parameters
-
iter | pointer to the start of a Utf-8 string |
iterEnd | pointer to the element one past the end of the string |
- Returns
- the Utf-32 codepoint decoded from iter
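The contract described above (advance past a whole sequence on success, return U+FFFD and advance by a single element on failure) can be illustrated with a minimal stand-in decoder. This is a sketch only, not the library's implementation: `SketchNext` and `SketchNextDemo` are hypothetical names, `u32` is assumed to be a 32-bit unsigned integer, and the sketch assumes `iter != iterEnd` on entry. Which inputs the real function treats as invalid is not stated in this page; the sketch assumes overlong forms, surrogates, and values above U+10FFFF are rejected.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

typedef std::uint32_t u32; // assumption: u32 is a 32-bit unsigned integer

// Hypothetical stand-in for Utf8::Next; not the library's implementation.
// Precondition (assumed): iter != iterEnd.
u32 SketchNext(const char*& iter, const char* iterEnd)
{
    const u32 replacement = 0xFFFD;
    const unsigned char lead = static_cast<unsigned char>(*iter);

    // Sequence length, payload bits of the lead byte, and the smallest
    // codepoint that legitimately needs that many bytes.
    std::size_t length = 0;
    u32 codepoint = 0;
    u32 minimum = 0;
    if (lead < 0x80)                { length = 1; codepoint = lead; }
    else if ((lead & 0xE0) == 0xC0) { length = 2; codepoint = lead & 0x1F; minimum = 0x80; }
    else if ((lead & 0xF0) == 0xE0) { length = 3; codepoint = lead & 0x0F; minimum = 0x800; }
    else if ((lead & 0xF8) == 0xF0) { length = 4; codepoint = lead & 0x07; minimum = 0x10000; }

    // length == 0 means a stray continuation byte or an illegal lead byte.
    bool valid = length != 0 && static_cast<std::size_t>(iterEnd - iter) >= length;
    for (std::size_t i = 1; valid && i < length; ++i)
    {
        const unsigned char c = static_cast<unsigned char>(iter[i]);
        valid = (c & 0xC0) == 0x80;
        codepoint = (codepoint << 6) | (c & 0x3F);
    }

    // Reject overlong forms, UTF-16 surrogates and values above U+10FFFF.
    valid = valid && codepoint >= minimum && codepoint <= 0x10FFFF
            && !(codepoint >= 0xD800 && codepoint <= 0xDFFF);

    if (!valid)
    {
        ++iter; // documented recovery: skip a single element
        return replacement;
    }
    iter += length;
    return codepoint;
}

// Walks a buffer holding 'a', U+00E9, a stray 0xFF byte, and 'b'.
void SketchNextDemo()
{
    const char text[] = { 'a', '\xC3', '\xA9', '\xFF', 'b' };
    const char* iter = text;
    const char* const end = text + sizeof(text);

    assert(SketchNext(iter, end) == 0x61   && iter == text + 1);
    assert(SketchNext(iter, end) == 0xE9   && iter == text + 3);
    assert(SketchNext(iter, end) == 0xFFFD && iter == text + 4);
    assert(SketchNext(iter, end) == 0x62   && iter == end);
}
```

Because a failure advances by exactly one element, a caller looping until `iter == iterEnd` always makes progress, even on malformed input.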
u32 Eegeo::Unicode::Utf8::PeekNext |
( |
std::string::const_iterator |
iter, |
|
|
std::string::const_iterator |
iterEnd |
|
) |
| |
|
inline |
Decodes the next Unicode codepoint from a Utf-8 sequence iterator.
If an invalid octet sequence or codepoint is encountered on the input sequence, the replacement character U+FFFD is returned. Because iter is passed by value, the caller's iterator is not modified.
- Parameters
-
iter | std::string const iterator pointing to the start of a Utf-8 string |
iterEnd | std::string const iterator pointing to one element past the end of the string |
- Returns
- the Utf-32 codepoint decoded from iter
u32 Eegeo::Unicode::Utf8::PeekNext |
( |
const char * |
iter, |
|
|
const char * |
iterEnd |
|
) |
| |
|
inline |
Decodes the next Unicode codepoint from a Utf-8 sequence iterator.
If an invalid octet sequence or codepoint is encountered on the input sequence, the replacement character U+FFFD is returned. Because iter is passed by value, the caller's pointer is not modified.
- Parameters
-
iter | pointer to the start of a Utf-8 string |
iterEnd | pointer to the element one past the end of the string |
- Returns
- the Utf-32 codepoint decoded from iter
size_t Eegeo::Unicode::Utf8::Strnlen |
( |
const char * |
s, |
|
|
size_t |
size |
|
) |
| |
Counts the number of Unicode codepoints in a Utf-8 encoded string.
- Parameters
-
s | a Utf-8 encoded byte sequence |
size | size of buffer s, in bytes |
- Returns
- the number of Unicode codepoints decoded from s, up to but not including the first null character
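The two stopping conditions (the size limit and the first null character) can be sketched as follows. This is a minimal illustration under stated assumptions, not the library's implementation: `SketchStrnlen` and `SketchStrnlenDemo` are hypothetical names, and the sketch assumes well-formed input, for which the codepoint count equals the number of bytes that are not continuation bytes (`10xxxxxx`). How the real function counts malformed sequences is not covered here.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for Utf8::Strnlen; assumes well-formed Utf-8.
// Counts codepoints in s, stopping at the first null byte or after
// size bytes, whichever comes first.
std::size_t SketchStrnlen(const char* s, std::size_t size)
{
    std::size_t count = 0;
    for (std::size_t i = 0; i < size && s[i] != '\0'; ++i)
    {
        const unsigned char byte = static_cast<unsigned char>(s[i]);
        // Every codepoint starts with exactly one non-continuation byte.
        if ((byte & 0xC0) != 0x80)
        {
            ++count;
        }
    }
    return count;
}

void SketchStrnlenDemo()
{
    // "héllo": six bytes of text in an eight-byte, null-padded buffer.
    const char buffer[8] = "h\xC3\xA9" "llo";
    assert(SketchStrnlen(buffer, sizeof(buffer)) == 5); // stops at the null
    assert(SketchStrnlen(buffer, 3) == 2);              // stops at the size limit
}
```

Note that the result is a codepoint count, not a byte count: the six text bytes above yield five.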
template<typename TUtf32Iterator >
TUtf32Iterator Eegeo::Unicode::Utf8::ToUtf32 |
( |
std::string::const_iterator |
inputIter, |
|
|
std::string::const_iterator |
inputIterEnd, |
|
|
TUtf32Iterator |
outputIter |
|
) |
| |
|
inline |
Decodes the Utf-8 sequence starting at inputIter to a Utf-32 sequence, inserting the results on outputIter.
Example:
std::string utf8String;
std::vector<u32> utf32String;
Utf8::ToUtf32(utf8String.begin(), utf8String.end(), std::back_inserter(utf32String));
If an invalid octet sequence or codepoint is encountered on the input sequence, the replacement character U+FFFD is inserted on output, and inputIter is incremented by one element before decoding continues.
- Parameters
-
inputIter | std::string const iterator pointing to the start of a Utf-8 string |
inputIterEnd | std::string const iterator pointing to one element past the end of the inputIter string |
outputIter | output iterator onto which the resulting Utf-32 codepoints are inserted |
- Returns
- the new value of outputIter
template<typename TUtf32Iterator >
TUtf32Iterator Eegeo::Unicode::Utf8::ToUtf32 |
( |
const char * |
inputIter, |
|
|
const char * |
inputIterEnd, |
|
|
TUtf32Iterator |
outputIter |
|
) |
| |
|
inline |
Decodes the Utf-8 sequence starting at inputIter to a Utf-32 sequence, inserting the results on outputIter.
Example:
const char utf8String[] = "Example";
const char* end = utf8String + std::strlen(utf8String);
std::vector<u32> utf32String;
Utf8::ToUtf32(utf8String, end, std::back_inserter(utf32String));
If an invalid octet sequence or codepoint is encountered on the input sequence, the replacement character U+FFFD is inserted on output, and inputIter is incremented by one element before decoding continues.
- Parameters
-
inputIter | pointer to the start of a Utf-8 string |
inputIterEnd | pointer to the element one past the end of the inputIter string |
outputIter | output iterator onto which the resulting Utf-32 codepoints are inserted |
- Returns
- the new value of outputIter
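The returned iterator is useful when writing into a fixed buffer rather than through an inserter, since it marks where output stopped. The sketch below shows the shape of such a decode loop. It is not the library's implementation: `SketchToUtf32`, `AsciiOnlyNext`, and `SketchToUtf32Demo` are hypothetical names, `u32` is assumed to be a 32-bit unsigned integer, and the decoder is passed in as a parameter so the loop stays independent of any particular decoding routine. The demo decoder is deliberately simplified to ASCII only and maps every other byte to U+FFFD.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <vector>

typedef std::uint32_t u32; // assumption: u32 is a 32-bit unsigned integer

// Deliberately simplified stand-in decoder (ASCII only): consumes one byte
// and returns it, or U+FFFD for any byte outside the ASCII range.
u32 AsciiOnlyNext(const char*& iter, const char* /*iterEnd*/)
{
    const unsigned char byte = static_cast<unsigned char>(*iter);
    ++iter;
    return byte < 0x80 ? static_cast<u32>(byte) : static_cast<u32>(0xFFFD);
}

// Hypothetical sketch of the decode loop: call the decoder until the input
// is exhausted, writing each codepoint through the output iterator.
template <typename TNext, typename TUtf32Iterator>
TUtf32Iterator SketchToUtf32(const char* inputIter, const char* inputIterEnd,
                             TUtf32Iterator outputIter, TNext next)
{
    while (inputIter != inputIterEnd)
    {
        *outputIter = next(inputIter, inputIterEnd); // next advances inputIter
        ++outputIter;
    }
    return outputIter;
}

void SketchToUtf32Demo()
{
    const char utf8[] = { 'A', '\xFF', 'z' };
    const char* const end = utf8 + sizeof(utf8);

    // Growing container: the return value can be ignored.
    std::vector<u32> utf32;
    SketchToUtf32(utf8, end, std::back_inserter(utf32), AsciiOnlyNext);
    assert(utf32.size() == 3 && utf32[1] == 0xFFFD);

    // Fixed buffer: the returned pointer marks one past the last write.
    u32 buffer[8] = {};
    u32* const written = SketchToUtf32(utf8, end, buffer, AsciiOnlyNext);
    assert(written - buffer == 3);
}
```

When writing into a fixed buffer, the caller is responsible for making it large enough; a Utf-8 input of n bytes never decodes to more than n codepoints.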