Eegeo::Unicode::Utf8 Namespace Reference

Utf-8 Unicode character encoding/decoding.

Functions

u32 Next (std::string::const_iterator &iter, std::string::const_iterator iterEnd)
 Decodes the next Unicode codepoint from a Utf-8 sequence iterator, and advances the iterator to one byte past the end of the octet sequence for the decoded character.
 
u32 Next (const char *&iter, const char *iterEnd)
 Decodes the next Unicode codepoint from a Utf-8 sequence iterator, and advances the iterator to one byte past the end of the octet sequence for the decoded character.
 
size_t GetCodepointCount (std::string::const_iterator iter, std::string::const_iterator iterEnd)
 
size_t GetCodepointCount (const char *iter, const char *iterEnd)
 
size_t Strnlen (const char *s, size_t size)
 
u32 PeekNext (std::string::const_iterator iter, std::string::const_iterator iterEnd)
 Decodes the next Unicode codepoint from a Utf-8 sequence iterator.
 
u32 PeekNext (const char *iter, const char *iterEnd)
 Decodes the next Unicode codepoint from a Utf-8 sequence iterator.
 
template<typename TUtf32Iterator >
TUtf32Iterator ToUtf32 (std::string::const_iterator inputIter, std::string::const_iterator inputIterEnd, TUtf32Iterator outputIter)
 Decodes the Utf-8 sequence starting at inputIter to a Utf-32 sequence, writing the results through outputIter.
 
template<typename TUtf32Iterator >
TUtf32Iterator ToUtf32 (const char *inputIter, const char *inputIterEnd, TUtf32Iterator outputIter)
 Decodes the Utf-8 sequence starting at inputIter to a Utf-32 sequence, writing the results through outputIter.
 

Detailed Description

Utf-8 Unicode character encoding/decoding.

Function Documentation

size_t Eegeo::Unicode::Utf8::GetCodepointCount ( std::string::const_iterator  iter,
std::string::const_iterator  iterEnd 
)

Counts the number of Unicode codepoints in a Utf-8 sequence range.

Parameters
iter: std::string::const_iterator pointing to the start of a Utf-8 sequence
iterEnd: std::string::const_iterator pointing to the element one past the end of the input sequence range
Returns
The number of Utf-32 codepoints decoded from the input sequence.
size_t Eegeo::Unicode::Utf8::GetCodepointCount ( const char *  iter,
const char *  iterEnd 
)

Counts the number of Unicode codepoints in a Utf-8 sequence range.

Parameters
iter: pointer to the start of a Utf-8 sequence
iterEnd: pointer to the element one past the end of the input sequence range
Returns
The number of Utf-32 codepoints decoded from the input sequence.
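For well-formed input, the count can be illustrated by tallying every byte that is not a Utf-8 continuation byte (bit pattern 10xxxxxx). The sketch below is a stand-in and not the library's implementation: SketchCodepointCount and its usage function are hypothetical names, and because it does not decode anything, malformed input may produce a different count from the documented function.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in, not part of the Eegeo API: counts codepoints in
// well-formed Utf-8 by skipping continuation bytes (bit pattern 10xxxxxx).
inline std::size_t SketchCodepointCount(std::string::const_iterator iter,
                                        std::string::const_iterator iterEnd)
{
    std::size_t count = 0;
    for (; iter != iterEnd; ++iter)
    {
        const unsigned char byte = static_cast<unsigned char>(*iter);
        if ((byte & 0xC0) != 0x80)
        {
            ++count; // an ASCII byte or a lead byte starts a new codepoint
        }
    }
    return count;
}

// Usage: "h", U+00E9, "llo" occupies six bytes but holds five codepoints.
inline void SketchCodepointCountUsage()
{
    const std::string text = "h\xC3\xA9llo";
    assert(text.size() == 6);
    assert(SketchCodepointCount(text.begin(), text.end()) == 5);
}
```

The point of the example is that the codepoint count and the byte count diverge as soon as the input contains any non-ASCII character.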
u32 Eegeo::Unicode::Utf8::Next ( std::string::const_iterator &  iter,
std::string::const_iterator  iterEnd 
)

Decodes the next Unicode codepoint from a Utf-8 sequence iterator, and advances the iterator to one byte past the end of the octet sequence for the decoded character.

If an invalid octet sequence or codepoint is encountered on the input sequence, the replacement character U+FFFD is returned and iter is advanced by one element.

Parameters
iter: std::string::const_iterator pointing to the start of a Utf-8 string
iterEnd: std::string::const_iterator pointing to one element past the end of the string
Returns
The Utf-32 codepoint decoded from iter.
u32 Eegeo::Unicode::Utf8::Next ( const char *&  iter,
const char *  iterEnd 
)

Decodes the next Unicode codepoint from a Utf-8 sequence iterator, and advances the iterator to one byte past the end of the octet sequence for the decoded character.

If an invalid octet sequence or codepoint is encountered on the input sequence, the replacement character U+FFFD is returned and iter is advanced by one element.

Parameters
iter: pointer to the start of a Utf-8 string
iterEnd: pointer to the element one past the end of the string
Returns
The Utf-32 codepoint decoded from iter.
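The documented contract (decode one codepoint and advance past it, or return U+FFFD and advance by a single element) can be sketched as follows. This is an illustrative stand-in, not the library's source: SketchNext is a hypothetical name, std::uint32_t stands in for the library's u32 typedef, and the choices to reject overlong forms and surrogates and to return U+FFFD for an empty range are assumptions about what "invalid" covers.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the documented Next(const char*&, const char*).
inline std::uint32_t SketchNext(const char*& iter, const char* iterEnd)
{
    const std::uint32_t kReplacement = 0xFFFDu;
    if (iter == iterEnd)
    {
        return kReplacement; // assumption: nothing to decode, iter untouched
    }

    const unsigned char lead = static_cast<unsigned char>(*iter);
    int extra = 0;              // number of continuation bytes expected
    std::uint32_t cp = 0;       // codepoint accumulated so far
    std::uint32_t minValue = 0; // smallest value allowed for this length
    if (lead < 0x80)                { cp = lead; }
    else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; minValue = 0x80; }
    else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; minValue = 0x800; }
    else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; minValue = 0x10000; }
    else { ++iter; return kReplacement; } // stray continuation or invalid lead

    if (iterEnd - iter <= extra)
    {
        ++iter; // sequence is truncated by the end of the range
        return kReplacement;
    }
    for (int i = 1; i <= extra; ++i)
    {
        const unsigned char c = static_cast<unsigned char>(iter[i]);
        if ((c & 0xC0) != 0x80) { ++iter; return kReplacement; }
        cp = (cp << 6) | (c & 0x3F);
    }

    const bool overlong = cp < minValue;
    const bool surrogate = cp >= 0xD800 && cp <= 0xDFFF;
    if (overlong || surrogate || cp > 0x10FFFF)
    {
        ++iter;
        return kReplacement;
    }
    iter += extra + 1; // one byte past the end of the decoded sequence
    return cp;
}

// Usage: a valid two-byte sequence, then a byte that can never appear in Utf-8.
inline void SketchNextUsage()
{
    const char text[] = "\xC3\xA9\xFF";
    const char* iter = text;
    const char* end = text + 3;
    assert(SketchNext(iter, end) == 0xE9u && iter == text + 2);
    assert(SketchNext(iter, end) == 0xFFFDu && iter == end);
}
```

Advancing by only one element on failure lets decoding resynchronise on the next byte instead of discarding a whole run of input.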
u32 Eegeo::Unicode::Utf8::PeekNext ( std::string::const_iterator  iter,
std::string::const_iterator  iterEnd 
)
inline

Decodes the next Unicode codepoint from a Utf-8 sequence iterator.

If an invalid octet sequence or codepoint is encountered on the input sequence, the replacement character U+FFFD is returned. Because iter is passed by value, the caller's iterator is left unchanged.

Parameters
iter: std::string::const_iterator pointing to the start of a Utf-8 string
iterEnd: std::string::const_iterator pointing to one element past the end of the string
Returns
The Utf-32 codepoint decoded from iter.
u32 Eegeo::Unicode::Utf8::PeekNext ( const char *  iter,
const char *  iterEnd 
)
inline

Decodes the next Unicode codepoint from a Utf-8 sequence iterator.

If an invalid octet sequence or codepoint is encountered on the input sequence, the replacement character U+FFFD is returned. Because iter is passed by value, the caller's pointer is left unchanged.

Parameters
iter: pointer to the start of a Utf-8 string
iterEnd: pointer to the element one past the end of the string
Returns
The Utf-32 codepoint decoded from iter.
size_t Eegeo::Unicode::Utf8::Strnlen ( const char *  s,
size_t  size 
)

Counts the number of Unicode codepoints in a Utf-8 encoded string.

Parameters
s: a Utf-8 encoded byte sequence
size: size in bytes of buffer s
Returns
The number of Utf-32 codepoints decoded from s, up to but not including the first null character.
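The difference from a byte-oriented length function can be shown with a small sketch. SketchUtf8Strnlen is a hypothetical stand-in, not the library's implementation. It assumes well-formed input and that size bounds the number of bytes examined; neither assumption is confirmed by the text above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical stand-in: counts codepoints up to the first null byte, never
// reading more than `size` bytes, by skipping Utf-8 continuation bytes.
inline std::size_t SketchUtf8Strnlen(const char* s, std::size_t size)
{
    std::size_t count = 0;
    for (std::size_t i = 0; i < size && s[i] != '\0'; ++i)
    {
        if ((static_cast<unsigned char>(s[i]) & 0xC0) != 0x80)
        {
            ++count;
        }
    }
    return count;
}

// Usage: the buffer holds "h" followed by U+00E9, then null padding.
inline void SketchUtf8StrnlenUsage()
{
    const char buffer[16] = "h\xC3\xA9";
    assert(std::strlen(buffer) == 3);                       // bytes
    assert(SketchUtf8Strnlen(buffer, sizeof(buffer)) == 2); // codepoints
}
```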
template<typename TUtf32Iterator >
TUtf32Iterator Eegeo::Unicode::Utf8::ToUtf32 ( std::string::const_iterator  inputIter,
std::string::const_iterator  inputIterEnd,
TUtf32Iterator  outputIter 
)
inline

Decodes the Utf-8 sequence in the range [inputIter, inputIterEnd) to a Utf-32 sequence, writing the results through outputIter.

Example:

#include <iterator>
#include <string>
#include <vector>
std::string utf8String;
std::vector<u32> utf32String;
Utf8::ToUtf32(utf8String.begin(), utf8String.end(), std::back_inserter(utf32String));

If an invalid octet sequence or codepoint is encountered on the input sequence, the replacement character U+FFFD is written to the output, and inputIter is advanced by one element before decoding continues.

Parameters
inputIter: std::string::const_iterator pointing to the start of a Utf-8 string
inputIterEnd: std::string::const_iterator pointing to one element past the end of the input string
outputIter: output iterator through which the resulting Utf-32 codepoints are written
Returns
The new value of outputIter.
template<typename TUtf32Iterator >
TUtf32Iterator Eegeo::Unicode::Utf8::ToUtf32 ( const char *  inputIter,
const char *  inputIterEnd,
TUtf32Iterator  outputIter 
)
inline

Decodes the Utf-8 sequence in the range [inputIter, inputIterEnd) to a Utf-32 sequence, writing the results through outputIter.

Example:

#include <cstring>
#include <iterator>
#include <vector>
const char utf8String[] = "Example";
const char* end = utf8String + std::strlen(utf8String);
std::vector<u32> utf32String;
Utf8::ToUtf32(utf8String, end, std::back_inserter(utf32String));

If an invalid octet sequence or codepoint is encountered on the input sequence, the replacement character U+FFFD is written to the output, and inputIter is advanced by one element before decoding continues.

Parameters
inputIter: pointer to the start of a Utf-8 string
inputIterEnd: pointer to the element one past the end of the input string
outputIter: output iterator through which the resulting Utf-32 codepoints are written
Returns
The new value of outputIter.
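To make the replacement behaviour concrete, the sketch below converts an input containing one invalid byte and shows that the bytes on either side of it still decode. It is a self-contained stand-in and not the library's code: SketchDecodeOne and SketchToUtf32 are hypothetical names, std::uint32_t stands in for u32, and treating overlong forms and surrogates as invalid is an assumption.

```cpp
#include <cassert>
#include <cstdint>
#include <iterator>
#include <vector>

// Hypothetical single-codepoint decoder: returns U+FFFD and advances by one
// element on invalid input, otherwise advances past the decoded sequence.
// Precondition: iter != iterEnd.
inline std::uint32_t SketchDecodeOne(const char*& iter, const char* iterEnd)
{
    const unsigned char lead = static_cast<unsigned char>(*iter);
    int extra = 0;
    std::uint32_t cp = lead;
    std::uint32_t minValue = 0;
    if (lead >= 0x80)
    {
        if ((lead & 0xE0) == 0xC0)      { cp = lead & 0x1F; extra = 1; minValue = 0x80; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; minValue = 0x800; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; minValue = 0x10000; }
        else { ++iter; return 0xFFFDu; }
    }
    if (iterEnd - iter <= extra) { ++iter; return 0xFFFDu; } // truncated
    for (int i = 1; i <= extra; ++i)
    {
        const unsigned char c = static_cast<unsigned char>(iter[i]);
        if ((c & 0xC0) != 0x80) { ++iter; return 0xFFFDu; }
        cp = (cp << 6) | (c & 0x3F);
    }
    if (cp < minValue || (cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF)
    {
        ++iter;
        return 0xFFFDu;
    }
    iter += extra + 1;
    return cp;
}

// Hypothetical stand-in for the documented pointer overload of ToUtf32.
template <typename TUtf32Iterator>
TUtf32Iterator SketchToUtf32(const char* inputIter, const char* inputIterEnd,
                             TUtf32Iterator outputIter)
{
    while (inputIter != inputIterEnd)
    {
        *outputIter++ = SketchDecodeOne(inputIter, inputIterEnd);
    }
    return outputIter;
}

// Usage: 'a', an invalid byte, then 'b' yields U+0061, U+FFFD, U+0062.
inline void SketchToUtf32Usage()
{
    const char input[] = "a\xFF" "b";
    std::vector<std::uint32_t> output;
    SketchToUtf32(input, input + 3, std::back_inserter(output));
    assert(output.size() == 3);
    assert(output[0] == 0x61u && output[1] == 0xFFFDu && output[2] == 0x62u);
}
```

Taking the destination as a template parameter means the same call can append to a container through std::back_inserter or write into a preallocated buffer through a raw pointer.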