namespace ot
class CodeConverterBase
#include "ot/base/CodeConverterBase.h"
Base class that contains enums, values and static methods used by all CodeConverter sub-classes.
The primary motivation for this class is the CodeConverterBase::Result enum, which represents the result of a conversion operation. Derived classes usually inherit from this class to gain access to this enum.
For convenience, this class also provides static methods for decoding sequences of CharType into Unicode code-point values and for encoding code-point values back into such sequences.
Protected Static Data Members
s_TrailingBytesForUTF8
const char s_TrailingBytesForUTF8
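This page does not describe the contents of s_TrailingBytesForUTF8. Judging by its name, it is most likely a lookup table that gives, for each possible UTF-8 lead byte, the number of continuation bytes that follow it, as in the Unicode Consortium's reference ConvertUTF.c. The sketch below builds such a table under that assumption; MakeTrailingBytesForUTF8 is a hypothetical helper, and the 256-entry size and the entries for the obsolete 5- and 6-byte lead bytes are assumptions rather than details taken from OpenTop.

```cpp
#include <array>
#include <cstddef>

// Hypothetical helper (not part of OpenTop): builds a lookup table mapping every
// possible lead byte to the number of continuation bytes that follow it.
// The layout is assumed to mirror the table in Unicode's reference ConvertUTF.c.
std::array<char, 256> MakeTrailingBytesForUTF8()
{
    std::array<char, 256> table{};
    for (std::size_t b = 0; b < table.size(); ++b) {
        if (b < 0xC0)      table[b] = 0; // ASCII (0xxxxxxx) or a stray continuation byte (10xxxxxx)
        else if (b < 0xE0) table[b] = 1; // 110xxxxx: 2-byte sequence
        else if (b < 0xF0) table[b] = 2; // 1110xxxx: 3-byte sequence
        else if (b < 0xF8) table[b] = 3; // 11110xxx: 4-byte sequence
        else if (b < 0xFC) table[b] = 4; // 111110xx: obsolete 5-byte form
        else               table[b] = 5; // 111111xx: obsolete 6-byte form (0xFE/0xFF never valid)
    }
    return table;
}
```

With a table like this, the total length of a sequence is 1 + table[lead], so a decoder can check in a single step whether enough input remains.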
enum Result {
    ok,              /* success, input buffer completely processed */
    inputExhausted,  /* success, incomplete input sequence detected */
    outputExhausted, /* success, output buffer full */
    error,           /* conversion error */
    noconv           /* no conversion required */
};
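A caller usually branches on the returned value after each conversion step. The sketch below illustrates this, assuming the values mean what their comments say; the enum is re-declared so the example compiles without the OpenTop headers, and NextAction is a hypothetical helper. Reading noconv as "pass the input through unchanged" follows the convention of std::codecvt's noconv result and is an assumption about OpenTop.

```cpp
// Re-declared for illustration so the example compiles on its own;
// in real code this would be CodeConverterBase::Result.
enum Result { ok, inputExhausted, outputExhausted, error, noconv };

// Hypothetical helper (not part of OpenTop): describes how a caller
// might react to each Result returned by a conversion step.
const char* NextAction(Result r)
{
    switch (r) {
    case ok:              return "finished: the whole input buffer was converted";
    case inputExhausted:  return "keep the unconverted tail and retry when more input arrives";
    case outputExhausted: return "drain or enlarge the output buffer, then continue";
    case error:           return "stop and report malformed input";
    case noconv:          return "use the input as-is; no conversion is required";
    }
    return "unknown result"; // not reached for valid enumerators
}
```

Note that three of the five values indicate success: only error means the input could not be converted.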
IsLegalUTF16
static bool IsLegalUTF16(const wchar_t* pStart, size_t length)
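Tests whether the length wchar_t values starting at pStart form a valid UTF-16 sequence representing a single Unicode character.
A valid UTF-16 sequence consists of either a single wchar_t value outside the surrogate range (0xD800 to 0xDFFF), or a pair of values in the surrogate range: a high surrogate (0xD800 to 0xDBFF) followed by a low surrogate (0xDC00 to 0xDFFF).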
Parameters:
pStart - pointer to the first wchar_t of a UTF-16 sequence
length - length of the sequence to test
Returns:
true if the sequence is valid UTF-16; false otherwise
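The rule above can be sketched as follows. IsLegalUTF16Sketch is a hypothetical stand-in, not OpenTop's implementation; it assumes each wchar_t holds one UTF-16 code unit and rejects any value above 0xFFFF, which only matters on platforms where wchar_t is 32 bits wide.

```cpp
#include <cstddef>

// Sketch, not OpenTop's implementation. Assumes each wchar_t holds one UTF-16
// code unit; any value above 0xFFFF (possible with a 32-bit wchar_t) is rejected.
bool IsLegalUTF16Sketch(const wchar_t* pStart, std::size_t length)
{
    if (pStart == nullptr || length == 0 || length > 2)
        return false; // one character needs one or two UTF-16 code units

    const unsigned long first = static_cast<unsigned long>(pStart[0]);
    if (first > 0xFFFF)
        return false;

    const bool firstIsHigh = first >= 0xD800 && first <= 0xDBFF;
    const bool firstIsLow  = first >= 0xDC00 && first <= 0xDFFF;

    if (length == 1)
        return !firstIsHigh && !firstIsLow; // a lone surrogate is not a character

    // length == 2: must be a high surrogate followed by a low surrogate.
    const unsigned long second = static_cast<unsigned long>(pStart[1]);
    return firstIsHigh && second >= 0xDC00 && second <= 0xDFFF;
}
```

For example, U+1F600 is encoded as the pair 0xD83D 0xDE00 and passes, while the same two values in reverse order, or 0xD83D on its own, do not.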
IsLegalUTF8
static bool IsLegalUTF8(const Byte* pStart, size_t length)
Tests whether the length bytes starting at pStart form a valid UTF-8 multi-byte sequence representing a single Unicode character.
Parameters:
pStart - pointer to the first Byte of a multi-byte UTF-8 sequence
length - length of the multi-byte sequence to test
Returns:
true if the sequence is valid UTF-8; false otherwise
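This page does not say exactly which UTF-8 rules IsLegalUTF8 enforces. The sketch below assumes the rules of RFC 3629: one to four bytes, trailing bytes of the form 10xxxxxx, and no overlong encodings, UTF-16 surrogates (U+D800 to U+DFFF) or values above U+10FFFF. IsLegalUTF8Sketch and the Byte alias are illustrative assumptions; OpenTop's version may differ, for example in how it treats the obsolete 5- and 6-byte forms.

```cpp
#include <cstddef>

using Byte = unsigned char; // assumption: OpenTop's Byte is an unsigned 8-bit type

// Sketch, not OpenTop's implementation: checks that [pStart, pStart + length)
// is exactly one well-formed UTF-8 character according to RFC 3629.
bool IsLegalUTF8Sketch(const Byte* pStart, std::size_t length)
{
    if (pStart == nullptr || length == 0 || length > 4)
        return false;

    // The lead byte must announce exactly `length` bytes.
    const Byte lead = pStart[0];
    std::size_t expected = 0;
    if (lead < 0x80)                       expected = 1;
    else if (lead >= 0xC2 && lead <= 0xDF) expected = 2; // 0xC0/0xC1 can only encode overlongs
    else if (lead >= 0xE0 && lead <= 0xEF) expected = 3;
    else if (lead >= 0xF0 && lead <= 0xF4) expected = 4; // above 0xF4 exceeds U+10FFFF
    else return false;                                   // continuation byte or invalid lead
    if (expected != length)
        return false;

    // Every trailing byte must be a continuation byte (10xxxxxx).
    for (std::size_t i = 1; i < length; ++i)
        if ((pStart[i] & 0xC0) != 0x80)
            return false;

    // Second-byte restrictions that rule out overlongs, surrogates and values > U+10FFFF.
    if (length >= 3) {
        const Byte second = pStart[1];
        if (lead == 0xE0 && second < 0xA0) return false; // overlong 3-byte form
        if (lead == 0xED && second > 0x9F) return false; // U+D800..U+DFFF (surrogates)
        if (lead == 0xF0 && second < 0x90) return false; // overlong 4-byte form
        if (lead == 0xF4 && second > 0x8F) return false; // beyond U+10FFFF
    }
    return true;
}
```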
UTF8Decode
static Result UTF8Decode(UCS4Char& ch, const Byte* from, const Byte* from_end, const Byte*& from_next)
Decodes a UTF-8 sequence of bytes into a single Unicode character.
Parameters:
ch - output parameter receiving the code-point value of the decoded Unicode character
from - pointer to the start of the UTF-8 encoded input buffer
from_end - pointer to the end of the input buffer. Following C++ standard library conventions, this must point one position past the last Byte of the input buffer
from_next - output parameter receiving a pointer to the start of the next multi-byte sequence in the input buffer
Returns:
One of the CodeConverterBase::Result values, indicating the outcome of the operation.
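One way a decoder with this signature can work is sketched below, assuming RFC 3629 validation and assuming that from_next is left equal to from whenever the result is not ok. The Byte and UCS4Char aliases, the re-declared Result enum and the name UTF8DecodeSketch are hypothetical; OpenTop's real function may, for instance, report an empty or truncated input differently.

```cpp
#include <cstddef>
#include <cstdint>

using Byte = unsigned char;     // assumption: OpenTop's Byte is an unsigned 8-bit type
using UCS4Char = std::uint32_t; // assumption: wide enough for any code point
enum Result { ok, inputExhausted, outputExhausted, error, noconv }; // mirrors CodeConverterBase::Result

// Sketch, not OpenTop's implementation: decodes one RFC 3629 UTF-8 character.
// ok: ch is set and from_next points past the sequence.
// inputExhausted: the buffer ends mid-sequence; error: malformed input.
// In both non-ok cases from_next is left equal to from.
Result UTF8DecodeSketch(UCS4Char& ch, const Byte* from, const Byte* from_end,
                        const Byte*& from_next)
{
    from_next = from;
    if (from >= from_end)
        return inputExhausted; // nothing to decode (assumed behaviour)

    const Byte lead = *from;
    std::size_t len = 0;
    UCS4Char cp = 0;
    UCS4Char minValue = 0; // smallest code point allowed for this length (rejects overlongs)
    if (lead < 0x80)                { len = 1; cp = lead;        minValue = 0; }
    else if ((lead & 0xE0) == 0xC0) { len = 2; cp = lead & 0x1F; minValue = 0x80; }
    else if ((lead & 0xF0) == 0xE0) { len = 3; cp = lead & 0x0F; minValue = 0x800; }
    else if ((lead & 0xF8) == 0xF0) { len = 4; cp = lead & 0x07; minValue = 0x10000; }
    else return error; // stray continuation byte or obsolete 5/6-byte lead

    const std::size_t avail = static_cast<std::size_t>(from_end - from);
    for (std::size_t i = 1; i < len; ++i) {
        if (i >= avail)
            return inputExhausted; // well-formed so far, but the sequence is cut off
        if ((from[i] & 0xC0) != 0x80)
            return error;          // expected a continuation byte (10xxxxxx)
        cp = (cp << 6) | (from[i] & 0x3F);
    }
    if (cp < minValue || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return error; // overlong form, out of range, or UTF-16 surrogate

    ch = cp;
    from_next = from + len;
    return ok;
}
```

Because this sketch stops at a truncated sequence instead of failing, a caller reading a stream in chunks can keep the bytes from from_next onwards and retry once more input has arrived.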
UTF8Encode
static Result UTF8Encode(UCS4Char ch, Byte* to, const Byte* to_limit, Byte*& to_next)
Encodes the single Unicode character ch into a UTF-8 Byte array.
Parameters:
ch - the code-point value of the Unicode character to be encoded
to - pointer to the start of the output buffer
to_limit - pointer to the end of the output buffer. Following C++ standard library conventions, this must point one position past the last Byte of the output buffer
to_next - output parameter receiving a pointer to the first unused Byte position in the output buffer
Returns:
One of the CodeConverterBase::Result values, indicating the outcome of the operation.
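A matching encoder can be sketched under similar assumptions: RFC 3629 output, hypothetical Byte and UCS4Char aliases, a re-declared Result enum and the hypothetical name UTF8EncodeSketch. This sketch writes nothing and returns outputExhausted when the space between to and to_limit is too small for the whole sequence, and returns error for values that are not Unicode scalar values; this page does not state whether OpenTop behaves the same way in those cases.

```cpp
#include <cstddef>
#include <cstdint>

using Byte = unsigned char;     // assumption: OpenTop's Byte is an unsigned 8-bit type
using UCS4Char = std::uint32_t; // assumption: wide enough for any code point
enum Result { ok, inputExhausted, outputExhausted, error, noconv }; // mirrors CodeConverterBase::Result

// Sketch, not OpenTop's implementation: encodes one code point as RFC 3629 UTF-8.
Result UTF8EncodeSketch(UCS4Char ch, Byte* to, const Byte* to_limit, Byte*& to_next)
{
    to_next = to;
    if (ch > 0x10FFFF || (ch >= 0xD800 && ch <= 0xDFFF))
        return error; // not a Unicode scalar value

    const std::size_t len = ch < 0x80 ? 1 : ch < 0x800 ? 2 : ch < 0x10000 ? 3 : 4;
    if (to_limit < to || static_cast<std::size_t>(to_limit - to) < len)
        return outputExhausted; // nothing written; the caller must supply more space

    // Lead-byte markers indexed by sequence length (index 0 unused).
    static const Byte leadMark[5] = { 0x00, 0x00, 0xC0, 0xE0, 0xF0 };

    // Fill continuation bytes from the back, 6 payload bits each (10xxxxxx).
    for (std::size_t i = len - 1; i > 0; --i) {
        to[i] = static_cast<Byte>(0x80 | (ch & 0x3F));
        ch >>= 6;
    }
    to[0] = static_cast<Byte>(leadMark[len] | ch); // remaining high bits go into the lead byte
    to_next = to + len;
    return ok;
}
```

For example, encoding U+20AC (the euro sign) into a 4-byte buffer writes 0xE2 0x82 0xAC and leaves to_next pointing at the fourth byte.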
Found a bug or missing feature? Please email us at support@elcel.com