Enca Library Reference Manual
struct EncaEncoding; #define ENCA_CS_UNKNOWN enum EncaSurface; enum EncaCharsetFlags; enum EncaNameStyle; enum EncaErrno; #define ENCA_NOT_A_CHAR |
struct EncaEncoding { int charset; EncaSurface surface; }; |
Encoding, i.e. charset and surface.
This is what enca_analyse() and enca_analyse_const() return.
The charset field is an opaque numerical charset identifier that has no meaning outside the Enca library. You will probably want to use it only as an argument to enca_charset_name(). Its meaning is only guaranteed to stay fixed for the duration of program execution; a change in its interpretation (e.g. due to the addition of new charsets) is not considered an API change.
The surface field is a combination of EncaSurface flags. You may want to ignore it completely; in that case, use enca_set_interpreted_surfaces() to disable weird surfaces.
int charset | Numeric charset identifier. |
EncaSurface surface | Surface flags. |
#define ENCA_CS_UNKNOWN (-1) |
Unknown character set id.
Use enca_charset_is_known() to check for an unknown charset instead of a direct comparison.
typedef enum { /*< flags >*/ ENCA_SURFACE_EOL_CR = 1 << 0, ENCA_SURFACE_EOL_LF = 1 << 1, ENCA_SURFACE_EOL_CRLF = 1 << 2, ENCA_SURFACE_EOL_MIX = 1 << 3, ENCA_SURFACE_EOL_BIN = 1 << 4, ENCA_SURFACE_MASK_EOL = (ENCA_SURFACE_EOL_CR | ENCA_SURFACE_EOL_LF | ENCA_SURFACE_EOL_CRLF | ENCA_SURFACE_EOL_MIX | ENCA_SURFACE_EOL_BIN), ENCA_SURFACE_PERM_21 = 1 << 5, ENCA_SURFACE_PERM_4321 = 1 << 6, ENCA_SURFACE_PERM_MIX = 1 << 7, ENCA_SURFACE_MASK_PERM = (ENCA_SURFACE_PERM_21 | ENCA_SURFACE_PERM_4321 | ENCA_SURFACE_PERM_MIX), ENCA_SURFACE_QP = 1 << 8, ENCA_SURFACE_REMOVE = 1 << 13, ENCA_SURFACE_UNKNOWN = 1 << 14, ENCA_SURFACE_MASK_ALL = (ENCA_SURFACE_MASK_EOL | ENCA_SURFACE_MASK_PERM | ENCA_SURFACE_QP | ENCA_SURFACE_REMOVE) } EncaSurface; |
Surface flags.
ENCA_SURFACE_EOL_CR | End-of-lines are represented with CRs. |
ENCA_SURFACE_EOL_LF | End-of-lines are represented with LFs. |
ENCA_SURFACE_EOL_CRLF | End-of-lines are represented with CRLFs. |
ENCA_SURFACE_EOL_MIX | Several end-of-line types, mixed. |
ENCA_SURFACE_EOL_BIN | End-of-line concept not applicable (binary data). |
ENCA_SURFACE_MASK_EOL | Mask for end-of-line surfaces. |
ENCA_SURFACE_PERM_21 | Odd and even bytes swapped. |
ENCA_SURFACE_PERM_4321 | Reversed byte sequence in 4byte words. |
ENCA_SURFACE_PERM_MIX | Chunks of both endiannesses, concatenated. |
ENCA_SURFACE_MASK_PERM | Mask for permutation surfaces. |
ENCA_SURFACE_QP | Quoted-printable. |
ENCA_SURFACE_REMOVE | Recode `remove' surface. |
ENCA_SURFACE_UNKNOWN | Unknown surface. |
ENCA_SURFACE_MASK_ALL | Mask for all bits, without ENCA_SURFACE_UNKNOWN. |
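As a rough illustration of how the masks combine with a surface value, the sketch below extracts the end-of-line part of a surface with ENCA_SURFACE_MASK_EOL and tests the permutation bits with ENCA_SURFACE_MASK_PERM. The enum is copied from the declaration above so that the fragment compiles without the library; real code would include enca.h instead. The helpers sketch_eol_name() and sketch_has_permutation() are hypothetical names and not part of Enca.

```c
/* Local copy of the EncaSurface declaration above, so that this fragment
 * compiles on its own.  Real code would #include <enca.h> instead. */
typedef enum {
  ENCA_SURFACE_EOL_CR    = 1 << 0,
  ENCA_SURFACE_EOL_LF    = 1 << 1,
  ENCA_SURFACE_EOL_CRLF  = 1 << 2,
  ENCA_SURFACE_EOL_MIX   = 1 << 3,
  ENCA_SURFACE_EOL_BIN   = 1 << 4,
  ENCA_SURFACE_MASK_EOL  = (ENCA_SURFACE_EOL_CR
                            | ENCA_SURFACE_EOL_LF
                            | ENCA_SURFACE_EOL_CRLF
                            | ENCA_SURFACE_EOL_MIX
                            | ENCA_SURFACE_EOL_BIN),
  ENCA_SURFACE_PERM_21   = 1 << 5,
  ENCA_SURFACE_PERM_4321 = 1 << 6,
  ENCA_SURFACE_PERM_MIX  = 1 << 7,
  ENCA_SURFACE_MASK_PERM = (ENCA_SURFACE_PERM_21
                            | ENCA_SURFACE_PERM_4321
                            | ENCA_SURFACE_PERM_MIX),
  ENCA_SURFACE_QP        = 1 << 8,
  ENCA_SURFACE_REMOVE    = 1 << 13,
  ENCA_SURFACE_UNKNOWN   = 1 << 14,
  ENCA_SURFACE_MASK_ALL  = (ENCA_SURFACE_MASK_EOL
                            | ENCA_SURFACE_MASK_PERM
                            | ENCA_SURFACE_QP
                            | ENCA_SURFACE_REMOVE)
} EncaSurface;

/* Hypothetical helper: name the end-of-line type stored in a surface value.
 * All non-EOL bits are masked away before the comparison. */
const char *
sketch_eol_name(unsigned int surface)
{
  switch (surface & ENCA_SURFACE_MASK_EOL) {
    case 0:                     return "none";
    case ENCA_SURFACE_EOL_CR:   return "CR";
    case ENCA_SURFACE_EOL_LF:   return "LF";
    case ENCA_SURFACE_EOL_CRLF: return "CRLF";
    case ENCA_SURFACE_EOL_MIX:  return "mixed";
    case ENCA_SURFACE_EOL_BIN:  return "binary";
    default:                    return "ambiguous"; /* several EOL bits set */
  }
}

/* Hypothetical helper: nonzero when any byte-permutation surface is set. */
int
sketch_has_permutation(unsigned int surface)
{
  return (surface & ENCA_SURFACE_MASK_PERM) != 0;
}
```

With these definitions, sketch_eol_name(ENCA_SURFACE_EOL_CRLF | ENCA_SURFACE_QP) yields "CRLF", because the quoted-printable bit lies outside ENCA_SURFACE_MASK_EOL and is ignored by the lookup.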
typedef enum { /*< flags >*/ ENCA_CHARSET_7BIT = 1 << 0, ENCA_CHARSET_8BIT = 1 << 1, ENCA_CHARSET_16BIT = 1 << 2, ENCA_CHARSET_32BIT = 1 << 3, ENCA_CHARSET_FIXED = 1 << 4, ENCA_CHARSET_VARIABLE = 1 << 5, ENCA_CHARSET_BINARY = 1 << 6, ENCA_CHARSET_REGULAR = 1 << 7, ENCA_CHARSET_MULTIBYTE = 1 << 8 } EncaCharsetFlags; |
Charset properties.
Flags ENCA_CHARSET_7BIT, ENCA_CHARSET_8BIT, ENCA_CHARSET_16BIT, ENCA_CHARSET_32BIT tell how many bits a `fundamental piece' consists of. This is different from bits per character; e.g. UTF-8 consists of 8bit pieces (bytes), but a character can be composed of 1 to 6 of them.
ENCA_CHARSET_7BIT | Characters are represented with 7bit characters. |
ENCA_CHARSET_8BIT | Characters are represented with bytes. |
ENCA_CHARSET_16BIT | Characters are represented with 2byte words. |
ENCA_CHARSET_32BIT | Characters are represented with 4byte words. |
ENCA_CHARSET_FIXED | One character consists of one fundamental piece. |
ENCA_CHARSET_VARIABLE | One character consists of a variable number of fundamental pieces. |
ENCA_CHARSET_BINARY | Charset is binary from ASCII viewpoint. |
ENCA_CHARSET_REGULAR | Language dependent (8bit) charset. |
ENCA_CHARSET_MULTIBYTE | Multibyte charset. |
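The difference between piece width and character width can be sketched in a few lines. The fragment below copies the EncaCharsetFlags declaration so that it compiles without the library (real code would include enca.h) and adds two hypothetical helpers, sketch_piece_bits() and sketch_is_fixed_width(), which are not part of Enca. The order in which the width flags are tested is an assumption of the sketch; the reference does not say whether more than one width flag can be set at once.

```c
/* Local copy of the EncaCharsetFlags declaration above, so that this
 * fragment compiles on its own.  Real code would #include <enca.h>. */
typedef enum {
  ENCA_CHARSET_7BIT      = 1 << 0,
  ENCA_CHARSET_8BIT      = 1 << 1,
  ENCA_CHARSET_16BIT     = 1 << 2,
  ENCA_CHARSET_32BIT     = 1 << 3,
  ENCA_CHARSET_FIXED     = 1 << 4,
  ENCA_CHARSET_VARIABLE  = 1 << 5,
  ENCA_CHARSET_BINARY    = 1 << 6,
  ENCA_CHARSET_REGULAR   = 1 << 7,
  ENCA_CHARSET_MULTIBYTE = 1 << 8
} EncaCharsetFlags;

/* Hypothetical helper: width in bits of one fundamental piece,
 * or 0 when no width flag is set.  Narrowest width is tested first
 * (an assumption of this sketch). */
int
sketch_piece_bits(unsigned int flags)
{
  if (flags & ENCA_CHARSET_7BIT)  return 7;
  if (flags & ENCA_CHARSET_8BIT)  return 8;
  if (flags & ENCA_CHARSET_16BIT) return 16;
  if (flags & ENCA_CHARSET_32BIT) return 32;
  return 0;
}

/* Hypothetical helper: nonzero when every character occupies exactly
 * one fundamental piece. */
int
sketch_is_fixed_width(unsigned int flags)
{
  return (flags & ENCA_CHARSET_FIXED) != 0
         && (flags & ENCA_CHARSET_VARIABLE) == 0;
}
```

For a UTF-8-like combination such as ENCA_CHARSET_8BIT | ENCA_CHARSET_VARIABLE, sketch_piece_bits() reports 8 while sketch_is_fixed_width() reports 0, which matches the description above: the pieces are bytes, but a character may need several of them.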
typedef enum { ENCA_NAME_STYLE_ENCA, ENCA_NAME_STYLE_RFC1345, ENCA_NAME_STYLE_CSTOCS, ENCA_NAME_STYLE_ICONV, ENCA_NAME_STYLE_HUMAN } EncaNameStyle; |
Charset naming styles and conventions.
typedef enum { ENCA_EOK = 0, ENCA_EINVALUE, ENCA_EEMPTY, ENCA_EFILTERED, ENCA_ENOCS8, ENCA_ESIGNIF, ENCA_EWINNER, ENCA_EGARBAGE } EncaErrno; |
Error codes.
ENCA_EOK | OK. |
ENCA_EINVALUE | Invalid value (usually of an option). |
ENCA_EEMPTY | Sample is empty. |
ENCA_EFILTERED | After filtering, (almost) nothing remained. |
ENCA_ENOCS8 | Multibyte tests failed and language contains no 8bit charsets. |
ENCA_ESIGNIF | Too few significant characters. |
ENCA_EWINNER | No clear winner. |
ENCA_EGARBAGE | Sample is garbage. |
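A caller that receives one of these codes typically wants to turn it into a message. The sketch below does so with a plain switch over a local copy of the EncaErrno declaration (real code would include enca.h); the message strings are taken from the table above. The function sketch_errno_text() is a hypothetical name used only for illustration and is not part of Enca.

```c
/* Local copy of the EncaErrno declaration above, so that this fragment
 * compiles on its own.  Real code would #include <enca.h> instead. */
typedef enum {
  ENCA_EOK = 0,
  ENCA_EINVALUE,
  ENCA_EEMPTY,
  ENCA_EFILTERED,
  ENCA_ENOCS8,
  ENCA_ESIGNIF,
  ENCA_EWINNER,
  ENCA_EGARBAGE
} EncaErrno;

/* Hypothetical helper: map an error code to the description given in
 * the table above.  Values outside the enum get a generic message. */
const char *
sketch_errno_text(EncaErrno code)
{
  switch (code) {
    case ENCA_EOK:       return "OK.";
    case ENCA_EINVALUE:  return "Invalid value (usually of an option).";
    case ENCA_EEMPTY:    return "Sample is empty.";
    case ENCA_EFILTERED: return "After filtering, (almost) nothing remained.";
    case ENCA_ENOCS8:    return "Multibyte tests failed and language contains no 8bit charsets.";
    case ENCA_ESIGNIF:   return "Too few significant characters.";
    case ENCA_EWINNER:   return "No clear winner.";
    case ENCA_EGARBAGE:  return "Sample is garbage.";
  }
  return "Unknown error code.";
}
```

Because only ENCA_EOK has an explicit value, the remaining codes are numbered consecutively from 1 (ENCA_EINVALUE) to 7 (ENCA_EGARBAGE), so a test against ENCA_EOK is enough to distinguish success from any failure.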