unicode_u_ucs4_native, unicode_u_ucs2_native, unicode_convert_init, unicode_convert, unicode_convert_deinit, unicode_convert_tocbuf_init, unicode_convert_tou_init, unicode_convert_fromu_init, unicode_convert_uc, unicode_convert_tocbuf_toutf8_init, unicode_convert_tocbuf_fromutf8_init, unicode_convert_toutf8, unicode_convert_fromutf8, unicode_convert_tobuf, unicode_convert_tou_tobuf, unicode_convert_fromu_tobuf — unicode character set conversion
#include <courier-unicode.h>

extern const char unicode_u_ucs4_native[];
extern const char unicode_u_ucs2_native[];

unicode_convert_handle_t
unicode_convert_init(const char *src_chset,
                     const char *dst_chset,
                     int (*cb_func)(const char *, size_t, void *),
                     void *cb_arg);

int
unicode_convert(unicode_convert_handle_t handle,
                const char *text,
                size_t cnt);

int
unicode_convert_deinit(unicode_convert_handle_t handle,
                       int *errptr);

unicode_convert_handle_t
unicode_convert_tocbuf_init(const char *src_chset,
                            const char *dst_chset,
                            char **cbufptr_ret,
                            size_t *cbufsize_ret,
                            int nullterminate);

unicode_convert_handle_t
unicode_convert_tocbuf_toutf8_init(const char *src_chset,
                                   char **cbufptr_ret,
                                   size_t *cbufsize_ret,
                                   int nullterminate);

unicode_convert_handle_t
unicode_convert_tocbuf_fromutf8_init(const char *dst_chset,
                                     char **cbufptr_ret,
                                     size_t *cbufsize_ret,
                                     int nullterminate);

unicode_convert_handle_t
unicode_convert_tou_init(const char *src_chset,
                         char32_t **ucptr_ret,
                         size_t *ucsize_ret,
                         int nullterminate);

unicode_convert_handle_t
unicode_convert_fromu_init(const char *dst_chset,
                           char **cbufptr_ret,
                           size_t *cbufsize_ret,
                           int nullterminate);

int
unicode_convert_uc(unicode_convert_handle_t handle,
                   const char32_t *text,
                   size_t cnt);

char *
unicode_convert_toutf8(const char *text,
                       const char *charset,
                       int *error);

char *
unicode_convert_fromutf8(const char *text,
                         const char *charset,
                         int *error);

char *
unicode_convert_tobuf(const char *text,
                      const char *charset,
                      const char *dstcharset,
                      int *error);

int
unicode_convert_tou_tobuf(const char *text,
                          size_t text_l,
                          const char *charset,
                          char32_t **uc,
                          size_t *ucsize,
                          int *error);

int
unicode_convert_fromu_tobuf(const char32_t *utext,
                            size_t utext_l,
                            const char *charset,
                            char **c,
                            size_t *csize,
                            int *error);
unicode_u_ucs4_native[] contains the string “UCS-4BE” or “UCS-4LE”, matching the native char32_t endianness.
unicode_u_ucs2_native[] contains the string “UCS-2BE” or “UCS-2LE”, matching the native endianness (the same byte order that char32_t uses).
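How the library defines these strings is not shown in this page. As a minimal sketch in standard C11, a program can derive the same UCS-4 name at run time by checking whether the lowest-addressed byte of a char32_t holds its low-order byte; native_ucs4_name() is a hypothetical helper, not part of the library.

```c
#include <string.h>
#include <uchar.h>

/* Hypothetical helper: returns "UCS-4LE" or "UCS-4BE" for this machine,
   the same kind of value unicode_u_ucs4_native[] holds. */
static const char *native_ucs4_name(void)
{
    char32_t probe = 1;
    unsigned char first;

    memcpy(&first, &probe, 1);   /* lowest-addressed byte of the value */
    return first == 1 ? "UCS-4LE" : "UCS-4BE";
}
```

The result is usable wherever a character set name is expected, for example as an iconv_open(3) argument.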
unicode_convert_init(), unicode_convert(), and unicode_convert_deinit() are an adaptation of the iconv(3) API that uses the same calling convention as the other algorithms in this unicode library, with some value-added features. These functions use iconv(3) to perform the actual character set conversion.
unicode_convert_init() returns a non-NULL handle for the requested conversion, or NULL if the requested conversion is not available.
unicode_convert_init() takes a pointer to an output function that receives the converted text, and an opaque cb_arg pointer that is passed through to it. The output function receives a pointer to the converted text, the number of characters (bytes) in it, and cb_arg. It gets called repeatedly, until it has received the entire converted text.
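A minimal sketch of an output function that appends each chunk to a growing malloc()ed buffer, assuming the callback shape int (*)(const char *, size_t, void *) with cb_arg as the last argument. struct collector and collect_cb() are hypothetical names; the library's own buffer-collecting output function is not shown in this page.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical state, passed as cb_arg. */
struct collector {
    char *buf;
    size_t len;   /* bytes collected so far */
    size_t cap;   /* bytes allocated */
};

/* Output function: appends one chunk of converted text.
   Returns 0 on success, -1 if memory runs out (reported as an error). */
static int collect_cb(const char *text, size_t cnt, void *cb_arg)
{
    struct collector *c = cb_arg;

    if (cnt == 0)
        return 0;
    if (c->len + cnt > c->cap) {
        size_t ncap = c->cap ? c->cap : 64;
        char *nbuf;

        while (ncap < c->len + cnt)
            ncap *= 2;
        if ((nbuf = realloc(c->buf, ncap)) == NULL)
            return -1;
        c->buf = nbuf;
        c->cap = ncap;
    }
    memcpy(c->buf + c->len, text, cnt);
    c->len += cnt;
    return 0;
}
```

With the library, collect_cb would be passed as the output function and a pointer to a zero-initialized struct collector as cb_arg.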
The text to convert is passed to unicode_convert(), in as many calls as needed. Each call to unicode_convert() invokes the output function zero or more times, with each successive part of the converted text. Finally, unicode_convert_deinit() ends the conversion and deallocates the conversion handle.
A call to unicode_convert_deinit() may result in additional calls to the output function, passing the final parts of the converted text, before unicode_convert_deinit() deallocates the handle and returns.
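The feed-then-flush lifecycle above can be sketched directly on iconv(3). This is not the library's code: struct sconv and sconv_init(), sconv_feed() and sconv_deinit() are hypothetical stand-ins for unicode_convert_init(), unicode_convert() and unicode_convert_deinit(). The sketch holds back an incomplete multibyte sequence until the next call, may invoke the output function zero or more times per call, tracks output-function failures separately from conversion errors, and skips unconvertible bytes (a choice of this sketch).

```c
#include <errno.h>
#include <iconv.h>
#include <stdlib.h>
#include <string.h>

typedef int (*out_fn)(const char *, size_t, void *);

struct sconv {       /* hypothetical streaming converter state */
    iconv_t cd;
    char pend[16];   /* held-back tail: an incomplete multibyte sequence */
    size_t npend;
    out_fn out;
    void *arg;
    int out_rc;      /* sticky: non-zero once the output function fails */
    int conv_err;    /* sticky: non-zero once some input was unconvertible */
};

/* Pass one block of converted output to the output function. */
static void sconv_emit(struct sconv *s, const char *obuf, const char *op)
{
    if (op > obuf && s->out_rc == 0)
        s->out_rc = s->out(obuf, (size_t)(op - obuf), s->arg);
}

/* Convert as much of *in as possible; stop at an incomplete tail. */
static void sconv_pump(struct sconv *s, char **in, size_t *len)
{
    while (*len > 0) {
        char obuf[64], *op = obuf;
        size_t ol = sizeof obuf;
        size_t r = iconv(s->cd, in, len, &op, &ol);
        int e = errno;
        sconv_emit(s, obuf, op);
        if (r != (size_t)-1 || e == E2BIG)
            continue;             /* finished, or output block was full */
        if (e == EINVAL)
            break;                /* incomplete sequence: wait for more */
        s->conv_err = 1;          /* EILSEQ: skip one unconvertible byte */
        ++*in;
        --*len;
    }
}

static int sconv_init(struct sconv *s, const char *from, const char *to,
                      out_fn out, void *arg)
{
    *s = (struct sconv){ .cd = iconv_open(to, from), .out = out, .arg = arg };
    return s->cd == (iconv_t)-1 ? -1 : 0;
}

/* Like unicode_convert(): non-zero once the output function has failed. */
static int sconv_feed(struct sconv *s, const char *text, size_t cnt)
{
    size_t len = s->npend + cnt;
    char *buf = malloc(len + 1), *in = buf;

    if (buf == NULL)
        return -1;
    memcpy(buf, s->pend, s->npend);
    memcpy(buf + s->npend, text, cnt);
    sconv_pump(s, &in, &len);
    if (len > sizeof s->pend) { s->conv_err = 1; len = 0; }  /* not expected */
    memcpy(s->pend, in, len);
    s->npend = len;
    free(buf);
    return s->out_rc;
}

/* Like unicode_convert_deinit(): flush, close, and report both errors. */
static int sconv_deinit(struct sconv *s, int *errptr)
{
    char obuf[64], *op = obuf;
    size_t ol = sizeof obuf;

    if (s->npend > 0)
        s->conv_err = 1;                  /* input ended mid-character */
    iconv(s->cd, NULL, NULL, &op, &ol);   /* write any final shift sequence */
    sconv_emit(s, obuf, op);
    iconv_close(s->cd);
    if (errptr != NULL)
        *errptr = s->conv_err;
    return s->out_rc;
}

/* Example output function: appends into a fixed-size buffer. */
struct fixbuf { char data[256]; size_t len; };

static int fixbuf_cb(const char *text, size_t cnt, void *arg)
{
    struct fixbuf *f = arg;
    if (cnt > sizeof f->data - f->len)
        return -1;                        /* out of room: report failure */
    memcpy(f->data + f->len, text, cnt);
    f->len += cnt;
    return 0;
}
```

For example, converting the UTF-8 bytes of "café" to UCS-4BE in two calls split inside "é" produces output for "caf" on the first call and for "é" only on the second.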
The output function should return 0 normally. A non-zero return indicates an error condition. unicode_convert_deinit() returns non-zero if any invocation of the output function returned non-zero (this includes invocations resulting from this call as well as from prior unicode_convert() calls), or 0 if all invocations of the output function returned 0.
If errptr is not NULL, *errptr is set to non-zero if there were any conversion errors, that is, if any text could not be converted to the destination character set.
unicode_convert() also returns non-zero if an output function it invokes returns non-zero. The conversion handle remains allocated in that case, so unicode_convert_deinit() must still be called to release it.
To collect the converted text into a single buffer, call unicode_convert_tocbuf_init() instead of unicode_convert_init(), then call unicode_convert() and unicode_convert_deinit() as usual. Its src_chset and dst_chset parameters specify the source and destination character sets, as with unicode_convert_init(). unicode_convert_tocbuf_toutf8_init() is just an alias that specifies UTF-8 as the destination character set, and unicode_convert_tocbuf_fromutf8_init() is just an alias that specifies UTF-8 as the source character set.
These functions supply their own output function, which collects the converted text into a malloc()ed buffer. If unicode_convert_deinit() returns 0, *cbufptr_ret is set to that malloc()ed buffer, and *cbufsize_ret is set to the number of converted characters, which is the size of the buffer.
If the converted string is empty, *cbufsize_ret is set to 0, but *cbufptr_ret is still initialized, to a dummy malloc()ed buffer.
A non-zero nullterminate places a trailing \0 character after the converted string (this is included in *cbufsize_ret).
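The buffer rules above (the size counts the optional trailing \0, and an empty result still yields a real malloc()ed pointer) fit in a small finishing step. A minimal sketch, assuming the converted bytes have already been gathered elsewhere; finish_cbuf() is a hypothetical helper, not a library function.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical finishing step producing the (*cbufptr_ret, *cbufsize_ret)
   pair described above. Returns 0 on success, -1 if malloc() fails. */
static int finish_cbuf(const char *data, size_t len, int nullterminate,
                       char **cbufptr_ret, size_t *cbufsize_ret)
{
    size_t size = len + (nullterminate ? 1 : 0);
    char *p = malloc(size ? size : 1);   /* never malloc(0): always a real buffer */

    if (p == NULL)
        return -1;
    if (len > 0)
        memcpy(p, data, len);
    if (nullterminate)
        p[len] = '\0';                   /* counted in *cbufsize_ret */
    *cbufptr_ret = p;
    *cbufsize_ret = size;
    return 0;
}
```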
unicode_convert_tou_init() converts character text into a char32_t buffer. It works just like unicode_convert_tocbuf_init(), except that only the source character set is specified and the output buffer is a char32_t buffer. A non-zero nullterminate terminates the converted unicode characters with a U+0000.
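Because an output function receives bytes in arbitrary chunk sizes, filling a char32_t buffer means reassembling characters across chunk boundaries. A sketch, assuming the converter delivers UCS-4 bytes in the native byte order named by unicode_u_ucs4_native; struct ucollect, ucollect_cb() and ucollect_finish() are hypothetical, and counting the U+0000 terminator in n is this sketch's assumption by analogy with *cbufsize_ret.

```c
#include <stdlib.h>
#include <string.h>
#include <uchar.h>

/* Hypothetical collector for UCS-4 bytes in native byte order. */
struct ucollect {
    char32_t *buf;
    size_t n, cap;                        /* characters stored / allocated */
    unsigned char part[sizeof(char32_t)]; /* bytes of a split character */
    size_t npart;
};

/* Output function: reassembles char32_t values across chunk boundaries. */
static int ucollect_cb(const char *text, size_t cnt, void *arg)
{
    struct ucollect *u = arg;

    for (size_t i = 0; i < cnt; i++) {
        u->part[u->npart++] = (unsigned char)text[i];
        if (u->npart < sizeof(char32_t))
            continue;
        u->npart = 0;                     /* a full character is ready */
        if (u->n == u->cap) {
            size_t ncap = u->cap ? u->cap * 2 : 16;
            char32_t *nb = realloc(u->buf, ncap * sizeof *nb);

            if (nb == NULL)
                return -1;
            u->buf = nb;
            u->cap = ncap;
        }
        memcpy(&u->buf[u->n++], u->part, sizeof(char32_t));
    }
    return 0;
}

/* Optionally append U+0000 (counted in n). Fails on a truncated tail. */
static int ucollect_finish(struct ucollect *u, int nullterminate)
{
    char32_t zero = 0;

    if (u->npart != 0)
        return -1;
    return nullterminate ? ucollect_cb((const char *)&zero, sizeof zero, u) : 0;
}
```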
unicode_convert_fromu_init() converts char32_t characters to the destination character set, and otherwise works like unicode_convert_tocbuf_init(). In this case, unicode_convert_uc() is used instead of unicode_convert(); it works just like unicode_convert(), except that the input is a char32_t sequence and cnt is the number of unicode characters, not bytes.
unicode_convert_toutf8() converts text, in the character set charset, into a UTF-8 string and returns it in a malloc()ed buffer. If error is not NULL, *error is set to a non-zero value if a character conversion error occurred and some characters could not be converted; this can happen even when unicode_convert_toutf8() returns a non-NULL value.
unicode_convert_fromutf8() does a similar conversion from UTF-8 text to the specified character set.
unicode_convert_tobuf() does a similar conversion between two arbitrary character sets, from charset to dstcharset.
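These three functions share a one-shot contract: a NUL-terminated string in, a malloc()ed NUL-terminated string out, and an optional error flag for unconvertible characters. A self-contained sketch of that contract using iconv(3) directly; convert_string() and grow() are hypothetical names, and skipping unconvertible input bytes is this sketch's choice, which may differ from what the library does.

```c
#include <errno.h>
#include <iconv.h>
#include <stdlib.h>
#include <string.h>

/* Grow *buf so that it holds at least need bytes. Returns 0 or -1. */
static int grow(char **buf, size_t *cap, size_t need)
{
    size_t ncap = *cap;
    char *nb;

    while (ncap < need)
        ncap *= 2;
    if (ncap == *cap)
        return 0;
    if ((nb = realloc(*buf, ncap)) == NULL)
        return -1;
    *buf = nb;
    *cap = ncap;
    return 0;
}

/* Hypothetical one-shot conversion. Returns a malloc()ed, NUL-terminated
   result, or NULL if the conversion is unavailable or memory runs out.
   If error is not NULL, *error is set to non-zero when some input bytes
   could not be converted (they are skipped). */
static char *convert_string(const char *text, const char *from,
                            const char *to, int *error)
{
    iconv_t cd = iconv_open(to, from);
    char *in = (char *)text;               /* iconv() does not modify input */
    size_t inleft = strlen(text), cap = inleft + 16, used = 0, ol;
    char *out, *op;
    int bad = 0;

    if (cd == (iconv_t)-1)
        return NULL;
    if ((out = malloc(cap)) == NULL)
        goto fail_close;
    while (inleft > 0) {
        op = out + used;
        ol = cap - used - 1;               /* keep 1 byte for the '\0' */
        size_t r = iconv(cd, &in, &inleft, &op, &ol);
        int e = errno;

        used = (size_t)(op - out);
        if (r != (size_t)-1)
            break;                         /* all input converted */
        if (e == E2BIG) {
            if (grow(&out, &cap, cap * 2) != 0)
                goto fail_free;
        } else {                           /* EILSEQ or EINVAL: skip a byte */
            bad = 1;
            in++;
            inleft--;
        }
    }
    if (grow(&out, &cap, used + 32) != 0)  /* room for a closing shift */
        goto fail_free;
    op = out + used;
    ol = cap - used - 1;
    iconv(cd, NULL, NULL, &op, &ol);       /* reset stateful encodings */
    out[op - out] = '\0';
    iconv_close(cd);
    if (error != NULL)
        *error = bad;
    return out;

fail_free:
    free(out);
fail_close:
    iconv_close(cd);
    return NULL;
}
```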
unicode_convert_tou_tobuf() calls unicode_convert_tou_init(), feeds the character string through unicode_convert(), then calls unicode_convert_deinit(). If this function returns 0, *uc and *ucsize are set to a malloc()ed buffer holding the char32_t array, and its size.
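The (*uc, *ucsize) result can be illustrated without the library. A sketch of that output contract only: the hypothetical utf8_to_u32() decodes UTF-8 (the one source character set it supports) by hand instead of through iconv(3), reports malformed input through *error, and substitutes U+FFFD for each bad byte; the substitution is this sketch's choice, not documented library behavior.

```c
#include <stdlib.h>
#include <uchar.h>

/* Hypothetical: decode text_l bytes of UTF-8 into a malloc()ed char32_t
   array. Returns 0 on success, -1 if malloc() fails. */
static int utf8_to_u32(const char *text, size_t text_l,
                       char32_t **uc, size_t *ucsize, int *error)
{
    const unsigned char *s = (const unsigned char *)text;
    char32_t *out = malloc((text_l ? text_l : 1) * sizeof *out);
    size_t i = 0, n = 0;
    int bad = 0;

    if (out == NULL)
        return -1;
    while (i < text_l) {
        unsigned char b = s[i];
        size_t len = b < 0x80 ? 1 : (b & 0xE0) == 0xC0 ? 2
                   : (b & 0xF0) == 0xE0 ? 3 : (b & 0xF8) == 0xF0 ? 4 : 0;
        char32_t c = len == 1 ? b : len == 2 ? b & 0x1F
                   : len == 3 ? b & 0x0F : b & 0x07;
        size_t k = 1;

        /* Accumulate continuation bytes (10xxxxxx). */
        for (; len > 1 && k < len && i + k < text_l
               && (s[i + k] & 0xC0) == 0x80; k++)
            c = (c << 6) | (s[i + k] & 0x3F);
        if (len == 0 || k < len || c > 0x10FFFF
            || (c >= 0xD800 && c <= 0xDFFF)
            || (len == 2 && c < 0x80) || (len == 3 && c < 0x800)
            || (len == 4 && c < 0x10000)) {
            out[n++] = 0xFFFD;       /* malformed: substitute, skip one byte */
            bad = 1;
            i += 1;
        } else {
            out[n++] = c;
            i += len;
        }
    }
    *uc = out;
    *ucsize = n;
    if (error != NULL)
        *error = bad;
    return 0;
}
```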
unicode_convert_fromu_tobuf() calls unicode_convert_fromu_init(), feeds the unicode array through unicode_convert_uc(), then calls unicode_convert_deinit(). If this function returns 0, *c and *csize are set to a malloc()ed buffer holding the converted char array, and its size.
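The reverse direction, shaped like the (*c, *csize) results of unicode_convert_fromu_tobuf(), as a hand-written sketch that supports only UTF-8 output. u32_to_utf8() is hypothetical; replacing unencodable values (surrogates, values above U+10FFFF) with U+FFFD and flagging them through *error is an assumption of this sketch. As with unicode_convert_uc(), utext_l counts characters, not bytes.

```c
#include <stdlib.h>
#include <uchar.h>

/* Hypothetical: encode utext_l characters as UTF-8 into a malloc()ed
   buffer (not NUL-terminated). Returns 0, or -1 if malloc() fails. */
static int u32_to_utf8(const char32_t *utext, size_t utext_l,
                       char **c, size_t *csize, int *error)
{
    char *out = malloc(utext_l * 4 + 1);   /* at most 4 bytes per character */
    size_t n = 0;
    int bad = 0;

    if (out == NULL)
        return -1;
    for (size_t i = 0; i < utext_l; i++) {
        char32_t u = utext[i];

        if (u > 0x10FFFF || (u >= 0xD800 && u <= 0xDFFF)) {
            u = 0xFFFD;                    /* not encodable: substitute */
            bad = 1;
        }
        if (u < 0x80) {
            out[n++] = (char)u;
        } else if (u < 0x800) {
            out[n++] = (char)(0xC0 | (u >> 6));
            out[n++] = (char)(0x80 | (u & 0x3F));
        } else if (u < 0x10000) {
            out[n++] = (char)(0xE0 | (u >> 12));
            out[n++] = (char)(0x80 | ((u >> 6) & 0x3F));
            out[n++] = (char)(0x80 | (u & 0x3F));
        } else {
            out[n++] = (char)(0xF0 | (u >> 18));
            out[n++] = (char)(0x80 | ((u >> 12) & 0x3F));
            out[n++] = (char)(0x80 | ((u >> 6) & 0x3F));
            out[n++] = (char)(0x80 | (u & 0x3F));
        }
    }
    *c = out;
    *csize = n;
    if (error != NULL)
        *error = bad;
    return 0;
}
```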