unicode_line_break, unicode_lb_init, unicode_lb_set_opts, unicode_lb_next, unicode_lb_next_cnt, unicode_lb_end, unicode_lbc_init, unicode_lbc_set_opts, unicode_lbc_next, unicode_lbc_next_cnt, unicode_lbc_end — calculate mandatory or allowed line breaks
#include <courier-unicode.h>
unicode_lb_info_t unicode_lb_init(int (*cb_func)(int, void *), void *cb_arg);

void unicode_lb_set_opts(unicode_lb_info_t lb, int opts);

int unicode_lb_next(unicode_lb_info_t lb, char32_t c);

int unicode_lb_next_cnt(unicode_lb_info_t lb, const char32_t *cptr, size_t cnt);

int unicode_lb_end(unicode_lb_info_t lb);

unicode_lbc_info_t unicode_lbc_init(int (*cb_func)(int, char32_t, void *), void *cb_arg);

void unicode_lbc_set_opts(unicode_lbc_info_t lb, int opts);

int unicode_lbc_next(unicode_lbc_info_t lb, char32_t c);

int unicode_lbc_next_cnt(unicode_lbc_info_t lb, const char32_t *cptr, size_t cnt);

int unicode_lbc_end(unicode_lbc_info_t lb);

These functions implement the Unicode line breaking algorithm. Invoke unicode_lb_init() to initialize the line breaking algorithm. The first parameter is a callback function. The second parameter is an opaque pointer. The callback function gets invoked with two parameters. The first parameter is one of three values: UNICODE_LB_MANDATORY, UNICODE_LB_NONE, or UNICODE_LB_ALLOWED, as described below. The second parameter is the opaque pointer that was passed to unicode_lb_init(); the opaque pointer is not subject to any further interpretation by these functions.
unicode_lb_init() returns an opaque handle. Repeated invocations of unicode_lb_next(), passing the handle and one Unicode character at a time, define the sequence of Unicode characters over which the line breaking calculation takes place. unicode_lb_next_cnt() is a shortcut for invoking unicode_lb_next() repeatedly over an array cptr containing cnt Unicode characters.
unicode_lb_end() denotes the end of the Unicode character sequence. After the call to unicode_lb_end() the unicode_lb_info_t line breaking handle is no longer valid.
Between the calls to unicode_lb_init() and unicode_lb_end(), the callback function gets invoked exactly once for each Unicode character given to unicode_lb_next() or unicode_lb_next_cnt(). Usually each call to unicode_lb_next() results in the callback function getting invoked immediately, but this is not guaranteed. A call to unicode_lb_next() may return without invoking the callback function, and some subsequent call to unicode_lb_next() (or unicode_lb_end()) then invokes the callback function more than once, to catch up. The contract is that, unless an error occurs, by the time unicode_lb_end() returns the callback function has been invoked exactly as many times as there are characters in the Unicode sequence defined by the intervening calls to unicode_lb_next() and unicode_lb_next_cnt().
Each call to the callback function reports the calculated line breaking status of the corresponding character in the Unicode character sequence:
UNICODE_LB_MANDATORY
A line break is MANDATORY before the corresponding character.
UNICODE_LB_NONE
A line break is PROHIBITED before the corresponding character.
UNICODE_LB_ALLOWED
A line break is OPTIONAL before the corresponding character.
The callback function should return 0. A non-zero value indicates to the line breaking algorithm that an error has occurred. unicode_lb_next() and unicode_lb_next_cnt() return zero either if they never invoked the callback function, or if each call to the callback function returned zero. A non-zero return from the callback function results in unicode_lb_next() and unicode_lb_next_cnt() immediately returning the same value.
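As an illustration of the error convention, a callback might store each status in a caller-supplied buffer and return non-zero once the buffer is full. This is only a sketch: struct lb_record and record_cb() are hypothetical names, the choice of -1 as the error value is arbitrary, and the code needs nothing from the library beyond matching the callback signature that unicode_lb_init() expects.

```c
#include <stddef.h>

/* Hypothetical fixed-capacity store for break statuses; not a library type. */
struct lb_record {
	int *statuses;   /* caller-provided storage */
	size_t capacity; /* number of slots in statuses */
	size_t used;     /* slots filled so far */
};

/*
 * Callback with the signature unicode_lb_init() expects.
 * Returns 0 on success, or -1 when the buffer is full. Per the text
 * above, that non-zero value is what unicode_lb_next() or
 * unicode_lb_next_cnt() would then return to their caller.
 */
int record_cb(int status, void *arg)
{
	struct lb_record *r = arg;

	if (r->used >= r->capacity)
		return -1;

	r->statuses[r->used++] = status;
	return 0;
}
```

A caller that sees the non-zero value come back still has to call unicode_lb_end() to release the handle.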
unicode_lb_end() must be invoked to destroy the line breaking handle even if unicode_lb_next() or unicode_lb_next_cnt() returned an error indication. It's also possible that, under normal circumstances, unicode_lb_end() invokes the callback function one or more times. The return value from unicode_lb_end() has the same meaning as from unicode_lb_next() and unicode_lb_next_cnt(); however, in all cases the line breaking handle is no longer valid after unicode_lb_end() returns.
unicode_lbc_init(), unicode_lbc_next(), unicode_lbc_next_cnt(), and unicode_lbc_end() are alternative functions that implement the same algorithm. The only difference is that the callback function receives an extra parameter, the Unicode character value to which the line breaking status applies, passed through from the input Unicode character sequence.
unicode_lb_set_opts() and unicode_lbc_set_opts() enable non-default options for the line breaking algorithm. These functions must be called immediately after unicode_lb_init() or unicode_lbc_init(), and before any other function. opts is a bitmask that can contain the following values:
UNICODE_LB_OPT_PRBREAK
Enables a modified LB24 rule. This prevents breaks before plus signs, as in “C++”. This flag adds the following rules to the LB24 rule:
PR x PR
AL x PR
ID x PR
UNICODE_LB_OPT_SYBREAK
Tailored breaking rules for the “/” character. This prevents breaking after the “/” character (think URLs), and includes an exception to the “x SY” rule in LB13. This flag adds the following rules to the LB24 rule:
SY x EX
SY x AL
SY x ID
SP ÷ SY, which takes precedence over “x SY”.
UNICODE_LB_OPT_DASHWJ
This flag reclassifies U+2013 (en dash) and U+2014 (em dash) as class WJ, prohibiting breaks before and after these characters.