OpenOffice hyphenation algorithm - what do the parameters mean?

Time: 2010-11-11 22:29:00

Tags: c openoffice.org hyphenation

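I'm looking at the hyphenation algorithm downloaded from the OpenOffice website, but even after reading the comments I can't work out what the parameters rep, pos and cut do. Could someone who knows tell me what these parameters are for? The comment is below.

From the example it looks as if ff can be replaced by a single f, but what does that have to do with hyphenation?

Thanks,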


/*

int hnj_hyphen_hyphenate2(): non-standard hyphenation.

(It supports Catalan, Dutch, German, Hungarian, Norwegian, Swedish etc. orthography, see documentation.)

input data:
word: input word
word_size: byte length of the input word

hyphens: allocated character buffer (size = word_size + 5)
hyphenated_word: allocated character buffer (size ~ word_size * 2) or NULL
rep, pos, cut: pointers (point to the allocated and zeroed buffers (size=word_size) or with NULL value) or NULL

output data:
hyphens: hyphenation vector (hyphenation points signed with odd numbers)
hyphenated_word: hyphenated input word (hyphens signed with '='), optional (NULL input)
rep: NULL (only standard hyph.), or replacements (hyphenation points signed with '=' in replacements);
pos: NULL, or difference of the actual position and the beginning positions of the change in input words;
cut: NULL, or counts of the removed characters of the original words at hyphenation,

Note: rep, pos, cut are complementary arrays to the hyphens, indexed with the character positions of the input word.

For example:
Schiffahrt -> Schiff=fahrt,
pattern: f1f/ff=f,1,2
output: rep[5]="ff=f", pos[5] = 1, cut[5] = 2

Note: hnj_hyphen_hyphenate2() can allocate rep, pos, cut (word_size length arrays):

char ** rep = NULL;
int * pos = NULL;
int * cut = NULL;
char hyphens[MAXWORDLEN];
hnj_hyphen_hyphenate2(dict, "example", 7, hyphens, NULL, &rep, &pos, &cut);

See example in the source distribution.

*/

int hnj_hyphen_hyphenate2 (HyphenDict *dict, const char *word, int word_size, char * hyphens, char *hyphenated_word, char *** rep, int ** pos, int ** cut);

1 answer:

Answer 0 (score: 3)

I believe you are referring to the following comment:

// For example:
//  Schiffahrt -> Schiff=fahrt,
//  pattern: f1f/ff=f,1,2
//  output: rep[5]="ff=f", pos[5] = 1, cut[5] = 2

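This example refers to the German hyphenation rules as they were before the spelling reform of the 1990s. Compound nouns in German are written as a single word, and under the old rules the third identical consonant was dropped when a vowel followed. In the compound of 'Schiff' and 'Fahrt', one 'f' was omitted, so 'Schifffahrt' was written as 'Schiffahrt'. When the word was hyphenated, however, the omitted letter was written again: 'Schiff-fahrt'.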

So the example does not mean that 'ff' can be replaced by a single 'f', but that 'ff' can be replaced by 'ff-f'.

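The parameters therefore mean:

  • rep: holds the replacement 'ff-f' for 'ff' (stored as "ff=f", where '=' marks the hyphenation point)
  • pos: a value of 1 means the replacement starts one letter before hyphenation position 5
  • cut: a value of 2 means that 2 characters have to be removed from the input word.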
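To make that concrete, here is a minimal sketch of how the three values could be combined for a single break. It follows my reading above, not libhyphen's own code: the change starts at index i - pos and replaces cut characters with rep. The helper apply_break and its parameter order are made up for this illustration and are not part of the library.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of libhyphen).
 * Builds the hyphenated form of `word` for one non-standard break at
 * index `i`: the change starts at i - pos and replaces `cut` characters
 * of the input with the string `rep`.
 * Returns 0 on success, -1 if the range is invalid or `out` is too small. */
static int apply_break(const char *word, int i, const char *rep,
                       int pos, int cut, char *out, size_t out_size)
{
    assert(word != NULL && rep != NULL && out != NULL);

    size_t len = strlen(word);
    size_t rep_len = strlen(rep);
    int start = i - pos;                 /* first character to be replaced */

    if (start < 0 || cut < 0 || (size_t)start + (size_t)cut > len)
        return -1;
    if (len - (size_t)cut + rep_len + 1 > out_size)
        return -1;

    memcpy(out, word, (size_t)start);            /* unchanged prefix  */
    memcpy(out + start, rep, rep_len);           /* replacement text  */
    strcpy(out + start + rep_len,                /* unchanged suffix  */
           word + start + cut);
    return 0;
}
```

With the values from the comment, apply_break("Schiffahrt", 5, "ff=f", 1, 2, out, sizeof out) removes the "ff" at indices 4 and 5 and writes "ff=f" in its place, so out holds "Schiff=fahrt".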

These parameters seem to be needed only in the rare cases where a word is spelled differently when it is hyphenated.
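In the ordinary case, where the spelling does not change, the break points come from the hyphens vector alone, which the comment describes as marking hyphenation points with odd numbers. A small sketch of that reading, assuming one entry per input character and that an odd entry at index i means a break after character i (mark_breaks is a made-up helper, not a library function):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of libhyphen).
 * Copies `word` into `out`, inserting '=' after every character whose
 * entry in `hyphens` is odd. Assumes `hyphens` has at least strlen(word)
 * entries. The test `& 1` works for raw numbers and for ASCII digits,
 * because '1', '3', ... have odd character codes.
 * Returns 0 on success, -1 if `out` is too small. */
static int mark_breaks(const char *word, const char *hyphens,
                       char *out, size_t out_size)
{
    assert(word != NULL && hyphens != NULL && out != NULL);

    size_t len = strlen(word);
    size_t j = 0;

    if (out_size == 0)
        return -1;

    for (size_t i = 0; i < len; i++) {
        /* room for this character, a possible '=', and the final NUL */
        if (j + 3 > out_size)
            return -1;
        out[j++] = word[i];
        if (i + 1 < len && (hyphens[i] & 1))
            out[j++] = '=';              /* break after character i */
    }
    out[j] = '\0';
    return 0;
}
```

For instance, with a hand-written vector "0101000" (chosen for illustration, not taken from a real dictionary), mark_breaks("example", "0101000", out, sizeof out) leaves "ex=am=ple" in out.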