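I am looking at the hyphenation algorithm downloaded from the OpenOffice website, but even after reading the comment I cannot work out what the parameters rep, pos and cut do. Could someone knowledgeable tell me what these parameters are for? The comment is below.
From the example it looks as if ff can be replaced by a single f, but what does that have to do with hyphenation?
Thanks,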
/*
int hnj_hyphen_hyphenate2(): non-standard hyphenation.
(It supports Catalan, Dutch, German, Hungarian, Norwegian, Swedish
etc. orthography, see documentation.)
input data:
word: input word
word_size: byte length of the input word
hyphens: allocated character buffer (size = word_size + 5)
hyphenated_word: allocated character buffer (size ~ word_size * 2) or NULL
rep, pos, cut: pointers (point to the allocated and zeroed buffers
(size=word_size) or with NULL value) or NULL
output data:
hyphens: hyphenation vector (hyphenation points signed with odd numbers)
hyphenated_word: hyphenated input word (hyphens signed with `='),
optional (NULL input)
rep: NULL (only standard hyph.), or replacements (hyphenation points
signed with `=' in replacements);
pos: NULL, or difference of the actual position and the beginning
positions of the change in input words;
cut: NULL, or counts of the removed characters of the original words
at hyphenation,
Note: rep, pos, cut are complementary arrays to the hyphens, indexed with the
character positions of the input word.
For example:
Schiffahrt -> Schiff=fahrt,
pattern: f1f/ff=f,1,2
output: rep[5]="ff=f", pos[5] = 1, cut[5] = 2
Note: hnj_hyphen_hyphenate2() can allocate rep, pos, cut (word_size
length arrays):
char ** rep = NULL;
int * pos = NULL;
int * cut = NULL;
char hyphens[MAXWORDLEN];
hnj_hyphen_hyphenate2(dict, "example", 7, hyphens, NULL, &rep, &pos, &cut);
See example in the source distribution.
*/
int hnj_hyphen_hyphenate2 (HyphenDict *dict,
const char *word, int word_size, char * hyphens,
char *hyphenated_word, char *** rep, int ** pos, int ** cut);
Answer 0 (score: 3)
I believe you are referring to the following part of the comment:
// For example:
// Schiffahrt -> Schiff=fahrt,
// pattern: f1f/ff=f,1,2
// output: rep[5]="ff=f", pos[5] = 1, cut[5] = 2
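This example refers to the German hyphenation rules as they stood before the spelling reform of the 1990s. In German, compound nouns are written as a single word, and under the old rules the third of three identical consonants was dropped when a vowel followed. So the compound of 'Schiff' and 'Fahrt' was written 'Schiffahrt' rather than 'Schifffahrt', but the dropped letter was still written when the word was hyphenated.
So the point of the example is not that 'ff' can be replaced by a single 'f', but that 'ff' is replaced by 'ff-f'.
The parameters therefore mean: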
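rep: contains the replacement 'ff-f' that takes the place of 'ff'
pos: a value of 1 means the replacement starts one letter before hyphenation position 5
cut: a value of 2 means that 2 characters have to be removed from the input word
These parameters appear to be used only in the rare cases where a word is spelled differently when it is hyphenated.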
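As a rough illustration of the reading above, here is a minimal sketch in C that applies one such change to a word. It does not call the hyphenation library. The helper `apply_change` and its parameter order are hypothetical, and the rule that the change starts at byte `i - pos` is an assumption based on the interpretation in this answer, not something checked against the library source.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not part of the hyphenation library: build the
 * hyphenated spelling for a single non-standard hyphenation point.
 *
 * Assumption (taken from the reading in this answer): the change starts
 * at byte (i - pos) of the input word, removes `cut` bytes there, and
 * writes `rep` in their place.
 *
 * Returns 0 on success, -1 if the arguments are inconsistent or `out`
 * is too small to hold the result plus its terminating NUL.
 */
static int apply_change(const char *word, int i, const char *rep,
                        int pos, int cut, char *out, size_t out_size)
{
    assert(word != NULL && rep != NULL && out != NULL);

    size_t word_len = strlen(word);
    size_t rep_len = strlen(rep);

    /* Reject negative values and a change that would start before the word. */
    if (pos < 0 || cut < 0 || i < pos)
        return -1;

    size_t start = (size_t)(i - pos);

    /* The removed span must lie inside the word. */
    if (start + (size_t)cut > word_len)
        return -1;

    size_t tail_len = word_len - start - (size_t)cut;

    /* Make sure prefix + replacement + tail + NUL fit in the output buffer. */
    if (start + rep_len + tail_len + 1 > out_size)
        return -1;

    memcpy(out, word, start);                       /* unchanged prefix  */
    memcpy(out + start, rep, rep_len);              /* replacement text  */
    memcpy(out + start + rep_len,
           word + start + (size_t)cut,
           tail_len + 1);                           /* tail incl. NUL    */
    return 0;
}
```

Under those assumptions, calling `apply_change("Schiffahrt", 5, "ff=f", 1, 2, buf, sizeof buf)` with a 32-byte `buf` returns 0 and leaves `Schiff=fahrt` in `buf`: the two bytes `ff` starting at offset 4 are cut and `ff=f` is written in their place.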