Sphinx CJK support

Time: 2012-03-01 03:28:23

Tags: sphinx

We are using Sphinx with MySQL. Our MySQL database is UTF-8 and contains Chinese characters, so we need Sphinx to support CJK. This is what we have in sphinx.conf:

charset_type = utf-8
charset_table                   = 0..9, U+27, U+41..U+5a->U+61..U+7a,  U+61..U+7a, \
U+aa, U+b5, U+ba, \
U+c0..U+d6->U+e0..U+f6,  U+d8..U+de->U+f8..U+fe,  U+df..U+f6, \
U+f8..U+ff,  U+100..U+12f/2,  U+130->U+69, \
U+131,  U+132..U+137/2,  U+138, \
...
...
...
ngram_chars                     = U+3400..U+4DB5, U+4E00..U+9FA5, U+20000..U+2A6D6,U+4E00..U+9FBB, U+3400..U+4DB5, U+20000..U+2A6D6, U+FA0E, U+FA0F, U+FA11, U+FA13, U+FA14, U+FA1F, U+FA21, U+FA23, U+FA24, U+FA27, U+FA28, U+FA29, U+3105..U+312C, U+31A0..U+31B7, U+3041, \
U+3043, U+3045, U+3047, U+3049, U+304B, U+304D, U+304F, U+3051, U+3053, U+3055, U+3057, U+3059, U+305B, U+305D, U+305F, U+3061, U+3063, U+3066, U+3068, U+306A..U+306F, U+3072, U+3075, U+3078, U+307B, U+307E..U+3083, U+3085, U+3087, U+3089..U+308E, U+3090..U+3093, \
U+30A1, U+30A3, U+30A5, U+30A7, U+30A9, U+30AD, U+30AF, U+30B3, U+30B5, U+30BB, U+30BD, U+30BF, U+30C1, U+30C3, U+30C4, U+30C6, U+30CA, U+30CB, U+30CD, U+30CE, U+30DE, U+30DF, U+30E1, U+30E2, U+30E3, U+30E5, U+30E7, U+30EE, U+30F0..U+30F3, U+30F5, U+30F6, U+31F0, \
U+31F1, U+31F2, U+31F3, U+31F4, U+31F5, U+31F6, U+31F7, U+31F8, U+31F9, U+31FA, U+31FB, U+31FC, U+31FD, U+31FE, U+31FF, U+AC00..U+D7A3, U+1100..U+1159, U+1161..U+11A2, U+11A8..U+11F9, U+A000..U+A48C, U+A492..U+A4C6
ngram_len                               = 1

And the MySQL configuration:

character_set_client:utf8
character_set_connection:utf8
character_set_database:utf8
character_set_results:utf8
character_set_server:utf8
character_set_system:utf8
collation_connection:utf8_general_ci
collation_database:utf8_general_ci
collation_server:utf8_general_ci
init_connect:SET NAMES utf8

It manages to index strange characters as Chinese: 今å®μç|»å«åŽä½•æ-¥å>å†æ¥. Real Chinese text such as 后来 shows up in Sphinx as ???.

I believe there is an encoding problem somewhere, but I don't know where.
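To make the two symptoms concrete, here is a minimal Python sketch of one way such output can arise. It assumes, without confirmation from the setup above, that UTF-8 bytes are being read as a single-byte charset such as Latin-1 somewhere between MySQL and Sphinx. The sample string is taken from the question; the variable names are purely illustrative.

```python
# Sample text from the question; all variable names are illustrative.
original = "后来"

# The bytes as they would be stored in a UTF-8 column.
utf8_bytes = original.encode("utf-8")

# Symptom 1: reading UTF-8 bytes as Latin-1 turns each byte into one
# accented or control character (mojibake).
mojibake = utf8_bytes.decode("latin-1")

# Symptom 2: forcing Chinese text into Latin-1 replaces every character
# that cannot be represented with a question mark.
question_marks = original.encode("latin-1", errors="replace").decode("latin-1")

# The mojibake is reversible as long as no bytes were dropped on the way.
recovered = mojibake.encode("latin-1").decode("utf-8")

print(repr(mojibake))
print(question_marks)
print(recovered == original)
```

If the garbled strings in the index follow this pattern, the charset of the connection the indexer actually uses would be one place to look. This is a guess, not a diagnosis.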

0 answers:

No answers