We are using Sphinx with MySQL. Our MySQL database is UTF-8 and contains Chinese characters, so we need Sphinx to support CJK. This is what we have in sphinx.conf:
charset_type = utf-8
charset_table = 0..9, U+27, U+41..U+5a->U+61..U+7a, U+61..U+7a, \
U+aa, U+b5, U+ba, \
U+c0..U+d6->U+e0..U+f6, U+d8..U+de->U+f8..U+fe, U+df..U+f6, \
U+f8..U+ff, U+100..U+12f/2, U+130->U+69, \
U+131, U+132..U+137/2, U+138, \
...
...
...
ngram_chars = U+3400..U+4DB5, U+4E00..U+9FA5, U+20000..U+2A6D6,U+4E00..U+9FBB, U+3400..U+4DB5, U+20000..U+2A6D6, U+FA0E, U+FA0F, U+FA11, U+FA13, U+FA14, U+FA1F, U+FA21, U+FA23, U+FA24, U+FA27, U+FA28, U+FA29, U+3105..U+312C, U+31A0..U+31B7, U+3041, \
U+3043, U+3045, U+3047, U+3049, U+304B, U+304D, U+304F, U+3051, U+3053, U+3055, U+3057, U+3059, U+305B, U+305D, U+305F, U+3061, U+3063, U+3066, U+3068, U+306A..U+306F, U+3072, U+3075, U+3078, U+307B, U+307E..U+3083, U+3085, U+3087, U+3089..U+308E, U+3090..U+3093, \
U+30A1, U+30A3, U+30A5, U+30A7, U+30A9, U+30AD, U+30AF, U+30B3, U+30B5, U+30BB, U+30BD, U+30BF, U+30C1, U+30C3, U+30C4, U+30C6, U+30CA, U+30CB, U+30CD, U+30CE, U+30DE, U+30DF, U+30E1, U+30E2, U+30E3, U+30E5, U+30E7, U+30EE, U+30F0..U+30F3, U+30F5, U+30F6, U+31F0, \
U+31F1, U+31F2, U+31F3, U+31F4, U+31F5, U+31F6, U+31F7, U+31F8, U+31F9, U+31FA, U+31FB, U+31FC, U+31FD, U+31FE, U+31FF, U+AC00..U+D7A3, U+1100..U+1159, U+1161..U+11A2, U+11A8..U+11F9, U+A000..U+A48C, U+A492..U+A4C6
ngram_len = 1
And the MySQL configuration:
character_set_client:utf8
character_set_connection:utf8
character_set_database:utf8
character_set_results:utf8
character_set_server:utf8
character_set_system:utf8
collation_connection:utf8_general_ci
collation_database:utf8_general_ci
collation_server:utf8_general_ci
init_connect:SET NAMES utf8
Sphinx does manage to index garbled characters as if they were Chinese: 今å®μç|»å«åŽä½•æ-¥å>å†æ¥
Real Chinese text, however, comes out like this ??? in Sphinx: 后来
I believe there is an encoding problem somewhere, but I don't know where.
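To narrow down where the encoding goes wrong, it may help to check whether the garbled output is simply UTF-8 bytes being read as a single-byte encoding. The following is a minimal Python sketch under that assumption. It assumes the wrong encoding is latin-1, which is only a guess and not something confirmed by the configuration above. The function names `garble` and `repair` are hypothetical helpers, not part of Sphinx or MySQL.

```python
# Sketch: reproduce and reverse "UTF-8 bytes decoded as latin-1" garbling.
# ASSUMPTION: latin-1 is the wrong encoding in play; this is a guess.

def garble(text: str, wrong_encoding: str = "latin-1") -> str:
    """Encode text as UTF-8, then decode those bytes with the wrong encoding."""
    return text.encode("utf-8").decode(wrong_encoding)


def repair(garbled: str, wrong_encoding: str = "latin-1") -> str:
    """Reverse garble(): recover the raw bytes and decode them as UTF-8."""
    return garbled.encode(wrong_encoding).decode("utf-8")


if __name__ == "__main__":
    original = "宵"  # U+5BB5, UTF-8 bytes E5 AE B5
    broken = garble(original)
    # Each UTF-8 byte becomes one latin-1 character: U+00E5, U+00AE, U+00B5
    print([hex(ord(ch)) for ch in broken])
    # The round trip restores the original text
    print(repair(broken) == original)
```

If the characters produced this way match what shows up in the index, that would suggest some step between MySQL and Sphinx is treating the UTF-8 data as a single-byte encoding. It would not by itself say which step.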