Question

I'm using Whoosh to build a small local search engine. The documents contain both French and English text.

As you know, accented characters (à, è, é, ...) are common in French. So, as the Whoosh documentation recommends, I use accent folding to handle them:

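from whoosh.analysis import RegexAnalyzer, LowercaseFilter, StopFilter, CharsetFilter
from whoosh.fields import Schema, ID, TEXT
from whoosh.support.charset import accent_map
# Tokenize on word characters, lowercase, drop stop words, then fold accents (é -> e)
accent_analyzer = RegexAnalyzer(r'\w+') | LowercaseFilter() \
                  | StopFilter() | CharsetFilter(accent_map)

schema = Schema(path=ID(stored=True), content=TEXT(analyzer=accent_analyzer))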
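As a rough, standard-library-only approximation of what that analyzer chain is meant to produce (this is not Whoosh's code: NFKD decomposition plus dropping combining marks stands in for `CharsetFilter(accent_map)`, and the small `STOP_WORDS` set is a hypothetical placeholder for Whoosh's default stop list), the sketch below tokenizes, lowercases, removes stop words and folds accents:

```python
import re
import unicodedata

# Hypothetical stand-in for Whoosh's default English stop-word list.
STOP_WORDS = {"a", "an", "and", "the", "of", "to"}


def fold_accents(text):
    """Decompose (NFKD) and drop combining marks; approximates accent_map."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def analyze(text):
    """Mimic RegexAnalyzer | LowercaseFilter | StopFilter | CharsetFilter."""
    tokens = re.findall(r"\w+", text)                      # like RegexAnalyzer(r'\w+')
    tokens = [t.lower() for t in tokens]                   # like LowercaseFilter()
    tokens = [t for t in tokens if t not in STOP_WORDS]    # like StopFilter()
    return [fold_accents(t) for t in tokens]               # like CharsetFilter(accent_map)


print(analyze(u"unité logique"))  # ['unite', 'logique']
```

If the query string passes through the same chain, `unité` and `unite` both reduce to the single term `unite`, which is what should make either spelling match.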

Indexing the documents works fine (no errors).

But when I search, I get no results at all for words that contain accents.

For example, take a document D with content = u'unité logique':

Searching for logique: returns D.
Searching for unité: nothing.
Searching for unite: nothing.

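So I suspect the index writer is ignoring words with accents, which would explain why queries for those words return nothing, whether or not the query itself contains the accent.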

To be clear, my goal is for both unité and unite to hit document D.
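The intended behaviour can be checked with a toy in-memory inverted index. This is a hypothetical sketch, not Whoosh itself: `analyze`, `build_index` and `search` are illustrative helpers, and accent folding is approximated with NFKD decomposition. The key point it shows is that the document text and the query both go through the same folding analyzer, so `unité` and `unite` map to the same term and both return D.

```python
import re
import unicodedata
from collections import defaultdict


def analyze(text):
    """Split into lowercase word tokens and strip accents (NFKD, drop combining marks)."""
    folded = []
    for token in re.findall(r"\w+", text.lower()):
        decomposed = unicodedata.normalize("NFKD", token)
        folded.append("".join(c for c in decomposed if not unicodedata.combining(c)))
    return folded


def build_index(docs):
    """Map each analyzed term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, content in docs.items():
        for term in analyze(content):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents containing every analyzed query term (AND semantics)."""
    terms = analyze(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


index = build_index({"D": u"unité logique"})
for q in (u"logique", u"unité", u"unite"):
    print(q, sorted(search(index, q)))
```

If a real setup behaves differently from this toy, the difference is likely in what text actually reaches the analyzer, on the indexing side or the query side.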

Answer 1

Whoosh requires all strings to be unicode.
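A minimal standard-library illustration of one way a non-unicode string can break matching, assuming the file contents arrive as UTF-8 bytes (the `fold` helper is hypothetical and only approximates accent folding). Decoded correctly, `unité` folds to `unite`. Decoded with the wrong codec, the accented word becomes mojibake and no longer folds to the term a query for `unite` would produce. So every byte string, document text and query text alike, should be decoded to unicode before it is handed to Whoosh.

```python
import re
import unicodedata


def fold(text):
    """Lowercase word tokens and strip accents (NFKD, drop combining marks)."""
    tokens = re.findall(r"\w+", text.lower())
    return ["".join(c for c in unicodedata.normalize("NFKD", t)
                    if not unicodedata.combining(c))
            for t in tokens]


# Bytes as they would come from a file opened in binary mode (assumed UTF-8).
raw = u"unité logique".encode("utf-8")

good = raw.decode("utf-8")    # correct: decode to unicode before indexing
bad = raw.decode("latin-1")   # wrong codec: 'é' becomes 'Ã©'

print(fold(good))  # ['unite', 'logique']
print(fold(bad))   # ['unita', 'logique']
```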


See also: Does whoosh require all strings to be unicode?

For looking up accented characters in Unicode, see http://unicodelookup.com/
