Question

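I have a Django-based application that uses Haystack with the Whoosh search engine. I want to offer accent- and special-character-insensitive search, so that indexed data containing special characters can be found by typing the same word without them: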

Indexed value:

'café'

Search terms:

'cafe'  
'café'

I have written a custom FoldingWhooshSearchBackend that uses a StemmingAnalyzer and CharsetFilter(accent_map), as described here:

https://gist.github.com/gregplaysguitar/1727204

However, the search still does not work as expected: I cannot search for 'cafe' and find 'café'. I inspected the search index with the following:

from whoosh.index import open_dir
ix = open_dir('myservice/settings/whoosh_index')
searcher = ix.searcher()
for doc in searcher.documents():
    print(doc)

The special characters are still in the index.

Is there something else I need to do? Does it involve changing the index template?

Answer 1

You have to write Haystack SearchIndex classes for your models. That is where the model data is prepared for the search index.

Example myapp/search_index.py:

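from django.template.defaultfilters import slugify
from haystack import indexes
from haystack import site  # Haystack 1.x registration API

from myapp.models import UserProfile  # assumed location of the model


class UserProfileIndex(indexes.SearchIndex):
    text = indexes.CharField(document=True)

    def prepare_text(self, obj):
        # Index both the original text and an accent-free slug of it.
        data = [obj.get_full_name(), obj.user.email, obj.phone]
        original = ' '.join(data)
        slugified = slugify(original)
        return ' '.join([original, slugified])

site.register(UserProfile, UserProfileIndex)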

If a user's name is café, you will find their profile whether the search term is café or cafe.
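To see why indexing both forms makes either search term match, here is a minimal standard-library sketch of the same idea. It is not Django's slugify (which also lowercases and hyphenates); fold_accents and index_text are hypothetical helper names used only for illustration.

```python
# -*- coding: utf-8 -*-
import unicodedata


def fold_accents(text):
    """Return text with combining accents removed (hypothetical helper)."""
    # NFKD splits 'é' into 'e' plus a separate combining acute accent.
    decomposed = unicodedata.normalize('NFKD', text)
    # Drop the combining marks and keep every other character.
    return u''.join(ch for ch in decomposed if not unicodedata.combining(ch))


def index_text(original):
    """Join the original and folded forms, mirroring prepare_text above."""
    return u' '.join([original, fold_accents(original)])


print(index_text(u'café'))  # prints: café cafe
```

Because the indexed text contains both tokens, a query for either spelling finds the document, at the cost of storing each value twice.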

Answer 2

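I think the best approach is to let Haystack build the schema, for maximum forward compatibility, and then hack the CharsetFilter into it.

This code works with Haystack 2.4.0 and Whoosh 2.7.0: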

from haystack.backends.whoosh_backend import WhooshEngine, WhooshSearchBackend
from whoosh.analysis import CharsetFilter, StemmingAnalyzer
from whoosh.support.charset import accent_map
from whoosh.fields import TEXT


class FoldingWhooshSearchBackend(WhooshSearchBackend):

    def build_schema(self, fields):
        schema = super(FoldingWhooshSearchBackend, self).build_schema(fields)

        for name, field in schema[1].items():
            if isinstance(field, TEXT):
                field.analyzer = StemmingAnalyzer() | CharsetFilter(accent_map)

        return schema


class FoldingWhooshEngine(WhooshEngine):
    backend = FoldingWhooshSearchBackend
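For the custom engine to take effect, the Haystack connection has to point at it. A minimal settings sketch follows, assuming the classes above live in a module importable as myapp.backends (a hypothetical path; adjust it to wherever you put them). The index path is the one from the question.

```python
# settings.py (sketch)
HAYSTACK_CONNECTIONS = {
    'default': {
        # Hypothetical dotted path to the FoldingWhooshEngine class above.
        'ENGINE': 'myapp.backends.FoldingWhooshEngine',
        # Index directory as used in the question.
        'PATH': 'myservice/settings/whoosh_index',
    },
}
```

Documents indexed before the change were analyzed with the old schema, so the index most likely needs to be rebuilt (python manage.py rebuild_index) before 'cafe' will match 'café'.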

Character folding for Django Haystack and Whoosh
