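I am using Elasticsearch and Haystack to provide search. I want users to be able to search in languages other than English; for example, I am currently trying Greek.
How can I ignore accents when searching? For example, if I type Ανδρέας (with an accent), the matching results are returned.
But when I type Ανδρεας, no results are returned. The search engine should return results containing "Ανδρέας" as well as those containing "Ανδρεας" (the second has no accent).
Can anyone point out how to solve this?
Let me know if I need to post my settings for Elasticsearch, search_indexes, etc.
Edit:
Here are my index settings: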
ELASTICSEARCH_INDEX_SETTINGS = {
    'settings': {
        "analysis": {
            "analyzer": {
                "myanalyzer_search": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": [
                        "greek_lowercase_filter",
                        "my_stop_filter",
                        "greek_stem_filter",
                        "english_stem_filter",
                        "my_edge_ngram_filter",
                        "asciifolding"
                    ]
                },
                "myanalyzer_index": {
                    "type": "custom",
                    "tokenizer": "edgeNGram",
                    "filter": [
                        "greek_lowercase_filter",
                        "my_stop_filter",
                        "greek_stem_filter",
                        "english_stem_filter",
                        "my_edge_ngram_filter",
                        "asciifolding"
                    ]
                },
            },
            "tokenizer": {
                "my_edge_ngram_tokenizer": {
                    "type": "edgeNGram",
                    "min_gram": "2",
                    "max_gram": "18",
                    "token_chars": ["letter"]
                }
            },
            "filter": {
                "my_edge_ngram_filter": {
                    "type": "edgeNGram",
                    "min_gram": 3,
                    "max_gram": 18
                },
                "greek_stem_filter": {
                    "type": "stemmer",
                    "name": "greek"
                },
                "greek_lowercase_filter": {
                    "type": "lowercase",
                    "language": "greek"
                },
                "english_stem_filter": {
                    "type": "stemmer",
                    "name": "english"
                },
                "my_stop_filter": {
                    "type": "stop",
                    "stopwords": ["_greek_", "_english_"]
                }
            }
        }
    }
}
This is in search_index.py:
class ProfileIndex(indexes.SearchIndex, indexes.Indexable):
    text = indexes.EdgeNgramField(document=True, use_template=True)
    title = indexes.CharField(model_attr='title')
    sorted_title = indexes.CharField(model_attr='title', indexed=False, stored=True)
    employment_history = indexes.EdgeNgramField(model_attr='employment_history', null=True)

    def get_model(self):
        return SellerProfile

    def index_queryset(self, using=None):
        return self.get_model().objects.all()
.........
This is the template:
{{ object.user.get_full_name }}
{{ object.title }}
{{ object.bio }}
{{ object.employment_history }}
{{ object.education }}
I am querying as follows:
results = SearchQuerySet().model(Profile).autocomplete(text='Ανδρεας')
and
results = SearchQuerySet().model(Profile).autocomplete(text='Ανδρέας')
Thanks.
Answer 0: (score: 2)
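You need to add the asciifolding token filter to your analysis/query pipeline: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/analysis-asciifolding-tokenfilter.html
This essentially strips the accents from your words, so that you can later find them when searching without accents.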
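To make the idea of accent folding concrete, here is a minimal sketch in plain Python of the kind of normalization described above, applied to the two spellings from the question. It is only an illustration and not the Elasticsearch filter itself: it relies on the standard-library unicodedata module, and the helper name strip_accents is hypothetical.

```python
import unicodedata


def strip_accents(text):
    """Return text with combining accent marks removed (hypothetical helper)."""
    # Decompose each character into its base letter plus combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    # Drop the nonspacing combining marks (Unicode category "Mn").
    without_marks = "".join(
        ch for ch in decomposed if unicodedata.category(ch) != "Mn"
    )
    # Recompose whatever remains into its canonical form.
    return unicodedata.normalize("NFC", without_marks)


accented = "Ανδρέας"
plain = "Ανδρεας"

# After folding, both spellings reduce to the same string.
print(strip_accents(accented) == strip_accents(plain))  # prints True
```

If the same folding is applied both when documents are indexed and when the query is analyzed, the accented and unaccented spellings end up as identical tokens, which is what lets either form match the other.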