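I'm currently running Haystack with an Elasticsearch backend, and I'm building autocomplete for city names. The problem is that SearchQuerySet gives me different results, which look wrong to me, from the same query executed directly in Elasticsearch, which returns the results I expect.
I'm using Django 1.5.4, django-haystack 2.1.0, pyelasticsearch 0.6.1 and Elasticsearch 0.90.3.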
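With the following sample data:
Using
    SearchQuerySet().models(Geoname).filter(name_auto='mid')
or
    SearchQuerySet().models(Geoname).autocomplete(name_auto='mid')
the results always include all 6 names, including the Min* and Mia* ones... However, querying Elasticsearch directly returns the correct data: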
    {
        "query": {
            "filtered": {
                "query": {
                    "match_all": {}
                },
                "filter": {
                    "term": {"name_auto": "mid"}
                }
            }
        }
    }
    {
        "took": 1,
        "timed_out": false,
        "_shards": {
            "total": 5,
            "successful": 5,
            "failed": 0
        },
        "hits": {
            "total": 3,
            "max_score": 1,
            "hits": [
                {
                    "_index": "haystack",
                    "_type": "modelresult",
                    "_id": "csi.geoname.4075977",
                    "_score": 1,
                    "_source": {
                        "name_auto": "Midfield"
                    }
                },
                {
                    "_index": "haystack",
                    "_type": "modelresult",
                    "_id": "csi.geoname.4075984",
                    "_score": 1,
                    "_source": {
                        "name_auto": "Midland City"
                    }
                },
                {
                    "_index": "haystack",
                    "_type": "modelresult",
                    "_id": "csi.geoname.4075989",
                    "_score": 1,
                    "_source": {
                        "name_auto": "Midway"
                    }
                }
            ]
        }
    }
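Other examples behave the same way. My guess is that Haystack splits and analyzes the query string into every possible character group of "min_gram" length, and that is why it returns the wrong results.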
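A rough pure-Python model of this guess (not Haystack or Elasticsearch code) helps show why the extra names appear. The sketch assumes that the query text is run through the same edge n-gram analyzer as the indexed names (min_gram=2 and max_gram=15, the values of the `haystack_edgengram` filter in the accepted answer's settings) and that the resulting terms are OR-ed together. "Minter" and "Miami" are hypothetical stand-ins for the Min*/Mia* names, since the sample data is not shown.

```python
import re


def lowercase_tokenize(text):
    # Rough stand-in for Elasticsearch's "lowercase" tokenizer:
    # split on anything that is not a letter, then lowercase each token.
    return [token.lower() for token in re.findall(r"[^\W\d_]+", text)]


def edge_ngrams(token, min_gram=2, max_gram=15):
    # Front-edge n-grams, like the "haystack_edgengram" token filter.
    return [token[:n] for n in range(min_gram, min(max_gram, len(token)) + 1)]


def edgengram_analyze(text, min_gram=2, max_gram=15):
    terms = []
    for token in lowercase_tokenize(text):
        terms.extend(edge_ngrams(token, min_gram, max_gram))
    return terms


def search(index, query_terms):
    # OR semantics: a document matches if it shares at least one term with the query.
    wanted = set(query_terms)
    return [name for name, terms in index.items() if wanted & set(terms)]


# "Minter" and "Miami" are hypothetical stand-ins for the Min*/Mia* names.
names = ["Midfield", "Midland City", "Midway", "Minter", "Miami"]
index = {name: edgengram_analyze(name) for name in names}

# Query analyzed with the edge n-gram analyzer: "mid" -> ["mi", "mid"]
print(search(index, edgengram_analyze("mid")))
# ['Midfield', 'Midland City', 'Midway', 'Minter', 'Miami']

# Query kept as a single, unanalyzed term, like the direct "term" filter
print(search(index, ["mid"]))
# ['Midfield', 'Midland City', 'Midway']
```

Under these assumptions, "mid" becomes "mi" and "mid", and "mi" alone matches every name that starts with "mi". The `term` filter used in the direct query does not analyze its input, so only the exact indexed gram "mid" can match.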
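I'm not sure whether I'm doing or understanding something wrong, or whether this is how Haystack is supposed to work, but I need Haystack's results to match Elasticsearch's.
So how can I fix this or make it work?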
My simplified objects are as follows:
Model:
    from django.db import models

    class Geoname(models.Model):
        id = models.IntegerField(primary_key=True)
        name = models.CharField(max_length=255)
Index:
    from haystack import indexes

    class GeonameIndex(indexes.SearchIndex, indexes.Indexable):
        text = indexes.CharField(document=True, use_template=True)
        name_auto = indexes.EdgeNgramField(model_attr='name')

        def get_model(self):
            return Geoname
Mapping:
    modelresult: {
        _boost: {
            name: "boost",
            null_value: 1
        },
        properties: {
            django_ct: {
                type: "string"
            },
            django_id: {
                type: "string"
            },
            name_auto: {
                type: "string",
                store: true,
                term_vector: "with_positions_offsets",
                analyzer: "edgengram_analyzer"
            }
        }
    }
Thanks.
Answer 0 (score: 11)
After digging into the code, I found that the search Haystack generates is:
    {
        "query": {
            "filtered": {
                "filter": {
                    "fquery": {
                        "query": {
                            "query_string": {
                                "query": "django_ct:(csi.geoname)"
                            }
                        },
                        "_cache": false
                    }
                },
                "query": {
                    "query_string": {
                        "query": "name_auto:(mid)",
                        "default_operator": "or",
                        "default_field": "text",
                        "auto_generate_phrase_queries": true,
                        "analyze_wildcard": true
                    }
                }
            }
        },
        "from": 0,
        "size": 6
    }
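Running this query in Elasticsearch returns the same 6 objects that Haystack shows... but if I add the following to "query_string":
    "analyzer": "standard"
it works as expected. So the idea is to be able to set a different search analyzer for the field.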
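To try the workaround by hand, the request body can be rebuilt with the extra key in place. `haystack_style_body` is a hypothetical helper that only reproduces the JSON shown in this answer; it is not part of Haystack, and the query Haystack generated above has no `analyzer` key.

```python
import json


def haystack_style_body(field, value, model_ct, analyzer=None, size=6):
    # Rebuilds the request body shown above (hypothetical helper, not Haystack API).
    query_string = {
        "query": "%s:(%s)" % (field, value),
        "default_operator": "or",
        "default_field": "text",
        "auto_generate_phrase_queries": True,
        "analyze_wildcard": True,
    }
    if analyzer is not None:
        # Analyze the query text with this analyzer instead of the field's own.
        query_string["analyzer"] = analyzer
    return {
        "query": {
            "filtered": {
                "filter": {
                    "fquery": {
                        "query": {"query_string": {"query": "django_ct:(%s)" % model_ct}},
                        "_cache": False,
                    }
                },
                "query": {"query_string": query_string},
            }
        },
        "from": 0,
        "size": size,
    }


body = haystack_style_body("name_auto", "mid", "csi.geoname", analyzer="standard")
print(json.dumps(body, indent=2))
```

Setting the analyzer per request works for a quick check, but it has to be repeated in every query, which is why a per-field search analyzer in the mapping is the more durable fix.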
Based on the link in @user954994's answer and the explanation in this post, this is what I ended up doing to make it work.
So, my new settings are:
    ELASTICSEARCH_INDEX_SETTINGS = {
        'settings': {
            "analysis": {
                "analyzer": {
                    "ngram_analyzer": {
                        "type": "custom",
                        "tokenizer": "lowercase",
                        "filter": ["haystack_ngram"]
                    },
                    "edgengram_analyzer": {
                        "type": "custom",
                        "tokenizer": "lowercase",
                        "filter": ["haystack_edgengram"]
                    },
                    "suggest_analyzer": {
                        "type": "custom",
                        "tokenizer": "standard",
                        "filter": [
                            "standard",
                            "lowercase",
                            "asciifolding"
                        ]
                    },
                },
                "tokenizer": {
                    "haystack_ngram_tokenizer": {
                        "type": "nGram",
                        "min_gram": 3,
                        "max_gram": 15,
                    },
                    "haystack_edgengram_tokenizer": {
                        "type": "edgeNGram",
                        "min_gram": 2,
                        "max_gram": 15,
                        "side": "front"
                    }
                },
                "filter": {
                    "haystack_ngram": {
                        "type": "nGram",
                        "min_gram": 3,
                        "max_gram": 15
                    },
                    "haystack_edgengram": {
                        "type": "edgeNGram",
                        "min_gram": 2,
                        "max_gram": 15
                    }
                }
            }
        }
    }
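The practical difference of `suggest_analyzer` is that it emits whole lowercased words rather than prefixes, so "mid" stays a single term. The sketch below is an approximation under stated assumptions, not Elasticsearch's implementation: the standard tokenizer is modeled as runs of word characters, the `standard` token filter is treated as a no-op, and `asciifolding` is modeled as accent stripping via `unicodedata`.

```python
import re
import unicodedata


def ascii_fold(token):
    # Approximates "asciifolding": decompose with NFKD and drop combining marks.
    # The real filter also maps characters that do not decompose (such as "ø"),
    # which this sketch does not handle.
    decomposed = unicodedata.normalize("NFKD", token)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def suggest_analyze(text):
    # standard tokenizer (approximated) -> lowercase -> asciifolding;
    # the "standard" token filter is treated as a no-op here.
    return [ascii_fold(token.lower()) for token in re.findall(r"\w+", text)]


print(suggest_analyze("mid"))           # ['mid']
print(suggest_analyze("Midland City"))  # ['midland', 'city']
```

Paired with the edge n-gram analyzer at index time, a single query term like "mid" can only match documents that actually contain the gram "mid".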
My new custom build_schema method looks like this:
    def build_schema(self, fields):
        content_field_name, mapping = super(ConfigurableElasticBackend,
                                            self).build_schema(fields)
        for field_name, field_class in fields.items():
            field_mapping = mapping[field_class.index_fieldname]
            index_analyzer = getattr(field_class, 'index_analyzer', None)
            search_analyzer = getattr(field_class, 'search_analyzer', None)
            # Fall back to the default when the field has no analyzer or it is None.
            field_analyzer = getattr(field_class, 'analyzer', None) or self.DEFAULT_ANALYZER
            if field_mapping['type'] == 'string' and field_class.indexed:
                if not hasattr(field_class, 'facet_for') and field_class.field_type not in ('ngram', 'edge_ngram'):
                    field_mapping['analyzer'] = field_analyzer
            if index_analyzer and search_analyzer:
                # Replace the single analyzer with separate index/search analyzers.
                field_mapping['index_analyzer'] = index_analyzer
                field_mapping['search_analyzer'] = search_analyzer
                field_mapping.pop('analyzer', None)
            mapping.update({field_class.index_fieldname: field_mapping})
        return (content_field_name, mapping)
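The effect of this override can be checked without a running cluster by applying the same decision logic to plain dictionaries. `apply_custom_analyzers` and the `SimpleNamespace` field objects are hypothetical stand-ins for Haystack's fields, and the starting mapping is a simplified assumption about what the base `build_schema` returns.

```python
from types import SimpleNamespace

DEFAULT_ANALYZER = "snowball"


def apply_custom_analyzers(fields, mapping, default_analyzer=DEFAULT_ANALYZER):
    # Same decision logic as build_schema above, applied to plain dicts.
    for field_class in fields.values():
        field_mapping = mapping[field_class.index_fieldname]
        index_analyzer = getattr(field_class, "index_analyzer", None)
        search_analyzer = getattr(field_class, "search_analyzer", None)
        field_analyzer = getattr(field_class, "analyzer", None) or default_analyzer
        if field_mapping["type"] == "string" and field_class.indexed:
            if not hasattr(field_class, "facet_for") and field_class.field_type not in ("ngram", "edge_ngram"):
                field_mapping["analyzer"] = field_analyzer
        if index_analyzer and search_analyzer:
            field_mapping["index_analyzer"] = index_analyzer
            field_mapping["search_analyzer"] = search_analyzer
            field_mapping.pop("analyzer", None)
    return mapping


# Hypothetical stand-ins for Haystack field objects.
fields = {
    "text": SimpleNamespace(index_fieldname="text", indexed=True, field_type="string"),
    "name_auto": SimpleNamespace(
        index_fieldname="name_auto", indexed=True, field_type="edge_ngram",
        index_analyzer="edgengram_analyzer", search_analyzer="suggest_analyzer"),
}
# Simplified guess at the base mapping before the override.
mapping = {
    "text": {"type": "string", "analyzer": "snowball"},
    "name_auto": {"type": "string", "analyzer": "edgengram_analyzer"},
}

print(apply_custom_analyzers(fields, mapping)["name_auto"])
# {'type': 'string', 'index_analyzer': 'edgengram_analyzer', 'search_analyzer': 'suggest_analyzer'}
```

Only fields that define both analyzers lose the plain `analyzer` key; the `text` field keeps the default analyzer.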
After rebuilding the index, my mapping looks like this:
    modelresult: {
        _boost: {
            name: "boost",
            null_value: 1
        },
        properties: {
            django_ct: {
                type: "string"
            },
            django_id: {
                type: "string"
            },
            name_auto: {
                type: "string",
                store: true,
                term_vector: "with_positions_offsets",
                index_analyzer: "edgengram_analyzer",
                search_analyzer: "suggest_analyzer"
            }
        }
    }
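Now everything works as expected!
Update
Below you will find the code that clarifies this part:
- I created a custom Elasticsearch backend that adds a new custom analyzer based on the standard analyzer.
- I added a custom EdgeNgramField that makes it possible to set one analyzer for indexing (index_analyzer) and another for searching (search_analyzer).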
In my app's search_backends.py:
    from django.conf import settings
    from haystack.backends.elasticsearch_backend import ElasticsearchSearchBackend
    from haystack.backends.elasticsearch_backend import ElasticsearchSearchEngine
    from haystack.fields import EdgeNgramField as BaseEdgeNgramField


    # Custom backend
    class CustomElasticBackend(ElasticsearchSearchBackend):
        DEFAULT_ANALYZER = None

        def __init__(self, connection_alias, **connection_options):
            super(CustomElasticBackend, self).__init__(
                connection_alias, **connection_options)
            user_settings = getattr(settings, 'ELASTICSEARCH_INDEX_SETTINGS', None)
            self.DEFAULT_ANALYZER = getattr(settings, 'ELASTICSEARCH_DEFAULT_ANALYZER', "snowball")
            if user_settings:
                # Use the analyzers, tokenizers and filters from settings.py.
                setattr(self, 'DEFAULT_SETTINGS', user_settings)

        def build_schema(self, fields):
            content_field_name, mapping = super(CustomElasticBackend,
                                                self).build_schema(fields)
            for field_name, field_class in fields.items():
                field_mapping = mapping[field_class.index_fieldname]
                index_analyzer = getattr(field_class, 'index_analyzer', None)
                search_analyzer = getattr(field_class, 'search_analyzer', None)
                # CustomFieldMixin sets analyzer=None when it is not passed,
                # so fall back to the default in that case too.
                field_analyzer = getattr(field_class, 'analyzer', None) or self.DEFAULT_ANALYZER
                if field_mapping['type'] == 'string' and field_class.indexed:
                    # Leave facet fields and Haystack's (edge) n-gram fields alone.
                    if not hasattr(field_class, 'facet_for') and field_class.field_type not in ('ngram', 'edge_ngram'):
                        field_mapping['analyzer'] = field_analyzer
                if index_analyzer and search_analyzer:
                    # Replace the single analyzer with separate index/search analyzers.
                    field_mapping['index_analyzer'] = index_analyzer
                    field_mapping['search_analyzer'] = search_analyzer
                    field_mapping.pop('analyzer', None)
                mapping.update({field_class.index_fieldname: field_mapping})
            return (content_field_name, mapping)


    class CustomElasticSearchEngine(ElasticsearchSearchEngine):
        backend = CustomElasticBackend


    # Custom field
    class CustomFieldMixin(object):
        def __init__(self, **kwargs):
            # Remove the analyzer options before the base field sees them.
            self.analyzer = kwargs.pop('analyzer', None)
            self.index_analyzer = kwargs.pop('index_analyzer', None)
            self.search_analyzer = kwargs.pop('search_analyzer', None)
            super(CustomFieldMixin, self).__init__(**kwargs)


    class CustomEdgeNgramField(CustomFieldMixin, BaseEdgeNgramField):
        pass
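Why `CustomFieldMixin` has to come first in the base-class list can be shown without Haystack. `BaseField` below is a hypothetical stand-in whose `__init__` accepts only known keyword arguments; that this mirrors Haystack's field constructors is an assumption, not something verified here. Because Python resolves `__init__` along the MRO, the mixin pops the analyzer options before the base class sees them.

```python
class BaseField(object):
    # Hypothetical stand-in for a Haystack field: no **kwargs, so any
    # unexpected keyword argument raises TypeError.
    def __init__(self, model_attr=None, indexed=True):
        self.model_attr = model_attr
        self.indexed = indexed


class CustomFieldMixin(object):
    def __init__(self, **kwargs):
        # Consume the analyzer options, then pass the rest up the MRO.
        self.analyzer = kwargs.pop("analyzer", None)
        self.index_analyzer = kwargs.pop("index_analyzer", None)
        self.search_analyzer = kwargs.pop("search_analyzer", None)
        super(CustomFieldMixin, self).__init__(**kwargs)


class CustomField(CustomFieldMixin, BaseField):
    pass


field = CustomField(model_attr="name",
                    index_analyzer="edgengram_analyzer",
                    search_analyzer="suggest_analyzer")
print(field.model_attr, field.index_analyzer, field.search_analyzer)
# name edgengram_analyzer suggest_analyzer

try:
    BaseField(model_attr="name", index_analyzer="edgengram_analyzer")
except TypeError as exc:
    print("base field alone rejects it:", exc)
```

Swapping the bases to `(BaseField, CustomFieldMixin)` would make `BaseField.__init__` run first, and it would fail on the unexpected keyword.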
My index is defined as follows:
    from haystack import indexes
    from my_app.search_backends import CustomEdgeNgramField

    class MyIndex(indexes.SearchIndex, indexes.Indexable):
        text = indexes.CharField(document=True, use_template=True)
        name_auto = CustomEdgeNgramField(model_attr='name', index_analyzer="edgengram_analyzer", search_analyzer="suggest_analyzer")
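Since the analyzer names passed to the field are plain strings, a small pre-flight check that they are defined in `ELASTICSEARCH_INDEX_SETTINGS` can catch typos before the index is rebuilt. `missing_analyzers` is a hypothetical helper, and the settings below are a trimmed copy of the ones shown earlier in this answer.

```python
def missing_analyzers(index_settings, *analyzer_names):
    # Return the analyzer names that are not defined under settings.analysis.analyzer.
    defined = index_settings.get("settings", {}).get("analysis", {}).get("analyzer", {})
    return [name for name in analyzer_names if name not in defined]


# Trimmed copy of the ELASTICSEARCH_INDEX_SETTINGS shown above.
ELASTICSEARCH_INDEX_SETTINGS = {
    "settings": {
        "analysis": {
            "analyzer": {
                "ngram_analyzer": {"type": "custom", "tokenizer": "lowercase",
                                   "filter": ["haystack_ngram"]},
                "edgengram_analyzer": {"type": "custom", "tokenizer": "lowercase",
                                       "filter": ["haystack_edgengram"]},
                "suggest_analyzer": {"type": "custom", "tokenizer": "standard",
                                     "filter": ["standard", "lowercase", "asciifolding"]},
            }
        }
    }
}

print(missing_analyzers(ELASTICSEARCH_INDEX_SETTINGS, "edgengram_analyzer", "suggest_analyzer"))  # []
print(missing_analyzers(ELASTICSEARCH_INDEX_SETTINGS, "sugest_analyzer"))  # ['sugest_analyzer']
```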
Finally, the settings of course use the custom backend in the Haystack connection definition:
    HAYSTACK_CONNECTIONS = {
        'default': {
            'ENGINE': 'my_app.search_backends.CustomElasticSearchEngine',
            'URL': 'http://localhost:9200',
            'INDEX_NAME': 'index'
        },
    }
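Answer 1 (score: 1)
Well, I ran into a similar problem, and my approach was to customize the backend.
The full explanation is available here:
http://www.wellfireinteractive.com/blog/custom-haystack-elasticsearch-backend/
It worked for me!
Hope this helps.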