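I am new to Elasticsearch. I have documents with attributes such as the number of beds, the number of baths, and a zip code.
I want to store these attributes in a single field so that users can search with a phrase like "3 beds at 97778(zip)".
I tried a single array field, e.g. [3 beds, 2 baths, 97778] and [7 beds, 3 baths, 97778], with a stop analyzer so that I can filter out words like "at" and "in", but this does not seem to be the right approach, because the second doc scores higher than the first.
I also have a synonym analyzer, because a search for "3 bd" should return "3 beds".
So my question is: what is the best way to store the attributes? Here are some of my dummy documents.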
{
"Beds" : 3,
"Bath" : 2,
"Zip" : 97778,
"Attributes" : ["3 beds","2 baths", "97778"]
},
{
"Beds" : 7,
"Bath" : 3,
"Zip" : 97778,
"Attributes" : ["7 beds", "3 baths", "97778"]
}
Should I change this schema to
{
"Beds" : 7,
"Bath" : 3,
"Zip" : 97778,
"Attributes" : {"bed" : "7", "bath" : "3", "zip" : "97778"}
}
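If so, how would I apply the synonym analyzer?
Answer 0: (score: 3)
The first structure looks better to me. Using Marvel, I created a simple index with these attributes on my local machine: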
PUT /test
{
"settings": {
"analysis": {
"filter": {
"my_stop": {
"type": "stop",
"stopwords": "_english_"
},
"my_possessive_stemmer": {
"type": "stemmer",
"language": "possessive_english"
},
"my_synonym": {
"type": "synonym",
"synonyms": [
"bd => bed",
"bt, baths, bth => bath"]
},
"my_shingle": {
"type" : "shingle",
"min_shingle_size": 2,
"max_shingle_size": 3,
"output_unigrams": false,
"output_unigrams_if_no_shingles": true
}
},
"analyzer": {
"my_english": {
"tokenizer": "standard",
"filter": [
"my_possessive_stemmer",
"lowercase",
"my_stop",
"my_synonym",
"kstem",
"my_shingle"
]
}
}
}
},
"mappings": {
"documents": {
"properties": {
"Beds": {
"type": "integer"
},
"Bath": {
"type": "integer"
},
"Zip": {
"type": "integer"
},
"Attributes": {
"type": "string",
"analyzer": "my_english"
}
}
}
}
}
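This is a fairly standard English analyzer (I only dropped the default stemmer, which I find too aggressive, and replaced it with kstem), plus your synonyms of course. I also added a shingle filter, which emits combinations of adjacent tokens, and that is exactly what we are looking for!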
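If you want to see which tokens my_english actually produces, the _analyze API is a quick way to check. The request below is only a sketch: the sample text is arbitrary, and I am assuming the JSON body form, which should work on Elasticsearch 2.x and later (on 1.x the analyzer and text are passed as query-string parameters instead).

```
GET /test/_analyze
{
  "analyzer": "my_english",
  "text": "3 bd at 97778 zip"
}
```

The response lists each emitted token with its position, so you can confirm that the stop words are gone, the synonyms were applied, and the shingles look the way you expect.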
I have added your test data. Note that I doubled the keyword zip, in case the user searches for either "zip 97778" or "97778 zip".
PUT /test/documents/1
{
"Beds": 3,
"Bath": 2,
"Zip": 97778,
"Attributes": ["3 beds", "2 baths", "zip 97778 zip"]
}
PUT /test/documents/2
{
"Beds": 7,
"Bath": 3,
"Zip": 97778,
"Attributes": ["7 beds", "3 baths", "zip 97778 zip"]
}
POST /test/documents/3
{
"Attributes" : ["8310 prairie rose place", "md", "baltimore", "21208", "us", "3 bd", "3 bth", "1 pbh", "1 hbh", "cooktop", "dishwasher", "dryer", "garbage disposer", "ice maker", "microwave", "oven", "oven - double", "refrigerator", "washer", "appliances", "contemporary architecture", "ceiling fan(s)", "colling system", "brick", "basement", "forced air", "heating system", "3 floors", "2 parkings", "garage", "asphalt roof"]
}
POST /test/documents/4
{
"Attributes" : ["8 winners circle", "md", "owings mills", "21117", "us", "2 bd", "1 bth", "dishwasher", "dryer", "garbage disposer", "microwave", "range", "refrigerator", "washer", "appliances", "traditional architecture", "new traditional architecture", "central a/c", "colling system", "vinyl siding", "heat pump", "heating system", "1 floors", "assigned", "unassigned", "unknown roof"]
}
Here is a simple match query:
POST /test/documents/_search
{
"query": {
"match": {
"Attributes": {
"query": "3 beds at 97778(zip)"
}
}
}
}
It returns the desired documents in the expected order:
{
"_index" : "test",
"_type" : "documents",
"_id" : "1",
"_score" : 0.020668881,
"_source" : {
"Beds" : 3,
"Bath" : 2,
"Zip" : 97778,
"Attributes" : [
"3 beds",
"2 baths",
"zip 97778 zip"
]
}
},
{
"_index" : "test",
"_type" : "documents",
"_id" : "2",
"_score" : 0.004767749,
"_source" : {
"Beds" : 7,
"Bath" : 3,
"Zip" : 97778,
"Attributes" : [
"7 beds",
"3 baths",
"zip 97778 zip"
]
}
},
{
"_index" : "test",
"_type" : "documents",
"_id" : "3",
"_score" : 0.0014899216,
"_source" : {
"Attributes" : [
"8310 prairie rose place",
"md",
"baltimore",
"21208",
"us",
"3 bd",
"3 bth",
"1 pbh",
"1 hbh",
"cooktop",
"dishwasher",
"dryer",
"garbage disposer",
"ice maker",
"microwave",
"oven",
"oven - double",
"refrigerator",
"washer",
"appliances",
"contemporary architecture",
"ceiling fan(s)",
"colling system",
"brick",
"basement",
"forced air",
"heating system",
"3 floors",
"2 parkings",
"garage",
"asphalt roof"
]
}
}
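If you are curious why document 1 outranks document 2 here, one option is to rerun the same search with "explain" switched on. This is just the query from above with that one flag added; nothing else is assumed.

```
POST /test/documents/_search
{
  "explain": true,
  "query": {
    "match": {
      "Attributes": {
        "query": "3 beds at 97778(zip)"
      }
    }
  }
}
```

Each hit then carries an _explanation section that breaks the score down by matching term, which helps when a ranking looks wrong.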
Now, when I run this query:
POST /test/documents/_search
{
"query": {
"match": {
"Attributes": {
"query": "2 bd and 1 bth at md"
}
}
}
}
it returns this result, which is correct:
{
"_index" : "test",
"_type" : "documents",
"_id" : "4",
"_score" : 0.0032357208,
"_source" : {
"Attributes" : [
"8 winners circle",
"md",
"owings mills",
"21117",
"us",
"2 bd",
"1 bth",
"dishwasher",
"dryer",
"garbage disposer",
"microwave",
"range",
"refrigerator",
"washer",
"appliances",
"traditional architecture",
"new traditional architecture",
"central a/c",
"colling system",
"vinyl siding",
"heat pump",
"heating system",
"1 floors",
"assigned",
"unassigned",
"unknown roof"
]
}
}
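You said your results always get a score of 1. That suggests your query is not running as intended. My guess is that you are querying the attributes field rather than Attributes; unfortunately, Elasticsearch field names are case-sensitive.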
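A quick way to rule this out is to look at the mapping and check how the field name is actually spelled. This assumes the index is called test, as in my example; substitute your own index name.

```
GET /test/_mapping
```

If both attributes and Attributes show up in the output, documents have been indexed under two different field names, and a query against one of them will not see data stored under the other.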
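From the comments, you said you are using a term query. A term query looks for an exact term match and does not analyze the search text, so it is the wrong choice for text data. Always use a match query when searching text data.
Let me know if this helps.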
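To make the difference concrete, here is a sketch of a term query against my test index; the search string "3 Beds" is just an illustrative value.

```
POST /test/documents/_search
{
  "query": {
    "term": {
      "Attributes": "3 Beds"
    }
  }
}
```

The term query looks up the literal string "3 Beds" as a single term without passing it through my_english. Since the analyzer lowercases everything at index time, no indexed token contains a capital "B", so this cannot match. The same text in a match query is analyzed first and then compared against the indexed tokens.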