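I am using elasticsearch-6.4.3. I created an index named flight-location_methods.
The excerpt below is from the Ruby code I wrote for the index; it contains the settings and the mapping:
settings index: {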
analysis: {
"filter": {
"autocomplete_filter": {
"type": "edge_ngram",
"min_gram": 1,
"max_gram": 20
}
},
"analyzer": {
"autocomplete": {
"type": "custom",
"tokenizer": "standard",
"filter": ["lowercase", "autocomplete_filter"]
}
}
}
}
mapping do
indexes :airport_code, type: "text", analyzer: "autocomplete", search_analyzer: "standard"
indexes :airport_name, type: "text", analyzer: "autocomplete", search_analyzer: "standard"
indexes :city_name, type: "text", analyzer: "autocomplete", search_analyzer: "standard"
indexes :country_name, type: "text", analyzer: "autocomplete", search_analyzer: "standard"
end
The mapping do ... end block above represents the mapping.
When I run this query:
GET /flight-location_methods/_search
{
"from": 0,
"size": 1000,
"query": {
"function_score": {
"functions": [
{
"filter": {
"match": {
"city_name": "new yo"
}
},
"weight": 50
},
{
"filter": {
"match": {
"country_name": "new yo"
}
},
"weight": 50
}
],
"max_boost": 200,
"score_mode": "max",
"boost_mode": "multiply",
"min_score": 10
}
}
}
It returns these hits:
{
"_index": "flight-location_methods",
"_type": "_doc",
"_id": "tcoj1G0Bdo5Q9AduxCKi",
"_score": 50,
"_source": {
"airport_name": "Ouvea",
"airport_code": "UVE",
"city_name": "Ouvea",
"country_name": "New Caledonia"
}
},
{
"_index": "flight-location_methods",
"_type": "_doc",
"_id": "zMoj1G0Bdo5Q9AduxCKi",
"_score": 50,
"_source": {
"airport_name": "Palmerston North",
"airport_code": "PMR",
"city_name": "Palmerston North",
"country_name": "New Zealand"
}
},
{
"_index": "flight-location_methods",
"_type": "_doc",
"_id": "1Moj1G0Bdo5Q9AduxCKi",
"_score": 50,
"_source": {
"airport_name": "Westport",
"airport_code": "WSZ",
"city_name": "Westport",
"country_name": "New Zealand"
}
},
{
"_index": "flight-location_methods",
"_type": "_doc",
"_id": "1coj1G0Bdo5Q9AduxCKi",
"_score": 50,
"_source": {
"airport_name": "Whangarei",
"airport_code": "WRE",
"city_name": "Whangarei",
"country_name": "New Zealand"
}
},
{
"_index": "flight-location_methods",
"_type": "_doc",
"_id": "Rsoj1G0Bdo5Q9AduxCOi",
"_score": 50,
"_source": {
"airport_name": "Municipal",
"airport_code": "RNH",
"city_name": "New Richmond",
"country_name": "United States"
}
},
{
"_index": "flight-location_methods",
"_type": "_doc",
"_id": "fsoj1G0Bdo5Q9AduxCOi",
"_score": 50,
"_source": {
"airport_name": "New London",
"airport_code": "GON",
"city_name": "New London",
"country_name": "United States"
}
},
{
"_index": "flight-location_methods",
"_type": "_doc",
"_id": "gMoj1G0Bdo5Q9AduxCOi",
"_score": 50,
"_source": {
"airport_name": "New Ulm",
"airport_code": "ULM",
"city_name": "New Ulm",
"country_name": "United States"
}
},
{
"_index": "flight-location_methods",
"_type": "_doc",
"_id": "5coj1G0Bdo5Q9AduxCSi",
"_score": 50,
"_source": {
"airport_name": "Cape Newenham",
"airport_code": "EHM",
"city_name": "Cape Newenham",
"country_name": "United States"
}
},
{
"_index": "flight-location_methods",
"_type": "_doc",
"_id": "Ycoj1G0Bdo5Q9AduxCWi",
"_score": 50,
"_source": {
"airport_name": "East 60th Street H/P",
"airport_code": "JRE",
"city_name": "New York",
"country_name": "United States"
}
}
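As you can see, New York should be on top, but it is not.
I also need a match on any single word to count: when the search text contains several words, a document should match if any of those words appears in any field. However, if the whole search text appears within one field, that document should rank higher.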
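Answer 0 (score: 2)
Let's first discuss Elasticsearch tokenizers and the tokenization process:
A tokenizer receives a stream of characters and breaks it up into individual tokens (usually individual words). (ES docs)
Now let's describe how the autocomplete analyzer works: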
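The analyzer chain from the question (standard tokenizer, lowercase filter, edge_ngram filter) can be approximated with a short Python sketch. This is only an illustration: the function names are hypothetical, and the regular expression is a rough stand-in for the standard tokenizer, which really follows Unicode text segmentation rules.

```python
import re


def edge_ngrams(token, min_gram, max_gram):
    """Return the prefixes of `token` whose length lies in [min_gram, max_gram]."""
    longest = min(max_gram, len(token))
    # Tokens shorter than min_gram produce no n-grams at all.
    return [token[:n] for n in range(min_gram, longest + 1)]


def autocomplete_analyze(text, min_gram=1, max_gram=20):
    """Approximate the "autocomplete" analyzer: tokenize, lowercase, edge n-gram."""
    # Assumption: splitting on non-word characters is close enough to the
    # standard tokenizer for plain ASCII place names.
    words = re.findall(r"\w+", text)
    tokens = []
    for word in words:
        tokens.extend(edge_ngrams(word.lower(), min_gram, max_gram))
    return tokens


# Settings from the question (min_gram=1, max_gram=20).
print(autocomplete_analyze("New York"))
# → ['n', 'ne', 'new', 'y', 'yo', 'yor', 'york']
```

With these settings every indexed word contributes all of its prefixes, starting with the single first letter, which is why so many unrelated documents can match a short search term.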
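Here is where the magic starts: I think your token length range of 1 to 20 is too wide. There may be words longer than 10 characters, but that is not relevant for us. Likewise, tokens consisting of a single character are of no use to us. I changed the filter to: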
"filter": {
"autocomplete_filter": {
"type": "edge_ngram",
"min_gram": 2,
"max_gram": 5
}
}
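The index will then contain many word prefixes, each 2 to 5 characters long. Now that we know what we want to search for, we can create the mapping and write the query: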
{
"settings": {
"number_of_shards": 3,
"number_of_replicas": 0,
"analysis": {
"filter": {
"autocomplete_filter": {
"type": "edge_ngram",
"min_gram": 2,
"max_gram": 5
}
},
"analyzer": {
"autocomplete": {
"type": "custom",
"tokenizer": "standard",
"filter": [
"lowercase",
"autocomplete_filter"
]
}
}
}
},
"mappings": {
"_doc": {
"properties": {
"airport_name": {
"type": "text",
"fields": {
"ngram": {
"type": "text",
"analyzer": "autocomplete"
}
}
},
"airport_code": {
"type": "keyword",
"fields": {
"ngram": {
"type": "text",
"analyzer": "autocomplete"
}
}
},
"city_name": {
"type": "keyword",
"fields": {
"ngram": {
"type": "text",
"analyzer": "autocomplete"
}
}
},
"country_name": {
"type": "keyword",
"fields": {
"ngram": {
"type": "text",
"analyzer": "autocomplete"
}
}
}
}
}
}
}
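I define each field as a regular field with an additional ngram sub-field, so that aggregations remain possible. Finding cities that have several airports is a good example of where that helps.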
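As an illustration of that aggregation use case, here is a minimal Python sketch that builds a terms aggregation body on the plain city_name keyword field. It assumes one document per airport; the function name, the aggregation name "cities", and the default limits are hypothetical choices, while `min_doc_count` and `size` are standard terms aggregation parameters.

```python
import json


def cities_with_multiple_airports(min_airports=2, max_buckets=100):
    """Build a search body that buckets documents by the exact city_name."""
    return {
        "size": 0,  # only the aggregation buckets are needed, not the hits
        "aggs": {
            "cities": {  # hypothetical aggregation name
                "terms": {
                    # Aggregate on the keyword field, not on city_name.ngram,
                    # so each bucket key is a whole city name.
                    "field": "city_name",
                    "min_doc_count": min_airports,
                    "size": max_buckets,
                }
            }
        },
    }


# The printed JSON can be sent as the body of a _search request.
print(json.dumps(cities_with_multiple_airports(), indent=2))
```

Running the aggregation on the ngram sub-field instead would produce buckets for prefixes such as "ne" or "new", which is why the regular field is kept alongside it.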
Now we can run a simple query to get New York:
{
"size": 20,
"query": {
"query_string": {
"default_field": "city_name.ngram",
"query": "new yo",
"default_operator": "AND"
}
}
}
Response:
{
"took": 15,
"timed_out": false,
"_shards": {
"total": 3,
"successful": 3,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 13.896059,
"hits": [
{
"_index": "test-index",
"_type": "_doc",
"_id": "BtBD2W0BCDulLSY6pKM8",
"_score": 13.896059,
"_source": {
"airport_name": "Flushing",
"airport_code": "FLU",
"city_name": "New York",
"country_name": "United States"
}
}
]
}
}
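Alternatively, you can build a boosting or text query with boost functions. That will also be more efficient when querying a large data set.
Your query should be: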
{
"query": {
"function_score": {
"query": {
"query_string": {
"query": "new yo",
"analyzer": "autocomplete"
}
},
"functions": [
{
"filter": {"terms": {
"city_name.ngram": [
"new",
"yo"
]
}},
"weight": 2
},
{
"filter": {"terms": {
"country_name.ngram": [
"new",
"yo"
]
}},
"weight": 2
}
],
"max_boost": 30,
"min_score": 5,
"score_mode": "max",
"boost_mode": "multiply"
}
}
}
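In this query, New York comes first, because the query part filters out all irrelevant documents. The score is then multiplied by 2 for a match on the city_name.ngram field; since both tokens are present in that field, it receives the highest score. The last line of defense in the query is min_score, which filters out documents that are not relevant. You can read about the current Elasticsearch relevance algorithm here.
By the way, I would not put filters into functions that all have the same weight. You should decide which field is more important. That makes your search clearer.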
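To see why equal weights do not separate documents, the arithmetic of function_score with "score_mode": "max" and "boost_mode": "multiply" can be sketched in Python. This is a simplified model, not the Elasticsearch implementation: the function name is hypothetical, and it assumes that a document matching no function filter gets a neutral factor of 1.

```python
def function_score(query_score, matched_weights, max_boost, min_score):
    """Approximate final score for score_mode=max and boost_mode=multiply."""
    # Assumption: with no matching function filter the factor is neutral (1).
    factor = max(matched_weights) if matched_weights else 1.0
    # max_boost caps the combined function score before it is applied.
    factor = min(factor, max_boost)
    score = query_score * factor
    # min_score drops documents whose final score is too low (None here).
    return score if score >= min_score else None


# The question's query has no "query" clause, so it behaves like match_all
# and every document starts with a score of 1.0.
new_york = function_score(1.0, [50], max_boost=200, min_score=10)     # city_name filter matches
new_zealand = function_score(1.0, [50], max_boost=200, min_score=10)  # country_name filter matches
unrelated = function_score(1.0, [], max_boost=200, min_score=10)      # no filter matches

print(new_york, new_zealand, unrelated)
# → 50.0 50.0 None
```

Under this model every document that passes either filter ends up with exactly 50, which matches the _score values in the question's result. Any ordering therefore has to come from the query part or from giving the functions different weights.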