I am using this search query:
GET videosearch/_search
{
  "query": {
    "match": {
      "tags": "logs"
    }
  }
}
The goal is to return all documents whose tags field contains "logs".
The tags field has the following mapping:
"tags": {
  "type": "string",
  "analyzer": "english",
  "fields": {
    "verbatim": {
      "type": "string",
      "index": "not_analyzed"
    }
  }
}
The query returns results like this one:
{
  "_index": "videosearch",
  "_type": "videos",
  "_id": "10",
  "_score": 0.792282,
  "_source": {
    "id": "10",
    "url": "https://www.youtube.com/watch?v=yDLtyLi6Ny8",
    "title": "#bbuzz: Radu Gheorghe JSON Logging with Elasticsearch",
    "uploaded_by": "newthinking communications",
    "upload_date": "2013-06-19",
    "views": 370,
    "likes": 0,
    "tags": [
      "elasticsearch",
      "logs",
      "logstash",
      "rsyslog",
      "json"
    ]
  }
}
But it also returns bad results:
{
  "_index": "videosearch",
  "_type": "videos",
  "_id": "15",
  "_score": 0.9054651,
  "_source": {
    "id": "15",
    "url": "https://www.youtube.com/watch?v=4L1DjY90Whk",
    "title": "Tuning Solr for Logs, by Radu Gheorghe",
    "uploaded_by": "Lucidworks",
    "upload_date": "2015-01-07",
    "views": 280,
    "likes": 2,
    "tags": [
      "logging",
      "solr",
      "tuning",
      "performance"
    ]
  }
}
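I consider the last one a "bad" result because its tags field does not contain the string "logs". I also notice that, even though it is a "bad" result, it scores higher than the "good" one: 0.9054651 vs 0.792282.
What is going on, and what am I missing?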
Answer 0 (score: 0)
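After more research, I read about analyzers, which Elasticsearch uses to break text into tokens.
The english analyzer applies stemming when it builds tokens. In the example below, I use the english analyzer to break a few words into search tokens: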
GET _analyze?pretty
{
  "analyzer": "english",
  "text": ["hair dryer", "introduction", "stars", "Introspective", "fishing", "logging"]
}
This produces the following tokens:
{
  "tokens": [
    {
      "token": "hair",
      "start_offset": 0,
      "end_offset": 4,
      "type": "<ALPHANUM>",
      "position": 0
    },
    {
      "token": "dryer",
      "start_offset": 5,
      "end_offset": 10,
      "type": "<ALPHANUM>",
      "position": 1
    },
    {
      "token": "introduct",
      "start_offset": 11,
      "end_offset": 23,
      "type": "<ALPHANUM>",
      "position": 2
    },
    {
      "token": "star",
      "start_offset": 24,
      "end_offset": 29,
      "type": "<ALPHANUM>",
      "position": 3
    },
    {
      "token": "introspect",
      "start_offset": 30,
      "end_offset": 43,
      "type": "<ALPHANUM>",
      "position": 4
    },
    {
      "token": "fish",
      "start_offset": 44,
      "end_offset": 51,
      "type": "<ALPHANUM>",
      "position": 5
    },
    {
      "token": "log",
      "start_offset": 52,
      "end_offset": 59,
      "type": "<ALPHANUM>",
      "position": 6
    }
  ]
}
Notice that each token is actually the stem of the word that was analyzed.
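To check the words that matter for this question directly, the same request can be run on just those three. This is a minimal sketch that assumes the same Elasticsearch version as the request above; going by the stemming shown there, each word should come back as the token log.

```
GET _analyze?pretty
{
  "analyzer": "english",
  "text": ["log", "logs", "logging"]
}
```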
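In summary, the words log, logs, and logging all share the same stem, log. A match query runs the search text through the field's analyzer as well, so the query "logs" is reduced to log and matches the token indexed for "logging". That is why all three words are candidates for the search results.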
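If only exact tag matches are wanted, one option is to query the not_analyzed sub-field from the mapping in the question instead of the stemmed field. The request below is a sketch, not something tested against this index: it assumes the sub-field is addressed as tags.verbatim and that the tags were indexed in lowercase, since a term query does no analysis and compares the value exactly.

```
GET videosearch/_search
{
  "query": {
    "term": {
      "tags.verbatim": "logs"
    }
  }
}
```

With this query, document 15 (tagged "logging") should no longer match, while document 10 (tagged "logs") still should. The trade-off is that stemmed matches are lost entirely: a search for "log" would not find "logs".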