Multiple words acting as a single word in search - Elasticsearch

Time: 2019-01-06 04:27:38

Tags: elasticsearch

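I'm having a problem where tags such as "social media", "two words", and "tag with many spaces" score higher for each additional word they contain in the search query.

How can I search for "two words" as a single word, rather than getting different scores when searching for "two" and "two words"?

Here is a visual representation of the current scores: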

+-----------------------+-------+
| search                | score |
+-----------------------+-------+
| two                   | 2.76  |
| two words             | 5.53  |
| tag with many spaces  | 11.05 |
| singleword            | 2.76  |
+-----------------------+-------+
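The scores in the table above look like multiples of the single-word score. The short Python check below uses only the numbers from the table; the variable names are illustrative. It suggests the score grows with the number of words in the tag, which is consistent with each word being scored separately.

```python
# Scores copied from the table of current results.
scores = {
    "two": 2.76,
    "two words": 5.53,
    "tag with many spaces": 11.05,
    "singleword": 2.76,
}

# Use the single-word score as the baseline.
base = scores["singleword"]

for tag, score in scores.items():
    word_count = len(tag.split())
    ratio = round(score / base, 2)
    # Print the word count next to the score ratio for comparison.
    print(f"{tag!r}: {word_count} word(s), score / base = {ratio}")
```

For every row, the ratio comes out approximately equal to the word count.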

Here is what I want instead:

+-----------------------+-------+
| search                | score |
+-----------------------+-------+
| two                   | 2.76  |
| two words             | 2.76  |
| tag with many spaces  | 2.76  |
| singleword            | 2.76  |
+-----------------------+-------+

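Each document has multiple tags. The tag search string is split on commas (",") in PHP, and each tag is output as a clause in a query like the one below.

Assuming a document has multiple tags, including "two words" and "singleword", this is the search query: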

"query": {
    "function_score": {
        "query": {
            "bool": {
                "should": [
                    {
                        "match": {
                            "tags.name": "two words"
                        }
                    },
                    {
                        "match": {
                            "tags.name": "singleword"
                        }
                    }
                ]
            }
        },
        "functions": [
            {
                "field_value_factor": {
                    "field": "tags.votes"
                }
            }
        ],
        "boost_mode": "multiply"
    }
}
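For illustration, a query of this shape could be assembled from a comma-separated tag string roughly as follows. This is a Python sketch, not the PHP code used in the question; the function name `build_tag_query` and its `field` parameter are hypothetical.

```python
import json


def build_tag_query(raw_tags, field="tags.name"):
    """Split a comma-separated tag string and build one match clause per tag.

    Hypothetical helper: it mirrors the shape of the query shown above.
    """
    # Drop surrounding whitespace and skip empty entries such as "a,,b".
    names = [part.strip() for part in raw_tags.split(",") if part.strip()]
    should = [{"match": {field: name}} for name in names]
    return {
        "query": {
            "function_score": {
                "query": {"bool": {"should": should}},
                "functions": [
                    {"field_value_factor": {"field": "tags.votes"}}
                ],
                "boost_mode": "multiply",
            }
        }
    }


query = build_tag_query("two words, singleword")
print(json.dumps(query, indent=4))
```

Each tag becomes its own `should` clause, so a multi-word tag such as "two words" is still passed to `match` as a single string and is only split into words later, at analysis time.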

If I search for "two" instead of "two words", the score is different.

This is the result when searching for "two words":
{
    "_index": "index",
    "_type": "type",
    "_id": "u10q42cCZsbFNf1W0Tdq",
    "_score": 4.708793,
    "_source": {
        "url": "example.com",
        "title": "title of the document",
        "description": "some description of the document",
        "popularity": 9,
        "tags": [
            {
                "name": "two words",
                "votes": 1
            },
            {
                "name": "singleword",
                "votes": 1
            },
            {
                "name": "othertag",
                "votes": 1
            },
            {
                "name": "random",
                "votes": 1
            }
        ]
    }
}

Here is the result when searching for "two" instead of "two words":
{
    "_index": "index",
    "_type": "type",
    "_id": "u10q42cCZsbFNf1W0Tdq",
    "_score": 3.4481666,
    "_source": {
        "url": "example.com",
        "title": "title of the document",
        "description": "some description of the document",
        "popularity": 9,
        "tags": [
            {
                "name": "two words",
                "votes": 1
            },
            {
                "name": "singleword",
                "votes": 1
            },
            {
                "name": "othertag",
                "votes": 1
            },
            {
                "name": "random",
                "votes": 1
            }
        ]
    }
}

Here is the mapping (specifically for tags):

"tags": {
  "type": "nested",
  "include_in_parent": true,
  "properties": {
    "name": {
      "type": "text",
      "fields": {
        "keyword": {
          "type": "keyword",
          "ignore_above": 256
        }
      }
    },
    "votes": {
      "type": "long"
    }
  }
}

I have tried searching with "\"two words\"" and "*two words*", but it made no difference.

Is it possible to achieve this?

1 Answer:

Answer 0: (score: 1)

You should match against the non-analyzed string and switch to a term query.

You can try:

"query": {
    "function_score": {
        "query": {
            "bool": {
                "should": [
                    {
                        "term": {
                            "tags.name.keyword": "two words"
                        }
                    },
                    {
                        "term": {
                            "tags.name.keyword": "singleword"
                        }
                    }
                ]
            }
        },
        "functions": [
            {
                "field_value_factor": {
                    "field": "tags.votes"
                }
            }
        ],
        "boost_mode": "multiply"
    }
}

In your current implementation, when you run a match query for "two words", the query string is analyzed and the search looks for the tokens "two" and "words" in your tags. As a result, documents with the tag "two words" match both tokens and get boosted.
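The difference between the two approaches can be sketched in plain Python without a running cluster. The `analyze` function below is only a rough stand-in for the standard analyzer (lowercase, split on non-alphanumeric characters), and all function names are hypothetical; real scoring also involves term statistics that this sketch ignores.

```python
import re


def analyze(text):
    """Rough stand-in for the standard analyzer: lowercase, split into words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def matched_tokens_analyzed(query, tags):
    """Query tokens that also occur among the tokens of the analyzed tags."""
    indexed = {token for tag in tags for token in analyze(tag)}
    return [token for token in analyze(query) if token in indexed]


def matched_values_keyword(query, tags):
    """Whole-value comparison, the way a term query on a keyword field works."""
    return [query] if query in tags else []


tags = ["two words", "singleword", "othertag", "random"]

for q in ["two", "two words", "singleword"]:
    analyzed = len(matched_tokens_analyzed(q, tags))
    exact = len(matched_values_keyword(q, tags))
    print(f"{q!r}: analyzed matches = {analyzed}, keyword matches = {exact}")
```

With the analyzed field, "two words" produces two matching tokens while "singleword" produces one, which is where the uneven scores come from. With the keyword comparison, every tag counts exactly once. One consequence to keep in mind: a search for "two" alone no longer matches the tag "two words" at all.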