Question

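I want to use the tf-idf score only on certain "analyzed" fields, and use "term" on "not_analyzed" fields purely to narrow down the results I prefer. But the result is not what I expected.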

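According to the official documentation, "not_analyzed" fields are not analyzed, and I assumed this meant ES would not calculate a score on them. I wanted to rely on that to narrow down results, because I need the tf-idf score on specific fields for further calculation. However, once I add a term condition, the score is different. I tried 3 steps:

1. Run "match" on the analyzed field. That is the score I want.
2. Combine "match" on the analyzed field with "term" on the not_analyzed field. The score returned is higher than in the first step.
3. Run only "term" on the "not_analyzed" field. A score is still returned.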

Part of the code is shown below. These are the 4 data entries:

data = {"did": 1, "title": "hu la la", "test": ["a", "b", "c"]}

data = {"did": 2, "title": "hu la", "test": ["a", "b", "c"]}

data = {"did": 3, "title": "hu la la", "test": ["a", "b"]}

data = {"did": 4, "title": "la la", "test": ["a", "b", "c"]}

mappings = {
    "properties": {
        "did": {"type": "long", "index": "not_analyzed"},
        "title": {"type": "string", "index": "analyzed"},
        "test": {"type": "string", "index": "not_analyzed"},
    }
}

curl -X GET http://localhost:9200/test7/_search?pretty=true -d '
{
    "query": {
        "bool": {
            "must": [
                {
                    "match": {
                        "title": "la"
                    }
                }
            ]
        }
    }
}
'

One of the hits is:

{
      "_index" : "test7",
      "_type" : "default",
      "_id" : "AWoRGrIx5vn17yswf0rR",
      "_score" : 0.4203996,
      "_source" : {
        "did" : 1,
        "test" : [ "a", "b", "c" ],
        "title" : "hu la la"
      }

But when I add a term clause:

{
    "query": {
        "bool": {
            "must": [
                {
                    "match": {
                        "title": "la"
                    }
                },
                {
                    "term": {
                        "test": "a"
                    }
                }
            ]
        }
    }
}
'

the score changes!

{
      "_index" : "test7",
      "_type" : "default",
      "_id" : "AWoRGrIx5vn17yswf0rR",
      "_score" : 0.7176671,
      "_source" : {
        "did" : 1,
        "test" : [ "a", "b", "c" ],
        "title" : "hu la la"
      }

Answer 1

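You should use a filter clause to restrict the results; filter clauses do not affect the score. A "not_analyzed" field is still indexed, just as a single unanalyzed token, so a term query placed in "must" is scored like any other query clause and its score is combined with the match score.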

Example:

 {
    "query": {
        "bool": {
            "must": [
                {
                    "match": {
                        "title": "la"
                    }
                }               
            ],
            "filter": [
                 {
                    "term": {
                        "test": "a"
                    }
                }
            ]
        }
    }
}
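
If the request body is built in Python, as the data and mappings in the question suggest, the same split can be sketched as follows. This is only a minimal illustration, assuming Elasticsearch 2.0 or later (where "bool" accepts a "filter" clause). build_query and its parameters are hypothetical names made up for this example, not part of any client library. The optional "explain" flag is a standard search-body option that returns a per-hit score breakdown, which may help confirm which clauses contribute to _score.

```python
import json


def build_query(match_field, match_text, filters=None, explain=False):
    """Build a bool query body (hypothetical helper, for illustration only).

    The match clause goes under "must" and is scored.
    Each entry of `filters` becomes a term clause under "filter",
    which restricts the hits without contributing to _score.
    """
    bool_query = {"must": [{"match": {match_field: match_text}}]}
    if filters:
        bool_query["filter"] = [
            {"term": {field: value}} for field, value in filters.items()
        ]
    body = {"query": {"bool": bool_query}}
    if explain:
        # Ask Elasticsearch to include a score explanation for each hit.
        body["explain"] = True
    return body


if __name__ == "__main__":
    # Same query as the example above: score on "title", filter on "test".
    body = build_query("title", "la", filters={"test": "a"})
    print(json.dumps(body, indent=4))
```

The printed JSON can be passed as the -d body of the curl command from the question. Keeping every exact-value condition in `filters` means only the match clause on "title" is scored.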

Why does ES still return a score when I run a "term" query on a not_analyzed field?
