How to boost results in Elasticsearch so that a match in field1 always ranks above a match in field2?

Time: 2013-06-10 18:39:43

Tags: elasticsearch

I have a mapping with 3 fields and 9 documents:

    #! /bin/bash

    #DELETE
    curl -XDELETE 'http://localhost:9200/test'
    echo
    # CREATE
    curl -XPUT 'http://localhost:9200/test?pretty=1' -d '{
        "settings": {
           "analysis" : {
                "analyzer" : {
                   "my_analyz_1" : {
                        "filter" : [
                            "standard",
                            "lowercase",
                            "asciifolding"
                        ],
                        "type" : "custom",
                        "tokenizer" : "standard"
                    }
                }
            }
        }
    }'
    echo
    # DEFINE
    curl -XPUT 'http://localhost:9200/test/posts/_mapping?pretty=1' -d '{
        "posts" : {
            "properties" : {
                "section" : {
                  "type" : "string",
                  "analyzer" : "my_analyz_1"
                },
                "category" : {
                  "type" : "string",
                  "analyzer" : "my_analyz_1"
                },
                "title" : {
                  "type" : "string",
                  "analyzer" : "my_analyz_1"
                }
            }
        }
    }'
    echo
    # INSERT
    curl localhost:9200/test/posts/1 -d '{section: "Bicycle", category: "Small", title: "Diamondback Grind-16"}'
    curl localhost:9200/test/posts/2 -d '{section: "Bicycle", category: "Big",   title: "Diamondback JrViper"}'
    curl localhost:9200/test/posts/3 -d '{section: "Bicycle", category: "Small", title: "2-Hip Cyclone small"}'
    curl localhost:9200/test/posts/4 -d '{section: "Bicycle", category: "Big",   title: "2-Hip Bizzle"}'
    curl localhost:9200/test/posts/5 -d '{section: "Small",   category: "Small", title: "Toyota"}'
    curl localhost:9200/test/posts/6 -d '{section: "Car",     category: "Big",   title: "Subaru Impreza small"}'
    curl localhost:9200/test/posts/7 -d '{section: "Small",   category: "Big",   title: "Toyota Corona MARK II"}'
    curl localhost:9200/test/posts/8 -d '{section: "Car",     category: "Small", title: "Hyundai Elantra"}'
    curl localhost:9200/test/posts/9 -d '{section: "Car",     category: "Big",   title: "Ford Maverick small"}'
    echo
    # REFRESH
    curl -XPOST localhost:9200/test/_refresh
    echo

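I want to search for the word "small", but I want the results in the following order:

  1. results where "small" is in section
  2. results where "small" is in category
  3. results where "small" is in title

So I search with this query: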

    curl "localhost:9200/test/posts/_search?pretty=1" -d '{
        "query": {
            "bool": {
                "must": [
                    {
                        "multi_match": {
                            "query": "small",
                            "fields": ["section^3", "category^2", "title"]
                        }
                    }
                ]
            }
        }
    }'
    

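The results are:

    {"_id": 7} {section: "Small",   category: "Big",   title: "Toyota Corona MARK II"}
    {"_id": 1} {section: "Bicycle", category: "Small", title: "Diamondback Grind-16"}
    {"_id": 5} {section: "Small",   category: "Small", title: "Toyota"}
    {"_id": 3} {section: "Bicycle", category: "Small", title: "2-Hip Cyclone small"}
    {"_id": 8} {section: "Car",     category: "Small", title: "Hyundai Elantra"}
    {"_id": 9} {section: "Car",     category: "Big",   title: "Ford Maverick small"}
    {"_id": 6} {section: "Car",     category: "Big",   title: "Subaru Impreza small"}

This is not what I want. Document 5 should be second, because its match is in section. Document 3 should come after 7 and 5, because it matches in both category and title.

So my question is: how do I get results in which a match in section is always more important than a match in category, which in turn is always more important than a match in title?

Thanks in advance!

EDIT

The search type 'dfs_query_then_fetch' solved the problem. This search type first collects term statistics from all shards, so the TF-IDF values are computed across the whole index rather than per shard. For details, see http://www.elasticsearch.org/guide/reference/api/search/search-type/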
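For reference, the search type is passed as a URL parameter on the search request. The following is only a sketch, assuming the same host, index, type, and query as in the question and the Elasticsearch release of that era; `QUERY` and `SEARCH_URL` are illustrative variable names, not part of the original post.

```shell
#!/bin/bash
# Same multi_match query as in the question.
# QUERY and SEARCH_URL are illustrative names, not from the original post.
QUERY='{
    "query": {
        "multi_match": {
            "query": "small",
            "fields": ["section^3", "category^2", "title"]
        }
    }
}'

# search_type=dfs_query_then_fetch makes the node gather term statistics
# from every shard before scoring, instead of scoring per shard.
SEARCH_URL='localhost:9200/test/posts/_search?pretty=1&search_type=dfs_query_then_fetch'

# Print a hint instead of failing silently when no node is listening.
curl -s "$SEARCH_URL" -d "$QUERY" \
    || echo "Elasticsearch is not reachable on localhost:9200" >&2
```

With only 9 documents spread over several shards, per-shard statistics can differ a lot from shard to shard, which is why the extra round trip changes the ordering here.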

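1 Answer:

Answer 0 (score: 0)

Have you tried setting use_dis_max to false?

This should mean that a document with "small" in both category and title ranks above one with "small" only in category.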
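A minimal sketch of that suggestion, assuming the Elasticsearch release used in the question, where `multi_match` accepts a `use_dis_max` flag (later releases deprecated it in favour of the `type` parameter). With the flag set to false, the per-field scores are combined instead of only the best-scoring field counting, so a document matching in two fields can outscore one matching in a single field. `QUERY` is an illustrative variable name.

```shell
#!/bin/bash
# The question's query with use_dis_max disabled.
# QUERY is an illustrative name, not from the original post.
QUERY='{
    "query": {
        "multi_match": {
            "query": "small",
            "fields": ["section^3", "category^2", "title"],
            "use_dis_max": false
        }
    }
}'

# Print a hint instead of failing silently when no node is listening.
curl -s 'localhost:9200/test/posts/_search?pretty=1' -d "$QUERY" \
    || echo "Elasticsearch is not reachable on localhost:9200" >&2
```

Note that this changes how field scores are combined; it does not by itself guarantee that any section match outranks every category match.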

As for the odd behavior you're seeing between the second and third results, I'm a bit lost... Could you run the query and ask for an explanation of how the scores were calculated?
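One way to request that explanation is the `explain` flag in the search body. This is a sketch under the same assumptions as the question (host, index `test`, type `posts`); `QUERY` is an illustrative variable name.

```shell
#!/bin/bash
# The question's query with score explanations switched on.
# QUERY is an illustrative name, not from the original post.
QUERY='{
    "explain": true,
    "query": {
        "multi_match": {
            "query": "small",
            "fields": ["section^3", "category^2", "title"]
        }
    }
}'

# Print a hint instead of failing silently when no node is listening.
curl -s 'localhost:9200/test/posts/_search?pretty=1' -d "$QUERY" \
    || echo "Elasticsearch is not reachable on localhost:9200" >&2
```

Each hit in the response then carries an `_explanation` object that breaks its score down into the contributing factors, which makes it possible to compare documents 1 and 5 directly.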