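Elasticsearch: adding a weight or boost to a filter query with multiple terms clauses

Time: 2017-01-18 05:18:11

Tags: elasticsearch elasticsearch-2.0

My sample index and document structure are shown below: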

 http://localhost:9200/testindex/
    {
      "settings": {
        "analysis": {
          "analyzer": {
            "autocomplete": {
              "tokenizer": "whitespace",
              "filter": [
                "lowercase",
                "autocomplete"
              ]
            },
            "autocomplete_search": {
              "tokenizer": "whitespace",
              "filter": [
                "lowercase"
              ]
            }
          },
          "filter": {
            "autocomplete": {
              "type": "nGram",
              "min_gram": 2,
              "max_gram": 40
            }
          }
        }
      },
      "mappings": {
        "table1": {
          "properties": {
            "title": {
              "type": "string",
              "index": "not_analyzed"
            },
            "type": {
              "type": "string",
              "index": "not_analyzed"
            },
            "type1": {
              "type": "string",
              "index": "not_analyzed"
            },
            "id": {
              "type": "string",
              "analyzer": "autocomplete",
              "search_analyzer": "autocomplete_search"
            }
          }
        }
      }
    }



http://localhost:9200/testindex/table1/1
{
  "title": "mumbai",
  "type": "efg",
  "type1": "efg1",
  "id": "Caryle management"
}


http://localhost:9200/testindex/table1/2
{
  "title": "canada",
  "type": "abc",
  "type1": "abc1",
  "id": "labson series 2014"
}



http://localhost:9200/testindex/table1/3/
{
  "title": "ny",
  "type": "abc",
  "type1": "abc1",
  "id": "labson series 2012"
}


http://localhost:9200/testindex/table1/4/
{
  "title": "pune",
  "type": "abc",
  "type1": "abc1",
  "id": "hybrid management"
}




This is the query I use to get all documents where type is in ("abc", "efg") and id matches labson or management:


 {
      "query": {
        "bool": {
          "filter": {
            "query": {
              "terms": {
                "type": [
                  "abc",
                  "efg"
                ]
              }
            }
          },
          "minimum_should_match": 1,
          "should": [
            {
              "query": {
                "bool": {
                  "must": [
                    {
                      "term": {
                        "_type": "table1"
                      }
                    },
                    {
                      "bool": {
                        "should": [
                          {
                            "match": {
                              "id": {
                                "query": "labson ",
                                "operator": "and"
                              }
                            }
                          },
                          {
                            "match": {
                              "id": {
                                "query": "management",
                                "operator": "and"
                              }
                            }
                          }
                        ]
                      }
                    }
                  ]
                }
              }
            }
          ]
        }
      }
    }






    "hits": [
    {
    "_index": "testindex",
    "_type": "table1",
    "_id": "2",
    "_score": 1,
    "_source": {
    "title": "canada",
    "type": "abc",
    "type1": "abc1",
    "id": "labson series 2014"
    }
    }
    ,
    {
    "_index": "testindex",
    "_type": "table1",
    "_id": "4",
    "_score": 1,
    "_source": {
    "title": "pune",
    "type": "abc",
    "type1": "abc1",
    "id": "hybrid management"
    }
    }
    ,
    {
    "_index": "testindex",
    "_type": "table1",
    "_id": "1",
    "_score": 1,
    "_source": {
    "title": "mumbai",
    "type": "efg",
    "type1": "efg1",
    "id": "Caryle management"
    }
    }
    ,
    {
    "_index": "testindex",
    "_type": "table1",
    "_id": "3",
    "_score": 1,
    "_source": {
    "title": "ny",
    "type": "abc",
    "type1": "abc1",
    "id": "labson series 2012"
    }
    }
    ]

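So I need help resolving the following issues with this output.

  1. Why is "labson series 2012" the last document in the results, even though my search criteria look for labson first and then management? How can I boost or weight the labson keyword over management, so that the output gives me all documents matching labson first and then those matching management, following the order of the input terms?
  2. How can I add a filter on top that reads: give me all documents with type in ("abc", "efg") and type1 in ("abc")? Right now I only filter on type in ("abc", "efg"); how can I modify the query to include an IN clause for the type1 field?
  3. Please provide some pseudocode for the two query solutions above. I am new to ES, so this would help me a lot.

    Thanks in advance

1 Answer:

Answer 0: (score: 0)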

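Let me first clarify your statement "even though my search criteria look for labson first and then management". Elasticsearch does not take the order of query clauses into account when computing scores. Each sub-query clause produces its score independently of its position, and the clause scores are then combined into the final score.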
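
The order-independence described above can be illustrated with a toy model. The sketch below is not Lucene's scoring formula. It assumes each clause contributes a fixed boost when its term appears as a whole token, and the name combined_score and the boost values are made up for illustration.

```python
from itertools import permutations


def combined_score(clauses, doc_text):
    """Toy model: sum the boost of every clause whose term is a token of doc_text."""
    tokens = doc_text.lower().split()
    return sum(boost for term, boost in clauses if term in tokens)


# (term, boost) pairs; the boost values are hypothetical
clauses = [("labson", 2.0), ("management", 1.0)]
doc = "labson series 2012"

# Score the same document with the clauses listed in every possible order
scores = {combined_score(list(order), doc) for order in permutations(clauses)}
print(scores)  # a single value: reordering the clauses does not change the score
```

Because the clause scores are simply added together, listing labson before management has no effect. Only the boosts change the ranking.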

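See the query further down for your use case. For score calculation, you can add the boost parameter to a match query to increase a document's score when that clause matches. I used a constant score query (constant_score in the query below) so that TF-IDF and term frequency are ignored. To reduce the effect of normalization on scoring, you can also disable norms on the id field at index time. The following mapping does this by setting norms.enabled to false on id. Note that this setting controls field norms and is not a switch for the queryNorm factor itself: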

 {
        "settings": {
            "analysis": {
                "analyzer": {
                    "autocomplete": {
                        "tokenizer": "whitespace",
                        "filter": [
                            "lowercase",
                            "autocomplete"
                        ]
                    },
                    "autocomplete_search": {
                        "tokenizer": "whitespace",
                        "filter": [
                            "lowercase"
                        ]
                    }
                },
                "filter": {
                    "autocomplete": {
                        "type": "nGram",
                        "min_gram": 2,
                        "max_gram": 40
                    }
                }
            }
        },
        "mappings": {
            "table1": {
                "properties": {
                    "title": {
                        "type": "string",
                        "index": "not_analyzed"
                    },
                    "type": {
                        "type": "string",
                        "index": "not_analyzed"
                    },
                    "type1": {
                        "type": "string",
                        "index": "not_analyzed"
                    },
                    "id": {
                        "type": "string",
                        "analyzer": "autocomplete",
                        "search_analyzer": "autocomplete_search",
                        "norms": {
                            "enabled": false
                        }
                    }
                }
            }
        }
    }

A few discussion threads cover similar scoring use cases.

There is also a GitHub issue about query norm.

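You also mentioned that you need a filter on top for type in ("abc", "efg") and type1 in ("abc"). To support this, I added a must clause with two sub-clauses, a term query and a terms query.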

{
    "query": {
        "filtered": {
            "query": {
                "bool": {
                    "should": [{
                        "constant_score": {
                            "query": {
                                "match": {
                                    "id": {
                                        "query": "management",
                                        "operator": "and"
                                    }
                                }
                            },
                            "boost": 1
                        }
                    }, {
                        "constant_score": {
                            "query": {
                                "match": {
                                    "id": {
                                        "query": "labson",
                                        "operator": "and"
                                    }
                                }
                            },
                            "boost": 2
                        }
                    }],
                    "must": [{
                        "term": {
                            "type1": {
                                "value": "abc"
                            }
                        }
                    }, {
                        "terms": {
                            "type": [
                                "abc",
                                "efg"
                            ]
                        }
                    }]
                }
            }
        }
    }
}
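
As a rough sketch, a similar request body can also be assembled programmatically. The helper below (build_query is a hypothetical name, not part of any client library) uses a bool query with a filter clause, which Elasticsearch 2.x offers in place of the deprecated filtered query. It is a variation on the query above, not a copy of it. It assumes both fields should be matched with IN-style terms clauses, it keeps minimum_should_match from the question, and the values "abc1" and "efg1" are borrowed from the sample documents for illustration.

```python
import json


def build_query(boosted_terms, type_values, type1_values):
    """Build a bool query: boosted constant_score clauses plus two IN-style filters."""
    should = [
        {
            "constant_score": {
                # Wrapping the match in constant_score replaces its TF-IDF score by `boost`
                "filter": {"match": {"id": {"query": term, "operator": "and"}}},
                "boost": boost,
            }
        }
        for term, boost in boosted_terms
    ]
    return {
        "query": {
            "bool": {
                "should": should,
                "minimum_should_match": 1,
                # filter clauses restrict the hits without contributing to the score
                "filter": [
                    {"terms": {"type": type_values}},
                    {"terms": {"type1": type1_values}},
                ],
            }
        }
    }


body = build_query(
    boosted_terms=[("labson", 2), ("management", 1)],
    type_values=["abc", "efg"],
    type1_values=["abc1", "efg1"],  # assumed values, taken from the sample documents
)
print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the body of a _search request. Putting the two terms clauses under filter rather than must means they only narrow the result set, so the ranking is driven entirely by the boosted should clauses.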

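Given your requirement for this filter, type in ("abc", "efg") and type1 in ("abc"), no document actually satisfies both conditions, so running this query against the four documents above returns 0 matches. If you want to change the AND to an OR, you can do so by adjusting the query accordingly.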
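
The zero-hit outcome can be checked by hand. The sketch below replays the two IN conditions over the four sample documents in plain Python. apply_filter is a hypothetical helper that only mimics exact-value matching on not_analyzed fields. It does not call Elasticsearch.

```python
# The four sample documents from the question
docs = [
    {"_id": "1", "title": "mumbai", "type": "efg", "type1": "efg1", "id": "Caryle management"},
    {"_id": "2", "title": "canada", "type": "abc", "type1": "abc1", "id": "labson series 2014"},
    {"_id": "3", "title": "ny", "type": "abc", "type1": "abc1", "id": "labson series 2012"},
    {"_id": "4", "title": "pune", "type": "abc", "type1": "abc1", "id": "hybrid management"},
]


def apply_filter(documents, type_values, type1_values):
    """Return the _id of every document whose type AND type1 are in the given sets."""
    return [
        d["_id"]
        for d in documents
        if d["type"] in type_values and d["type1"] in type1_values
    ]


# The filter exactly as requested: no document has type1 == "abc"
requested = apply_filter(docs, {"abc", "efg"}, {"abc"})
# The same filter with a value that does occur in the data
adjusted = apply_filter(docs, {"abc", "efg"}, {"abc1"})

print(requested)  # []
print(adjusted)   # ['2', '3', '4']
```

Since type1 is not_analyzed, "abc" and "abc1" are different terms, which is why the requested filter matches nothing.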

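You can get more out of scoring by giving different boost values to multiple match queries. The final score of a document is then evaluated by combining the scores of the individual match queries it satisfies.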
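
To make the combined effect of the boosts concrete, here is a minimal sketch that ranks documents the way the constant_score clauses would, under the simplifying assumption that a clause matches when its term is a whole token of the id field. The fifth document is hypothetical and added only to show two clauses adding up. rank and the boost values are illustrative names and numbers, not Elasticsearch API.

```python
# id field of the four sample documents, plus one hypothetical document ("5")
ids = {
    "1": "Caryle management",
    "2": "labson series 2014",
    "3": "labson series 2012",
    "4": "hybrid management",
    "5": "labson management",  # hypothetical: matches both clauses
}

boosts = {"labson": 2.0, "management": 1.0}  # same boosts as the query above


def rank(id_fields, term_boosts):
    """Score each document as the sum of boosts of matched terms, highest first."""
    scored = []
    for doc_id, text in id_fields.items():
        tokens = text.lower().split()
        score = sum(b for term, b in term_boosts.items() if term in tokens)
        scored.append((doc_id, score))
    # sorted() is stable, so ties keep their original (insertion) order
    return sorted(scored, key=lambda pair: -pair[1])


ranking = rank(ids, boosts)
print(ranking)
# [('5', 3.0), ('2', 2.0), ('3', 2.0), ('1', 1.0), ('4', 1.0)]
```

Documents matching labson land above those matching only management, and a document matching both collects the sum of the two boosts.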

Hope this works for you. Thanks