Elasticsearch fuzzy query ignores the boost factor?

Asked: 2015-02-21 14:18:05

Tags: elasticsearch fuzzy-search

When I run this query:

GET /index_for_test/_search
{
    "query": {
        "multi_match": {
            "query":       "Italian",
            "type":        "most_fields",
            "fields":      [ "name^2", "categories" ]
        }
    }
}

it returns this result:

{
   "took": 2,
   "timed_out": false,
   "_shards": {
      "total": 5,
      "successful": 5,
      "failed": 0
   },
   "hits": {
      "total": 2,
      "max_score": 0.04012554,
      "hits": [
         {
            "_index": "index_for_test",
            "_type": "business",
            "_id": "1269493995",
            "_score": 0.04012554,
            "_source": {
               "name": "Bono Italian Restaurant",
               "categories": [
                  "Pizza"
               ]
            }
         },
         {
            "_index": "index_for_test",
            "_type": "business",
            "_id": "2017788160",
            "_score": 0.014542127,
            "_source": {
               "name": "Pizza Perperook",
               "categories": [
                  "Italian Food"
               ]
            }
         }
      ]
   }
}

But when I add fuzziness to the query:

GET /index_for_test/_search
{
    "query": {
        "multi_match": {
            "query":       "Italian",
            "type":        "most_fields",
            "fields":      [ "name^2", "categories" ],
            "fuzziness":2
        }
    }
}

it ignores the boost factor and returns this result:

{
   "took": 28,
   "timed_out": false,
   "_shards": {
      "total": 5,
      "successful": 5,
      "failed": 0
   },
   "hits": {
      "total": 2,
      "max_score": 0.095891505,
      "hits": [
         {
            "_index": "index_for_test",
            "_type": "business",
            "_id": "2017788160",
            "_score": 0.095891505,
            "_source": {
               "name": "Pizza Perperook",
               "categories": [
                  "Italian Food"
               ]
            }
         },
         {
            "_index": "index_for_test",
            "_type": "business",
            "_id": "1269493995",
            "_score": 0.076713204,
            "_source": {
               "name": "Bono Italian Restaurant",
               "categories": [
                  "Pizza"
               ]
            }
         }
      ]
   }
}

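Since I boost the name field by a factor of two (by using name^2 in fields), I expected the same ordering as in the first query, but the boost factor seems to be ignored.

I tried other query types (query_string, fuzzy_like_this) and ran into the same problem.

Edit: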

GET /index_for_test/_search?explain=true
{
    "query": {
        "multi_match": {
            "query":       "پیتزا",
            "type":        "most_fields",
            "fields":      [ "name^2", "categories" ]
        }
    }
}
Result of the fuzzy search with ?explain=true:

{
   "took": 25,
   "timed_out": false,
   "_shards": {
      "total": 5,
      "successful": 5,
      "failed": 0
   },
   "hits": {
      "total": 3,
      "max_score": 0.05015693,
      "hits": [
         {
            "_shard": 1,
            "_node": "ZTZ37EpAR1W9e4Qqwk0O5Q",
            "_index": "index_for_test",
            "_type": "business",
            "_id": "2017788160",
            "_score": 0.05015693,
            "_source": {
               "name": "پیتزا پرپروک",
               "categories": [
                  "غذای ایتالیایی"
               ]
            },
            "_explanation": {
               "value": 0.05015693,
               "description": "product of:",
               "details": [
                  {
                     "value": 0.10031386,
                     "description": "sum of:",
                     "details": [
                        {
                           "value": 0.10031386,
                           "description": "weight(name:پیتزا^2.0 in 0) [PerFieldSimilarity], result of:",
                           "details": [
                              {
                                 "value": 0.10031386,
                                 "description": "score(doc=0,freq=1.0 = termFreq=1.0\n), product of:",
                                 "details": [
                                    {
                                       "value": 0.5230591,
                                       "description": "queryWeight, product of:",
                                       "details": [
                                          {
                                             "value": 2,
                                             "description": "boost"
                                          },
                                          {
                                             "value": 0.30685282,
                                             "description": "idf(docFreq=1, maxDocs=1)"
                                          },
                                          {
                                             "value": 0.8522964,
                                             "description": "queryNorm"
                                          }
                                       ]
                                    },
                                    {
                                       "value": 0.19178301,
                                       "description": "fieldWeight in 0, product of:",
                                       "details": [
                                          {
                                             "value": 1,
                                             "description": "tf(freq=1.0), with freq of:",
                                             "details": [
                                                {
                                                   "value": 1,
                                                   "description": "termFreq=1.0"
                                                }
                                             ]
                                          },
                                          {
                                             "value": 0.30685282,
                                             "description": "idf(docFreq=1, maxDocs=1)"
                                          },
                                          {
                                             "value": 0.625,
                                             "description": "fieldNorm(doc=0)"
                                          }
                                       ]
                                    }
                                 ]
                              }
                           ]
                        }
                     ]
                  },
                  {
                     "value": 0.5,
                     "description": "coord(1/2)"
                  }
               ]
            }
         },
         {
            "_shard": 2,
            "_node": "ZTZ37EpAR1W9e4Qqwk0O5Q",
            "_index": "index_for_test",
            "_type": "business",
            "_id": "1269493995",
            "_score": 0.023267403,
            "_source": {
               "name": "رستوران ایتالیایی بونو",
               "categories": [
                  "پیتزا"
               ]
            },
            "_explanation": {
               "value": 0.023267403,
               "description": "product of:",
               "details": [
                  {
                     "value": 0.046534806,
                     "description": "sum of:",
                     "details": [
                        {
                           "value": 0.046534806,
                           "description": "weight(categories:پیتزا in 0) [PerFieldSimilarity], result of:",
                           "details": [
                              {
                                 "value": 0.046534806,
                                 "description": "score(doc=0,freq=1.0 = termFreq=1.0\n), product of:",
                                 "details": [
                                    {
                                       "value": 0.15165187,
                                       "description": "queryWeight, product of:",
                                       "details": [
                                          {
                                             "value": 0.30685282,
                                             "description": "idf(docFreq=1, maxDocs=1)"
                                          },
                                          {
                                             "value": 0.49421698,
                                             "description": "queryNorm"
                                          }
                                       ]
                                    },
                                    {
                                       "value": 0.30685282,
                                       "description": "fieldWeight in 0, product of:",
                                       "details": [
                                          {
                                             "value": 1,
                                             "description": "tf(freq=1.0), with freq of:",
                                             "details": [
                                                {
                                                   "value": 1,
                                                   "description": "termFreq=1.0"
                                                }
                                             ]
                                          },
                                          {
                                             "value": 0.30685282,
                                             "description": "idf(docFreq=1, maxDocs=1)"
                                          },
                                          {
                                             "value": 1,
                                             "description": "fieldNorm(doc=0)"
                                          }
                                       ]
                                    }
                                 ]
                              }
                           ]
                        }
                     ]
                  },
                  {
                     "value": 0.5,
                     "description": "coord(1/2)"
                  }
               ]
            }
         },
         {
            "_shard": 3,
            "_node": "ZTZ37EpAR1W9e4Qqwk0O5Q",
            "_index": "index_for_test",
            "_type": "business",
            "_id": "1203656733",
            "_score": 0.023267403,
            "_source": {
               "name": "چمن",
               "categories": [
                  "پیتزا"
               ]
            },
            "_explanation": {
               "value": 0.023267403,
               "description": "product of:",
               "details": [
                  {
                     "value": 0.046534806,
                     "description": "sum of:",
                     "details": [
                        {
                           "value": 0.046534806,
                           "description": "weight(categories:پیتزا in 0) [PerFieldSimilarity], result of:",
                           "details": [
                              {
                                 "value": 0.046534806,
                                 "description": "score(doc=0,freq=1.0 = termFreq=1.0\n), product of:",
                                 "details": [
                                    {
                                       "value": 0.15165187,
                                       "description": "queryWeight, product of:",
                                       "details": [
                                          {
                                             "value": 0.30685282,
                                             "description": "idf(docFreq=1, maxDocs=1)"
                                          },
                                          {
                                             "value": 0.49421698,
                                             "description": "queryNorm"
                                          }
                                       ]
                                    },
                                    {
                                       "value": 0.30685282,
                                       "description": "fieldWeight in 0, product of:",
                                       "details": [
                                          {
                                             "value": 1,
                                             "description": "tf(freq=1.0), with freq of:",
                                             "details": [
                                                {
                                                   "value": 1,
                                                   "description": "termFreq=1.0"
                                                }
                                             ]
                                          },
                                          {
                                             "value": 0.30685282,
                                             "description": "idf(docFreq=1, maxDocs=1)"
                                          },
                                          {
                                             "value": 1,
                                             "description": "fieldNorm(doc=0)"
                                          }
                                       ]
                                    }
                                 ]
                              }
                           ]
                        }
                     ]
                  },
                  {
                     "value": 0.5,
                     "description": "coord(1/2)"
                  }
               ]
            }
         }
      ]
   }
}
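
The arithmetic in the explain output above can be recomputed by hand. The following is a minimal Python sketch that only mirrors the nesting of the dump ("product of", "sum of", coord); the function names are made up for illustration and are not part of any Elasticsearch API. Note that the three hits sit on three different shards, each reporting maxDocs=1, so idf and queryNorm are computed per shard, and queryNorm differs between the name hit and the categories hits.

```python
def clause_score(boost, idf, query_norm, tf, field_norm):
    """Score of one matching term clause, as laid out in the explain dump."""
    query_weight = boost * idf * query_norm   # "queryWeight, product of:"
    field_weight = tf * idf * field_norm      # "fieldWeight in 0, product of:"
    return query_weight * field_weight


def document_score(clause_scores, total_clauses):
    """Sum the matching clauses, then scale by coord(matching / total)."""
    coord = len(clause_scores) / total_clauses
    return coord * sum(clause_scores)


IDF = 0.30685282  # idf(docFreq=1, maxDocs=1), identical for every hit above

# Hit 2017788160: matched on name (boost 2.0, fieldNorm 0.625).
name_clause = clause_score(boost=2.0, idf=IDF, query_norm=0.8522964,
                           tf=1.0, field_norm=0.625)

# Hits 1269493995 and 1203656733: matched on categories (no boost, fieldNorm 1).
categories_clause = clause_score(boost=1.0, idf=IDF, query_norm=0.49421698,
                                 tf=1.0, field_norm=1.0)

# Each document matched one of the two field clauses, hence coord(1/2).
name_doc_score = document_score([name_clause], total_clauses=2)
categories_doc_score = document_score([categories_clause], total_clauses=2)

print(round(name_doc_score, 6), round(categories_doc_score, 6))
```

In this dump the boost of 2 does appear as a factor in the queryWeight of the name clause, so for this particular request it is applied rather than dropped.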

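2 Answers:

Answer 0 (score: 1)

The boost isn't being ignored... you're just adding a fuzzy component to the score, which changes the overall ordering. If you run the query with ?explain=true, you get a debug dump showing how the score is built.

Your first query requires an exact match. Combined with most_fields, the scoring is relatively simple: find the documents that have the most exact matches in the most fields.

Your second query introduces fuzziness with an edit distance of two. This means that any word within two character edits of the query term will match, which can greatly change the number of matching tokens.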
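
To make "within two character edits" concrete, here is a small Python sketch using plain Levenshtein distance. It is an illustration under assumptions, not Elasticsearch's implementation: the token list is invented, terms are assumed to be lowercased already, and Elasticsearch's fuzzy matching may additionally treat an adjacent transposition as a single edit, which this sketch does not.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character inserts, deletes, or substitutions."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]


def fuzzy_matches(query, tokens, max_edits=2):
    """Return the tokens that lie within max_edits of the query term."""
    return [t for t in tokens if levenshtein(query, t) <= max_edits]


# Hypothetical indexed tokens, for illustration only.
tokens = ["italian", "italia", "itallian", "italy", "indian"]
matched = fuzzy_matches("italian", tokens)
print(matched)
```

With an edit budget of zero only "italian" would match; with a budget of two, near-spellings match as well, and every extra matching term contributes to the score.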

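If you post the explain debug output, I can help analyze it and give you a clearer explanation, but basically the answer is: the boost still works, and your scores changed because of the fuzzy matching.

Answer 1 (score: 1)

Following what Zach said, I changed the query to this to get the result I wanted: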

GET /index_for_test/_search
{
    "query": {
        "bool": {
            "should": [
                {
                    "multi_match": {
                        "query":       "Italian",
                        "type":        "most_fields",
                        "fields":      [ "name^2", "categories" ],
                        "boost":       10
                    }
                },
                {
                    "multi_match": {
                        "query":       "Italian",
                        "type":        "most_fields",
                        "fields":      [ "name^2", "categories" ],
                        "fuzziness":   2
                    }
                }
            ]
        }
    }
}
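
The idea behind this query is that a document matching exactly satisfies both should clauses, and the exact clause carries a boost of 10, so exact matches tend to outrank documents that only match fuzzily. If the same body has to be built for different search terms, a small helper can generate it. This is only a sketch: the function name exact_plus_fuzzy_query and its parameters are hypothetical, and it produces the request body as a plain dict without sending it anywhere.

```python
import json


def exact_plus_fuzzy_query(text, fields, exact_boost=10, fuzziness=2):
    """Build a bool/should body: a boosted exact clause plus a fuzzy clause."""

    def multi_match(**extra):
        # Shared part of both clauses; extra carries boost or fuzziness.
        clause = {"query": text, "type": "most_fields", "fields": list(fields)}
        clause.update(extra)
        return {"multi_match": clause}

    return {
        "query": {
            "bool": {
                "should": [
                    multi_match(boost=exact_boost),    # exact match, boosted
                    multi_match(fuzziness=fuzziness),  # fuzzy fallback
                ]
            }
        }
    }


body = exact_plus_fuzzy_query("Italian", ["name^2", "categories"])
print(json.dumps(body, indent=4))
```

The printed JSON can be pasted after GET /index_for_test/_search. The boost of 10 is the value I picked for my data; other indices may need a different value.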