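Elasticsearch search results on a string field have the wrong order or relevance

Time: 2016-03-12 01:21:11

Tags: elasticsearch

I am using Elasticsearch version 1.7.4.

I am trying to search for the keyword Title in the title field, as shown below.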

ES

curl -XGET "http://127.0.0.1:9203/_search?post_dev" -d'
{
  "query": {
    "match": {
      "title": {
        "query": "Title"
      }
    }
  },
  "from": "0",
  "size": "10"
}'

The results are here: http://pastebin.com/tkd3KKN7

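I have indexed the table below; only the rows relevant to this search are shown. Based on what we see below, at least rows 13, 16, 17 and 19 should appear at the top of the results of the ES query above, unless I am missing something. Adding "sort":[{"_score":"desc"}] changes nothing. Is something wrong with my index or my query?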

MySQL

mysql> SELECT * FROM post WHERE title LIKE '%Title%' LIMIT 10;
+----+---------+-------------+---------------+------+-------+--------------+---------------------+
| id | title   | description | author        | year | price | is_published | created_at          |
+----+---------+-------------+---------------+------+-------+--------------+---------------------+
|  2 | Title 1 | Desc 1      | Pacino        | 2015 |  2.50 |            1 | 2016-03-11 23:36:33 |
|  3 | Title 1 | Cript A     | DeNiro        | 2010 |  1.00 |            0 | 2016-03-11 23:36:33 |
|  9 | Title 3 | Desc 2      | Al            | 2010 |  0.50 |            1 | 2016-03-11 23:36:33 |
| 10 | Title 1 | Cript A     | Andy Garcia   | 2015 |  0.50 |            1 | 2016-03-11 23:36:33 |
| 12 | Title 2 | Desc 1      | Andy Garcia   | 2015 |  4.00 |            1 | 2016-03-11 23:36:33 |
| 13 | Title   | Cript       | Robert        | 2010 |  3.99 |            0 | 2016-03-11 23:36:33 |
| 16 | Title   | Title       | Andy Garcia   | 2005 |  1.00 |            1 | 2016-03-11 23:36:33 |
| 17 | Title 2 | Title       | Robert DeNiro | 2005 |  4.00 |            0 | 2016-03-11 23:36:33 |
| 19 | Title   | Cript B     | DeNiro        | 2010 |  3.99 |            1 | 2016-03-11 23:36:33 |
| 24 | Title   | Cript B     | Robert DeNiro | 2000 |  2.50 |            1 | 2016-03-11 23:36:33 |
+----+---------+-------------+---------------+------+-------+--------------+---------------------+
10 rows in set (0.00 sec)

Index

$ curl -X GET 127.0.0.1:9203/post_dev?pretty
{
  "post_dev" : {
    "aliases" : { },
    "mappings" : {
      "post" : {
        "_meta" : {
          "model" : "Application\\SearchBundle\\Entity\\Post"
        },
        "properties" : {
          "created_at" : {
            "type" : "date",
            "format" : "dateOptionalTime"
          },
          "description" : {
            "type" : "string",
            "analyzer" : "english",
            "fields" : {
              "raw" : {
                "type" : "string",
                "index" : "not_analyzed"
              }
            }
          },
          "id" : {
            "type" : "integer"
          },
          "is_published" : {
            "type" : "boolean"
          },
          "price" : {
            "type" : "double"
          },
          "title" : {
            "type" : "string",
            "analyzer" : "english",
            "fields" : {
              "raw" : {
                "type" : "string",
                "index" : "not_analyzed"
              }
            }
          },
          "year" : {
            "type" : "integer"
          }
        }
      }
    },
    "settings" : {
      "index" : {
        "creation_date" : "1457739680793",
        "number_of_shards" : "5",
        "number_of_replicas" : "1",
        "version" : {
          "created" : "1070499"
        },
        "uuid" : "mxQFFQ0EROuCDZvUjIj0-w"
      }
    },
    "warmers" : { }
  }
}

1 answer:

Answer 0 (score: 1)

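As you can see, there are 2206 matches and they all have the same score (1.8570213). With tied scores, the hits are simply returned in the order in which they are stored in their respective Lucene segments, and that order holds only until a segment merge occurs as a result of indexing, updating, deleting, etc.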
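One way to check why the scores tie is to ask Elasticsearch to explain each hit. The following is a minimal sketch, not the asker's actual setup: it reuses the match query from the question and adds the `explain` flag of the search request body. The variable names `EXPLAIN_QUERY` and `ES_URL` are assumptions introduced for illustration, and the request is sent only when `ES_URL` is set.

```shell
# Request body: the same match query as in the question, plus "explain": true,
# which asks Elasticsearch to include a score breakdown with every hit.
# EXPLAIN_QUERY and ES_URL are hypothetical names used only in this sketch.
EXPLAIN_QUERY='{
  "explain": true,
  "query": {
    "match": {
      "title": {
        "query": "Title"
      }
    }
  },
  "from": 0,
  "size": 3
}'

# Send the request only when ES_URL is set,
# for example ES_URL=http://127.0.0.1:9203/post_dev
if [ -n "${ES_URL:-}" ]; then
  curl -XGET "$ES_URL/_search?pretty" -d "$EXPLAIN_QUERY"
fi
```

Each hit in the response should then carry an `_explanation` section, which makes it possible to compare how the scores of two hits were computed.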

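Your SQL table is naturally ordered by id. So if you change your query to sort by id as well, you will see documents 13, 16, 19 and 24 first, just as in the SQL table: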

curl -XGET "http://127.0.0.1:9203/_search?post_dev" -d'
{
  "query": {
    "match": {
      "title": {
        "query": "Title"
      }
    }
  },
  "from": "0",
  "size": "10",
  "sort": {"id": "asc"}
}'
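
If relevance should still take priority, a variation is to keep `_score` as the primary sort key and use `id` only to break ties, so that tied hits come back in a stable order. This is a sketch under assumptions rather than part of the original answer: the variable names `SORTED_QUERY` and `ES_URL` are hypothetical, and the request is sent only when `ES_URL` is set.

```shell
# Request body: sort by relevance first, then by id ascending as a tie-breaker.
# SORTED_QUERY and ES_URL are hypothetical names used only in this sketch.
SORTED_QUERY='{
  "query": {
    "match": {
      "title": {
        "query": "Title"
      }
    }
  },
  "from": 0,
  "size": 10,
  "sort": [
    { "_score": "desc" },
    { "id": "asc" }
  ]
}'

# Send the request only when ES_URL is set,
# for example ES_URL=http://127.0.0.1:9203/post_dev
if [ -n "${ES_URL:-}" ]; then
  curl -XGET "$ES_URL/_search?pretty" -d "$SORTED_QUERY"
fi
```

With this body, hits whose scores are equal are ordered by `id` rather than by their position in the Lucene segments.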