ElasticSearch query / search / match

Time: 2013-05-21 17:37:21

Tags: elasticsearch

I inserted 3 records into an ElasticSearch index, as shown below:

curl -XPOST 'http://127.0.0.1:9200/geoindex_test/STREET?pretty=1'  -d '
{ "cityNames" : [ { "language" : "ENG",
    "name" : "w bridgewater",
    "raw_name" : "W BRIDGEWATER"
  },
  { "language" : "ENG",
    "name" : "west bridgewater",
    "raw_name" : "West Bridgewater"
  }
],
"id" : 1,
  "streetNames" : [ { "language" : "ENG",
    "name" : "cram rd",
    "raw_name" : "Cram Rd"
  } ]
}'

curl -XPOST 'http://127.0.0.1:9200/geoindex_test/STREET?pretty=1'  -d '
{ "cityNames" : [ { "language" : "ENG",
    "name" : "bridgewater corners",
    "raw_name" : "BRIDGEWATER CORNERS"
  },
  { "language" : "ENG",
    "name" : "bridgewater center",
    "raw_name" : "Bridgewater Center"
  }
],
"id" : 2,
"streetNames" : [ { "language" : "ENG",
    "name" : "valley view rd",
    "raw_name" : "Valley View Rd"
  } ]
}'

curl -XPOST 'http://127.0.0.1:9200/geoindex_test/STREET?pretty=1'  -d '
{ "cityNames" : [ { "language" : "ENG",
    "name" : "bridgewater",
    "raw_name" : "Bridgewater"
  },
  { "language" : "ENG",
    "name" : "windsor",
    "raw_name" : "Windsor"
  }
],
"id" : 3,
"streetNames" : [ { "language" : "ENG",
    "name" : "valley view rd",
    "raw_name" : "Valley View Rd"
  } ]
}'

I then search like this:

curl -XGET 'http://127.0.0.1:9200/geoindex_test/STREET/_search?pretty=1'  -d '
{
"query" : {
    "match" : { "cityNames.name" : "bridgewater" }
}
}'

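I expected ElasticSearch to return the third record (id == 3) as the best match, since record 3 is the only one with a city name that matches "bridgewater" exactly. Instead, it returns the record with id 1 (w bridgewater) as the best match. What am I doing wrong?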

1 answer:

Answer 0 (score: 1)

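I think this happens because you are using inner objects, which essentially collapse the objects beneath them into a single field for search purposes. For example, when you query the name field of object 1, you are querying ["w bridgewater", "west bridgewater"] rather than the discrete fields you have in mind.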
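The flattening described above can be pictured with a minimal Python sketch. This is not Elasticsearch code: the helper `flatten_inner_objects` is a hypothetical name used only for illustration, and it simply gathers every leaf value under its dotted field path, which is roughly how an array of inner objects looks to the search side.

```python
from collections import defaultdict


def flatten_inner_objects(doc, prefix=""):
    """Collect every leaf value under its dotted field path (illustration only)."""
    flat = defaultdict(list)
    for key, value in doc.items():
        path = prefix + key
        items = value if isinstance(value, list) else [value]
        for item in items:
            if isinstance(item, dict):
                # Recurse into the inner object; all array elements share one path.
                nested = flatten_inner_objects(item, path + ".")
                for sub_path, sub_values in nested.items():
                    flat[sub_path].extend(sub_values)
            else:
                flat[path].append(item)
    return dict(flat)


# Record 1 from the question.
record_1 = {
    "cityNames": [
        {"language": "ENG", "name": "w bridgewater", "raw_name": "W BRIDGEWATER"},
        {"language": "ENG", "name": "west bridgewater", "raw_name": "West Bridgewater"},
    ],
    "id": 1,
    "streetNames": [{"language": "ENG", "name": "cram rd", "raw_name": "Cram Rd"}],
}

flat_1 = flatten_inner_objects(record_1)
print(flat_1["cityNames.name"])  # ['w bridgewater', 'west bridgewater']
```

Both city names end up under the single path `cityNames.name`, so a match on that field cannot tell which array element a term came from.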

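Because 'bridgewater' appears twice in objects 1 and 2 (once in each of the two name fields) but only once in object 3, those two objects rank higher in the search. Object 1 ends up on top, most likely because the field in which 'bridgewater' appears is shorter than in object 2 ("w bridgewater" versus "bridgewater corners").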
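The occurrence counts behind this argument can be checked with a small Python sketch. It only splits the `cityNames.name` values from the question on whitespace and counts `bridgewater` per record; that stands in for the real analyzer and is not the actual scoring formula, which weighs in further factors.

```python
# cityNames.name values of the three records from the question,
# already collapsed into one list per record, as an inner object would be.
city_names = {
    1: ["w bridgewater", "west bridgewater"],
    2: ["bridgewater corners", "bridgewater center"],
    3: ["bridgewater", "windsor"],
}


def term_count(values, term):
    """Count occurrences of `term` across all values, splitting on whitespace."""
    return sum(value.lower().split().count(term) for value in values)


counts = {
    doc_id: term_count(values, "bridgewater")
    for doc_id, values in city_names.items()
}
print(counts)  # {1: 2, 2: 2, 3: 1}
```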

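Instead of using inner objects as you do now, use nested objects (http://www.elasticsearch.org/guide/reference/mapping/nested-type/). Setting the score mode to "max" should then make the matching behave in a way that feels more intuitive to you.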
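As a hedged sketch of what that suggestion could look like, the Python snippet below builds two request bodies: a mapping that declares `cityNames` as `nested`, and a `nested` query with `score_mode` set to `max`. The index and type names (`geoindex_test`, `STREET`) come from the question; the variable names are illustrative, and the body layout assumes the typed mappings that the question's URLs imply, so other Elasticsearch versions may expect a different shape.

```python
import json

# Mapping that declares cityNames as a nested field, so each array element
# is indexed on its own rather than merged into one flat field.
# Assumption: typed mapping layout (type STREET), as the question's URLs suggest.
nested_mapping = {
    "mappings": {
        "STREET": {
            "properties": {
                "cityNames": {"type": "nested"},
            }
        }
    }
}

# Nested query: each cityNames element is scored separately, and
# score_mode "max" keeps the best-scoring element as the record's score.
nested_query = {
    "query": {
        "nested": {
            "path": "cityNames",
            "score_mode": "max",
            "query": {"match": {"cityNames.name": "bridgewater"}},
        }
    }
}

# Print the bodies so they can be passed to curl with -d, as in the question.
print(json.dumps(nested_mapping, indent=2))
print(json.dumps(nested_query, indent=2))
```

The first body would go to the index-creation request, before any records are indexed, and the second to the same `_search` endpoint used in the question.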