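I have an index with a field that contains a lot of terms (it's a bio, so it holds information about profession, education, hobbies, etc.). I want to use Elasticsearch to find bios similar to a given one.
I'm using a match query. It gives me some good results, but I'm not sure it's the best approach.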
GET /jdbc/_search?pretty
{
  "query": {
    "match": {
      "bio": {
        "query": "Jack Reacher, 42, is Vice President of Contoso since 2009. Mr. Reacher is responsible for operations, business, accounting, couching, CEO, worldwide, success, government, experience, MBA, CIO, North America. Previously Mr. Reacher worked as Manager of Operations at ABC Inc. from January 2003 to October 2009. Mr. Reacher holds a Bachelor of Business Administration degree from the University of Michigan and enjoy spending his weekends with his family and friends. His passion beisdes his family is music and his Porsches."
      }
    }
  },
  "size": 20
}
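To run the duplicate search over many bios, the same request can be built and sent from a script. A minimal sketch using only the Python standard library, assuming an unauthenticated cluster at `http://localhost:9200` and the index `jdbc` from the question; `build_match_body` and `search` are hypothetical helper names, not part of any client library.

```python
import json
import urllib.request


def build_match_body(bio_text, size=20):
    """Build the same match-query body as the GET /jdbc/_search request above."""
    return {
        "query": {"match": {"bio": {"query": bio_text}}},
        "size": size,
    }


def search(body, base_url="http://localhost:9200", index="jdbc"):
    """POST the body to the _search endpoint and return the decoded JSON.

    Assumes an unauthenticated cluster reachable at base_url.
    """
    req = urllib.request.Request(
        f"{base_url}/{index}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",  # _search accepts POST, which avoids GET-with-body issues
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Usage would look like `hits = search(build_match_body(some_bio))["hits"]["hits"]`, looping over each stored bio to collect candidate duplicates.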
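Is this the best way? Would combining it with another query give more precise matches?
I don't intend to use this in the application, because there it would be much easier to split this information into separate fields. This is to help me find duplicate entries in the database; they differ slightly, but they are the same person.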
Answer 0 (score: 1)
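This approach is fine as long as your bios aren't too large: the match query turns each term into a clause of a bool query, and by default at most 1024 clauses are allowed. You can change this limit in elasticsearch.yml and restart: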
indices.query.bool.max_clause_count
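To see whether a bio comes anywhere near that limit, you can roughly estimate the clause count before querying. The sketch below assumes the field uses the `standard` analyzer and approximates it with a lowercase `\w+` split; it counts every token, repeats included, so it errs on the high side. The exact token list is available from the `_analyze` API. `estimate_clause_count` and `fits_clause_limit` are hypothetical helpers, and 1024 is the figure cited above, so check your version's actual setting.

```python
import re

# Figure cited in the answer; the real value comes from
# indices.query.bool.max_clause_count in your version/config.
DEFAULT_MAX_CLAUSE_COUNT = 1024


def estimate_clause_count(text):
    """Roughly count the tokens the standard analyzer would emit.

    Lowercases and splits on runs of word characters. The real analyzer
    uses Unicode word-boundary rules, so treat this as an estimate only.
    """
    return len(re.findall(r"\w+", text.lower()))


def fits_clause_limit(text, limit=DEFAULT_MAX_CLAUSE_COUNT):
    """True if the estimated clause count does not exceed the limit."""
    return estimate_clause_count(text) <= limit
```

For example, `estimate_clause_count("Mr. Reacher, Mr. Reacher holds an MBA")` returns 7.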
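In the end, the match query ORs all the terms together and ranks by relevance, so documents that share only a few common terms can still show up near the bottom of the result set.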
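One way to trim those weak matches (a refinement the answer does not mention) is the match query's `minimum_should_match` parameter, which requires a given share of the query terms to match before a document is returned. A sketch of the body, assuming `30%` as an arbitrary starting value to tune against known duplicates; `build_strict_match_body` is a hypothetical name.

```python
def build_strict_match_body(bio_text, min_match="30%", size=20):
    """Match query that drops documents sharing too few of the bio's terms.

    min_match is an assumed starting point; raise it to cut noise,
    lower it if known duplicates start disappearing.
    """
    return {
        "query": {
            "match": {
                "bio": {
                    "query": bio_text,
                    "minimum_should_match": min_match,
                }
            }
        },
        "size": size,
    }
```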
There are other options that may return more relevant results. For example, take a look at the More Like This query: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-more-like-this.html#search-more-like-this
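Because the goal is finding duplicates of records already in the index, `more_like_this` can take an indexed document as its example instead of pasted text, and it excludes that document from its own results by default. A sketch of the request body, assuming Elasticsearch 7.x or later (older versions used `like_text`/`ids` or also expect `_type` in the document reference); the parameter values are guesses to tune, and `build_mlt_body` is hypothetical.

```python
def build_mlt_body(doc_id, index="jdbc", size=20):
    """more_like_this body that uses an existing document as the example."""
    return {
        "query": {
            "more_like_this": {
                "fields": ["bio"],
                # Reference an indexed document instead of pasting its text.
                "like": [{"_index": index, "_id": doc_id}],
                # Bios are short, so a term only needs to appear once to count.
                "min_term_freq": 1,
                # Keep more distinctive terms than the default of 25.
                "max_query_terms": 50,
            }
        },
        "size": size,
    }
```

Sending this body to `_search` once per document id gives, for each record, a ranked list of likely duplicates.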
A sloppy phrase query could also be used: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-match-query.html#_phrase
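This also answers the question about combining queries: a sloppy phrase query fits well as an optional `should` clause in a `bool` query. The broad match query still finds the candidates, while bios that share longer word sequences (such as an identical job title) score higher. A sketch assuming `match_phrase` with a `slop` of 3 and hand-picked phrases from the bio; `build_combined_body` is a hypothetical name.

```python
def build_combined_body(bio_text, phrases, slop=3, size=20):
    """bool query: match on the whole bio, boost near-exact phrase overlaps.

    Because a 'must' clause is present, the 'should' clauses are optional
    and only add to the score of documents that also satisfy 'must'.
    """
    should = [
        {"match_phrase": {"bio": {"query": phrase, "slop": slop}}}
        for phrase in phrases
    ]
    return {
        "query": {
            "bool": {
                "must": [{"match": {"bio": {"query": bio_text}}}],
                "should": should,
            }
        },
        "size": size,
    }
```

For example, `build_combined_body(bio, ["Vice President of Contoso", "University of Michigan"])` rewards bios that mention the same title and school even with a few words in between.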