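My requirement is:
If I pass multiple words as a list to a search, ES should return the documents that match a subset of those words, together with the words each document matched, so that I can tell which document matched which subset.
Suppose I need to search for words such as football, cricket, tennis, and golf in three files.
I store these files in corresponding documents. The mapping of the "mydocuments" index looks like this: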
{
  "mydocuments" : {
    "mappings" : {
      "docs" : {
        "properties" : {
          "file_content" : {
            "type" : "string"
          }
        }
      }
    }
  }
}
First file:
{ _id: 1, file_content: "I love tennis and cricket"}
Second file:
{ _id: 2, file_content: "tennis and football are very popular"}
Third file:
{ _id: 3, file_content: "football and cricket are originated in england"}
I should be able to search a single file or multiple files for football, tennis, cricket, and golf, and it should return something like this:
"hits" : {
  "total" : 3,
  "hits" : [
    {
      "_index" : "twitter",
      "_type" : "tweet",
      "_id" : "1",
      "_source" : {
        "file_content" : ["tennis","cricket"],
        "postDate" : "2009-11-15T14:12:12"
      }
    },
    {
      "_index" : "twitter",
      "_type" : "tweet",
      "_id" : "2",
      "_source" : {
        "file_content" : ["football","tennis"],
        "postDate" : "2009-11-15T14:12:12"
      }
    }
  ]
}
Or, in the case of a search over multiple files, an array of search results like the one above.
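To pin down the requested semantics, here is a plain-Python sketch that computes, for each of the three example documents, which of the query words it contains. It runs locally and does not involve Elasticsearch at all; the lowercase/word-character tokenization is an assumption that only roughly approximates an analyzer, and `matched_terms` and `search` are hypothetical names.

```python
import re

# The three example documents from the question (_id -> file_content).
DOCS = {
    "1": "I love tennis and cricket",
    "2": "tennis and football are very popular",
    "3": "football and cricket are originated in england",
}


def matched_terms(text, terms):
    """Return the query terms that occur in `text`, in query order.

    Rough stand-in for an analyzer: lowercase, then split on non-word characters.
    """
    tokens = set(re.findall(r"\w+", text.lower()))
    return [t for t in terms if t.lower() in tokens]


def search(docs, terms):
    """Return (doc_id, matched_terms) for every doc matching at least one term."""
    results = []
    for doc_id, text in docs.items():
        found = matched_terms(text, terms)
        if found:  # skip documents that match none of the terms
            results.append((doc_id, found))
    return results


if __name__ == "__main__":
    for doc_id, found in search(DOCS, ["football", "tennis", "cricket", "golf"]):
        print(doc_id, found)
```

Each document pairs with its own matched subset (document 1 with tennis and cricket, for example), which is exactly the per-hit information the desired output above carries.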
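Any ideas how we can do this with Elasticsearch?
If this can't be done with Elasticsearch, I'm ready to evaluate any other option (native Lucene, Solr).
Edit
My bad, maybe I didn't give enough detail. @Andrew, by "file" I mean the text content of a file, stored in an ES document as a string (full-text) field. Assume each file corresponds to one document whose text content is stored in a field named "file_content".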
Answer 0 (score: 1)
The closest thing to what you want is highlighting, meaning the searched terms are emphasized in the document.
Example query:
{
  "query": {
    "match": {
      "file_content": "football tennis cricket golf"
    }
  },
  "highlight": {
    "fields": {"file_content": {}}
  }
}
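Since the words arrive as a list, one way to build the query body above is to join them into a single `match` string: a `match` query analyzes that string into one term per word, and with its default `or` operator a document matches if it contains any of them. A minimal Python sketch, where `build_highlight_query` is a hypothetical helper and sending the body through a client is left out:

```python
import json


def build_highlight_query(field, words):
    """Build a match query over `field` for any of `words`, highlighting `field`.

    The words are joined with spaces; the match query's analyzer splits them
    back into separate terms, and the default "or" operator matches any of them.
    """
    return {
        "query": {"match": {field: " ".join(words)}},
        "highlight": {"fields": {field: {}}},
    }


body = build_highlight_query("file_content", ["football", "tennis", "cricket", "golf"])
print(json.dumps(body, indent=2))
```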
Result:
"hits": {
  "total": 3,
  "max_score": 0.027847305,
  "hits": [
    {
      "_index": "test_highlight",
      "_type": "docs",
      "_id": "1",
      "_score": 0.027847305,
      "_source": {
        "file_content": "I love tennis and cricket"
      },
      "highlight": {
        "file_content": [
          "I love <em>tennis</em> and <em>cricket</em>"
        ]
      }
    },
    {
      "_index": "test_highlight",
      "_type": "docs",
      "_id": "2",
      "_score": 0.023869118,
      "_source": {
        "file_content": "tennis and football are very popular"
      },
      "highlight": {
        "file_content": [
          "<em>tennis</em> and <em>football</em> are very popular"
        ]
      }
    },
    {
      "_index": "test_highlight",
      "_type": "docs",
      "_id": "3",
      "_score": 0.023869118,
      "_source": {
        "file_content": "football and cricket are originated in england"
      },
      "highlight": {
        "file_content": [
          "<em>football</em> and <em>cricket</em> are originated in england"
        ]
      }
    }
  ]
}
As you can see, the found terms are highlighted (surrounded by <em> tags) under the special highlight section.
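To turn the highlight output into the per-document word list the question asks for, the <em> tags can be parsed on the client side. A minimal Python sketch, assuming the default <em>/</em> tags and a response already decoded into a dict; `matched_words` is a hypothetical helper, and `sample` is a trimmed copy of the result above:

```python
import re

# Captures the text wrapped in the default highlight tags <em>...</em>.
EM_RE = re.compile(r"<em>(.*?)</em>")


def matched_words(response, field):
    """Map each hit's _id to the distinct highlighted words in `field`.

    Words are lowercased and kept in first-seen order; hits without a
    highlight section map to an empty list.
    """
    result = {}
    for hit in response["hits"]["hits"]:
        words = []
        for fragment in hit.get("highlight", {}).get(field, []):
            for word in EM_RE.findall(fragment):
                word = word.lower()
                if word not in words:
                    words.append(word)
        result[hit["_id"]] = words
    return result


sample = {
    "hits": {
        "total": 3,
        "hits": [
            {"_id": "1", "highlight": {"file_content": ["I love <em>tennis</em> and <em>cricket</em>"]}},
            {"_id": "2", "highlight": {"file_content": ["<em>tennis</em> and <em>football</em> are very popular"]}},
            {"_id": "3", "highlight": {"file_content": ["<em>football</em> and <em>cricket</em> are originated in england"]}},
        ],
    }
}
print(matched_words(sample, "file_content"))
# {'1': ['tennis', 'cricket'], '2': ['tennis', 'football'], '3': ['football', 'cricket']}
```

One caveat: by default the highlighter returns only a limited number of fragments per field, so on long file contents some matched words may be missing from the fragments. Setting "number_of_fragments": 0 on the highlighted field makes Elasticsearch highlight and return the entire field content instead.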