I am using this query to search for occurrences of a phrase in a field.
"query": {
  "match_phrase": {
    "content": "my test phrase"
  }
}
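I need to count, for each document, how many times the phrase matched (if that is possible at all).
I considered aggregations, but I don't think they fit, because they would give me the number of matches across the whole index rather than per document.
Thanks.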
Answer 0: (score: 5)
This can be achieved using Script Fields with a Painless script.
You can count the occurrences in each field and sum them for the document.
Example:
## Here's my test index with some sample values
# This document has one occurrence
POST t1/doc/1
{
  "content": "my test phrase"
}
# This document has 5 occurrences
POST t1/doc/2
{
  "content": "my test phrase ",
  "content1": "this is my test phrase 1",
  "content2": "this is my test phrase 2",
  "content3": "this is my test phrase 3",
  "content4": "this is my test phrase 4"
}
POST t1/doc/3
{
  "content": "my test new phrase"
}
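Now, using a script, I can count the phrase matches in each field. I count at most one match per field, but you can modify the script to count multiple matches per field.
The obvious drawback is that you need to mention every field of the document in the script, unless there is a way to iterate over the doc fields that I am not aware of.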
POST t1/_search
{
  "script_fields": {
    "phrase_Count": {
      "script": {
        "lang": "painless",
        "source": """
          int count = 0;
          if(doc['content.keyword'].size() > 0 && doc['content.keyword'].value.indexOf(params.phrase)!=-1) count++;
          if(doc['content1.keyword'].size() > 0 && doc['content1.keyword'].value.indexOf(params.phrase)!=-1) count++;
          if(doc['content2.keyword'].size() > 0 && doc['content2.keyword'].value.indexOf(params.phrase)!=-1) count++;
          if(doc['content3.keyword'].size() > 0 && doc['content3.keyword'].value.indexOf(params.phrase)!=-1) count++;
          if(doc['content4.keyword'].size() > 0 && doc['content4.keyword'].value.indexOf(params.phrase)!=-1) count++;
          return count;
        """,
        "params": {
          "phrase": "my test phrase"
        }
      }
    }
  }
}
This gives me the phrase count for each document as a script field:
{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 3,
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "t1",
        "_type" : "doc",
        "_id" : "2",
        "_score" : 1.0,
        "fields" : {
          "phrase_Count" : [
            5 <--- count of occurrences of the phrase in the document
          ]
        }
      },
      {
        "_index" : "t1",
        "_type" : "doc",
        "_id" : "1",
        "_score" : 1.0,
        "fields" : {
          "phrase_Count" : [
            1
          ]
        }
      },
      {
        "_index" : "t1",
        "_type" : "doc",
        "_id" : "3",
        "_score" : 1.0,
        "fields" : {
          "phrase_Count" : [
            0
          ]
        }
      }
    ]
  }
}
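The script above counts at most one match per field. If repeated matches inside a single field matter, the counting logic could look like the following Python sketch, which works client-side on a document's `_source` instead of inside Elasticsearch. The helper names `count_occurrences` and `count_in_document` are made up for illustration. The same loop should carry over to Painless, since its `String` type offers `indexOf(String, int)`, but that port is not tested here.

```python
def count_occurrences(text, phrase):
    """Count non-overlapping occurrences of phrase in text."""
    if not phrase:
        return 0
    count = 0
    start = text.find(phrase)
    while start != -1:
        count += 1
        # Resume the search right after the match that was just counted.
        start = text.find(phrase, start + len(phrase))
    return count


def count_in_document(source, fields, phrase):
    """Sum the per-field counts, skipping missing or non-string fields."""
    return sum(
        count_occurrences(source[field], phrase)
        for field in fields
        if isinstance(source.get(field), str)
    )


# Document 2 from the sample index above.
doc2 = {
    "content": "my test phrase ",
    "content1": "this is my test phrase 1",
    "content2": "this is my test phrase 2",
    "content3": "this is my test phrase 3",
    "content4": "this is my test phrase 4",
}
fields = ["content", "content1", "content2", "content3", "content4"]
print(count_in_document(doc2, fields, "my test phrase"))  # prints 5
```

Like the script, this is a plain case-sensitive substring match, so it does not apply the analysis that `match_phrase` uses.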
Answer 1: (score: -1)
You can use term vectors to achieve this. Take a look at Term Vectors.
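As a rough sketch of how term vectors could yield a per-document phrase count: the `_termvectors` API can return the position of every token in a field, and a phrase occurrence is then a run of consecutive positions. The Python function below assumes the response was requested with positions enabled, and that the phrase terms are already analyzed the same way as the field (for example, lowercased by the standard analyzer). The function name `count_phrase` and the sample response are illustrative and not part of the answer.

```python
def count_phrase(term_vectors_response, field, phrase_terms):
    """Count occurrences of the consecutive phrase_terms in one field."""
    terms = (
        term_vectors_response.get("term_vectors", {})
        .get(field, {})
        .get("terms", {})
    )
    # One set of token positions per phrase term, in phrase order.
    positions = [
        {token["position"] for token in terms.get(term, {}).get("tokens", [])}
        for term in phrase_terms
    ]
    if not positions:
        return 0
    # The phrase starts at p if term i occurs at position p + i for every i.
    return sum(
        1
        for start in positions[0]
        if all(start + i in positions[i] for i in range(1, len(positions)))
    )


# Hand-written sample shaped like a term vectors response for the text
# "my test phrase and my test phrase" (an assumption, not real output).
sample_response = {
    "term_vectors": {
        "content": {
            "terms": {
                "my": {"term_freq": 2, "tokens": [{"position": 0}, {"position": 4}]},
                "test": {"term_freq": 2, "tokens": [{"position": 1}, {"position": 5}]},
                "phrase": {"term_freq": 2, "tokens": [{"position": 2}, {"position": 6}]},
                "and": {"term_freq": 1, "tokens": [{"position": 3}]},
            }
        }
    }
}
print(count_phrase(sample_response, "content", ["my", "test", "phrase"]))  # prints 2
```

This needs one term vectors request per document (or a multi term vectors request), so it suits small result sets better than large scans.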