Find how many times a word occurs consecutively in Elasticsearch

Time: 2016-04-29 13:48:06

Tags: sorting elasticsearch

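I am working with Elasticsearch and have run into a problem: when searching records, I need to know how many times a word occurs consecutively in each one. For example, I have the following rows: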

{ "user": "Aniket", "postDate": "2016-04-26", "body": "Search as we discuss yesterday one time word", "title": "One time word" },
{ "user": "aniket", "postDate": "2016-04-26", "body": "Distribution is hard. Distribution should be easy.word word word word", "title": "Four times word" },
{ "user": "aniket", "postDate": "2016-04-26", "body": "Distribution is hard. Distribution should be easy.word word word", "title": "Three times word" },
{ "user": "aniket", "postDate": "2016-04-26", "body": "Distribution is hard. Distribution should be easy.word word", "title": "Two times word" }

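I have four rows under the user aniket, and each one contains "word", but it occurs once, twice, three times, or four times. When I search for "word", I need the results ordered by how many times it occurs, so that the row with four occurrences comes first: 1. word word word word 2. word word word 3. word word 4. word. I also tried using the score, but the score does not give me any information about this.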

I am trying the following query:

curl -XGET 'localhost:9200/blog/post/_search?pretty=1' -d '{
"query": {
"match": {
"body": "word"
}
},
"sort": {
"_script": {
"type": "number",
"script": "termInfo=_index['body'][term].tf();return termInfo;",
"params": {
"term": "word"
},
"lang": "groovy",
"order": "desc"
}
}
}'

and I get this error:

{
"index" : "blog",
"shard" : 4,
"status" : 500,
"reason" : "QueryPhaseExecutionException[[blog][4]: query[filtered(body:word)->cache(type:post)],from[0],size[10],sort[script\": org.elasticsearch.index.fielddata.fieldcomparator.DoubleScriptDataComparator$InnerSource@51c07776>!]: Query Failed [Failed to execute main query]]; nested: GroovyScriptExecutionException[MissingPropertyException[No such property: body for class: Script5]]; "
} ]

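If I remove the sort part of the query, I get results. A simple sort on body with order asc also works, but that does not cover the word-count case. Is there a solution, and what am I missing?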
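One possible reading of the error above: the request body is wrapped in shell single quotes, so the inner single quotes around body in the script close and reopen the shell string and never reach Elasticsearch. Groovy would then see a bare identifier body, which is consistent with "No such property: body". The sketch below uses Python's shlex module, which applies POSIX shell quoting rules, to approximate what curl would receive. The escaped double-quote variant is an assumed workaround for the quoting only, not a confirmed fix for the sort itself.

```python
import json
import shlex

# The script line as typed inside curl -d '...': inner single quotes around body.
typed = """'{"script": "termInfo=_index['body'][term].tf();return termInfo;"}'"""

# shlex.split applies POSIX shell quoting, approximating what curl receives.
received = shlex.split(typed)[0]
print(received)  # the quotes around body are gone

# Assumed workaround: JSON-escaped double quotes survive shell single quotes.
typed_fixed = r"""'{"script": "termInfo=_index[\"body\"][term].tf();return termInfo;"}'"""
received_fixed = shlex.split(typed_fixed)[0]
print(json.loads(received_fixed)["script"])
```

Putting the request body in a file and passing it with curl's -d @file form is another way to sidestep shell quoting altogether.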

1 Answer:

Answer 0: (score: 0)

Check the contents of "_explanation", as I described in my answer to "sort result by term frequency count":

 "_explanation": {
          "value": 0.16608895,
          "description": "weight(_all:godfather in 0) [PerFieldSimilarity], result of:",
          "details": [
            {
              "value": 0.16608895,
              "description": "fieldWeight in 0, product of:",
              "details": [
                {
                  "value": 1.7320508,
                  "description": "tf(freq=3.0), with freq of:",
                  "details": [
                    {
                      "value": 3,
                      "description": "termFreq=3.0",
                      "details": []
                    }
                  ]
                },
                {
                  "value": 0.30685282,
                  "description": "idf(docFreq=1, maxDocs=1)",
                  "details": []
                },
                {
                  "value": 0.3125,
                  "description": "fieldNorm(doc=0)",
                  "details": []
                }
              ]
            }
          ]
        }

It contains the count for the word:

"value": 3,
"description": "termFreq=3.0",
"details": []
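As a rough sketch of how that count could be read programmatically, the helper below walks an "_explanation" tree and returns the first termFreq it finds, then sorts hits by it on the client side. It assumes the search was sent with "explain": true so that each hit carries "_explanation", and that the description text has the termFreq=3.0 form shown above; the wording may differ between Elasticsearch versions. The names term_freq and sample_explanation and the sample hits are hypothetical.

```python
import re

def term_freq(explanation):
    """Return the first termFreq value found in an _explanation tree, else None."""
    match = re.match(r"termFreq=([\d.]+)", explanation.get("description", ""))
    if match:
        return float(match.group(1))
    for child in explanation.get("details", []):
        found = term_freq(child)
        if found is not None:
            return found
    return None

def sample_explanation(freq):
    """Build a trimmed, hypothetical explanation tree shaped like the output above."""
    return {
        "description": "fieldWeight in 0, product of:",
        "details": [
            {"description": "tf(freq=%.1f), with freq of:" % freq,
             "details": [{"value": freq,
                          "description": "termFreq=%.1f" % freq,
                          "details": []}]},
        ],
    }

# Hypothetical hits, in the order a score-based search might return them.
hits = [
    {"_source": {"title": "Two times word"}, "_explanation": sample_explanation(2)},
    {"_source": {"title": "One time word"}, "_explanation": sample_explanation(1)},
    {"_source": {"title": "Four times word"}, "_explanation": sample_explanation(4)},
    {"_source": {"title": "Three times word"}, "_explanation": sample_explanation(3)},
]

# Sort client-side by term frequency, highest first; missing counts sort last.
ranked = sorted(hits, key=lambda h: term_freq(h["_explanation"]) or 0, reverse=True)
titles = [h["_source"]["title"] for h in ranked]
print(titles)
```

Sorting on the client only reorders the hits already returned, so it suits small result sets; it does not change which documents Elasticsearch selects.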