Elasticsearch word count

Time: 2018-05-29 05:12:16

Tags: elasticsearch

My sample Elasticsearch document:

{
    "_index": "testdata",
    "_type": "tweet",
    "_id": "Dbo5qmMBSUBLqBARJmBG",
    "_version": 1,
    "_score": 1,
    "_source": {
        "fileName": "alibaba.pdf",
        "chapter": "chapter1",
        "page": 1,
        "timeDate": "2018-05-24T11:06:48+00:00",
        "text": "So why do we need machine learning, why do we want a machine to learn as a human? There are many problems involving huge datasets, or complex calculations for instance, where it makes sense to let computers do all the work. In general, of course, computers and robots dont get tired, dont have to sleep, and may be cheaper. There is also an emerging school of thought called active learning or human-in-the-loop, which advocates combining the efforts of machine learners and humans. The idea is that there are routine boring tasks more suitable for computers, and creative tasks more suitable for humans.According to this philosophy, machines are able to learn, by following rules or algorithms designed by humans and to do repetitive and logic tasks desired by a human"
    }
}

So how can I count how often each word occurs, e.g. machine = 3, humans = 2, and so on?

2 Answers:

Answer 0 (score: 0)

You probably want to run an aggregation query. It would look something like this:

{
    "aggs" : {
        "text_aggregation" : {
            "terms" : { 
                "field" : "text" 
            }
        }
    }
}

This runs over the entire index. If you want to run it against a specific document, you can use something like the following:

{
    "query":{
        "match":{
            "eventCodes":"ET00075293"
        }   
    },
    "aggs" : {
        "text_aggregation" : {
            "terms" : { 
                "fileName" : "alibaba.pdf" 
            }
        }
    }
}

Answer 1 (score: 0)

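@Satej S Careful! The first aggregation will not work, because a terms aggregation does not run on an analyzed text field by default; it is meant for fields of the keyword data type. The second aggregation will not work either: it has a syntax error.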

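@Prashant Patel You can use termvector (doc here) to produce word counts for the documents in the collection. A simpler approach is to change the mapping and create a copy of the text field that is indexed with the keyword data type, which allows terms aggregations (doc here). Keep in mind that a keyword field is not analyzed, so the aggregation buckets on whole field values and reports document counts rather than per-word occurrences. Then try an aggregation like this: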

{
    "query":{
        "term":{
            "_id":"Dbo5qmMBSUBLqBARJmBG"
        }   
    },
    "aggs" : {
        "text_aggregation" : {
            "terms" : { 
                "field" : "text.keyword" 
            }
        }
    }
}
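For the termvector route mentioned in this answer, the per-word counts come back as `term_freq` values nested under `term_vectors.<field>.terms` in the response. The sketch below only shows how such a response could be turned into a word-count mapping; it makes no network call. The helper names `term_frequencies` and `total_frequencies` are hypothetical, and `sample_response` is a hand-written, abbreviated illustration of the response shape, not real output from the index above.

```python
from collections import Counter
from typing import Any, Dict, Iterable

# Assumed request (typed URL form used by Elasticsearch at the time of the question):
#   GET /testdata/tweet/Dbo5qmMBSUBLqBARJmBG/_termvectors?fields=text


def term_frequencies(response: Dict[str, Any], field: str = "text") -> Dict[str, int]:
    """Map each term of `field` to its term_freq in one term vectors response."""
    terms = response.get("term_vectors", {}).get(field, {}).get("terms", {})
    return {term: info["term_freq"] for term, info in terms.items()}


def total_frequencies(responses: Iterable[Dict[str, Any]], field: str = "text") -> Counter:
    """Sum term frequencies over several per-document responses."""
    total: Counter = Counter()
    for response in responses:
        total.update(term_frequencies(response, field))
    return total


# Hand-written, abbreviated example of the response shape (values are illustrative).
sample_response = {
    "_index": "testdata",
    "_type": "tweet",
    "_id": "Dbo5qmMBSUBLqBARJmBG",
    "found": True,
    "term_vectors": {
        "text": {
            "terms": {
                "machine": {"term_freq": 3},
                "humans": {"term_freq": 2},
                "learning": {"term_freq": 2},
            }
        }
    },
}

counts = term_frequencies(sample_response)
print(counts["machine"], counts["humans"])  # 3 2
```

The terms in a real response are the analyzed tokens, so with the default analyzer they would be lowercased, and "machine" and "machines" would be counted separately.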