Question

My sample Elasticsearch index document:

{
    "_index": "testdata",
    "_type": "tweet",
    "_id": "Dbo5qmMBSUBLqBARJmBG",
    "_version": 1,
    "_score": 1,
    "_source": {
        "fileName": "alibaba.pdf",
        "chapter": "chapter1",
        "page": 1,
        "timeDate": "2018-05-24T11:06:48+00:00",
        "text": "So why do we need machine learning, why do we want a machine to learn as a human? There are many problems involving huge datasets, or complex calculations for instance, where it makes sense to let computers do all the work. In general, of course, computers and robots dont get tired, dont have to sleep, and may be cheaper. There is also an emerging school of thought called active learning or human-in-the-loop, which advocates combining the efforts of machine learners and humans. The idea is that there are routine boring tasks more suitable for computers, and creative tasks more suitable for humans.According to this philosophy, machines are able to learn, by following rules or algorithms designed by humans and to do repetitive and logic tasks desired by a human"
    }
}

So how can I count the words in it, for example machine = 3, humans = 2, and so on?

Answer 1

You probably want to run an aggregation query. It would look something like this:

{
    "aggs" : {
        "text_aggregation" : {
            "terms" : { 
                "field" : "text" 
            }
        }
    }
}

This runs over the whole index. If you want to run it on a specific document, you can use something like the following:

{
    "query":{
        "match":{
            "eventCodes":"ET00075293"
        }   
    },
    "aggs" : {
        "text_aggregation" : {
            "terms" : { 
                "fileName" : "alibaba.pdf" 
            }
        }
    }
}

Answer 2

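@Satej S Careful! The first aggregation does not work, because the terms aggregation only applies to the keyword data type. The second aggregation does not work either: it has a syntax error!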

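@Prashant Patel You can use termvector (doc here) to get the word counts for every document in your collection. A simpler way is to change the mapping and create a copy of the text field that is indexed with the keyword data type, which allows the terms aggregation (doc here). Then try an aggregation like this: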

{
    "query":{
        "term":{
            "_id":"Dbo5qmMBSUBLqBARJmBG"
        }   
    },
    "aggs" : {
        "text_aggregation" : {
            "terms" : { 
                "field" : "text.keyword" 
            }
        }
    }
}
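
For the termvector route mentioned above, here is a minimal Python sketch of how the per-word counts could be read out of a term vectors response. It is not taken from the answer: the request path in the comment assumes a 6.x-style endpoint, the helper name term_counts is hypothetical, and the sample response is hand-written and abridged, with the two values copied from the question's expected output, not from a real run.

```python
import json

# Request body for the term vectors API. Assumed to be sent to
#   GET /testdata/tweet/Dbo5qmMBSUBLqBARJmBG/_termvectors
# (6.x-style path; newer versions use /testdata/_termvectors/<id>).
TERMVECTORS_BODY = {
    "fields": ["text"],
    "positions": False,
    "offsets": False,
    "payloads": False,
    "term_statistics": False,
    "field_statistics": False,
}


def term_counts(response, field="text"):
    """Return {term: term_freq} for one field of a term vectors response."""
    terms = response.get("term_vectors", {}).get(field, {}).get("terms", {})
    return {term: info["term_freq"] for term, info in terms.items()}


# Hand-written, abridged response that only illustrates the shape.
sample_response = {
    "_id": "Dbo5qmMBSUBLqBARJmBG",
    "found": True,
    "term_vectors": {
        "text": {
            "terms": {
                "machine": {"term_freq": 3},
                "humans": {"term_freq": 2},
            }
        }
    },
}

counts = term_counts(sample_response)
print(json.dumps(TERMVECTORS_BODY))
print(counts)
```

The term_freq value is the number of times a term occurs in that one document's field, which is the kind of count the question asks for. Which tokens actually come back depends on the analyzer configured for the text field.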

Elasticsearch word count
