My sample Elasticsearch index document:
{
  "_index": "testdata",
  "_type": "tweet",
  "_id": "Dbo5qmMBSUBLqBARJmBG",
  "_version": 1,
  "_score": 1,
  "_source": {
    "fileName": "alibaba.pdf",
    "chapter": "chapter1",
    "page": 1,
    "timeDate": "2018-05-24T11:06:48+00:00",
    "text": "So why do we need machine learning, why do we want a machine to learn as a human? There are many problems involving huge datasets, or complex calculations for instance, where it makes sense to let computers do all the work. In general, of course, computers and robots dont get tired, dont have to sleep, and may be cheaper. There is also an emerging school of thought called active learning or human-in-the-loop, which advocates combining the efforts of machine learners and humans. The idea is that there are routine boring tasks more suitable for computers, and creative tasks more suitable for humans.According to this philosophy, machines are able to learn, by following rules or algorithms designed by humans and to do repetitive and logic tasks desired by a human"
  }
}
So how can I count how often each word occurs, e.g. machine = 3, humans = 2, and so on?
Answer 0 (score: 0)
You probably want to run an aggregation query. It would look something like this:
{
  "aggs" : {
    "text_aggregation" : {
      "terms" : {
        "field" : "text"
      }
    }
  }
}
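A minimal Python sketch of reading the result of such a request, assuming the terms aggregation actually runs at word level on this field (for example with `fielddata` enabled on the text field). The response below is hypothetical and trimmed, and the helper name `buckets_to_counts` is made up for illustration. Note that each bucket's `doc_count` is the number of matching documents that contain the term, not how often the term occurs inside a document, so for a single document every bucket reports 1 rather than machine = 3.

```python
from typing import Dict


def buckets_to_counts(response: dict, agg_name: str = "text_aggregation") -> Dict[str, int]:
    """Map each bucket key of a terms aggregation to its doc_count."""
    buckets = response["aggregations"][agg_name]["buckets"]
    # doc_count = number of matching documents containing the key,
    # NOT the number of times the key occurs inside one document.
    return {bucket["key"]: bucket["doc_count"] for bucket in buckets}


# Hypothetical, trimmed response shaped like a terms aggregation result
# for a search that matched only the one sample document.
sample = {
    "aggregations": {
        "text_aggregation": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
                {"key": "machine", "doc_count": 1},
                {"key": "humans", "doc_count": 1},
            ],
        }
    }
}

print(buckets_to_counts(sample))  # {'machine': 1, 'humans': 1}
```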
This runs over the whole index. If you want to run it on a specific document, you can use something like the following:
{
  "query":{
    "match":{
      "eventCodes":"ET00075293"
    }
  },
  "aggs" : {
    "text_aggregation" : {
      "terms" : {
        "fileName" : "alibaba.pdf"
      }
    }
  }
}
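The body above places a field/value pair inside `terms`, which is not valid terms aggregation syntax: `terms` expects a `field` parameter naming the field to bucket on. A minimal Python sketch of a well-formed variant, assuming the goal is to restrict the aggregation to the document whose `fileName` is `alibaba.pdf` and that a `text.keyword` sub-field exists (as the default dynamic mapping creates for strings). The helper name `build_filtered_agg` is hypothetical.

```python
import json


def build_filtered_agg(file_name: str, agg_field: str = "text.keyword") -> dict:
    """Build a search body that filters by fileName and runs a terms aggregation."""
    return {
        "size": 0,  # only the aggregation is needed, not the hits themselves
        "query": {"match": {"fileName": file_name}},  # assumption: filter on fileName
        "aggs": {
            "text_aggregation": {
                # "terms" takes the name of a field, not a field/value pair
                "terms": {"field": agg_field}
            }
        },
    }


body = build_filtered_agg("alibaba.pdf")
print(json.dumps(body, indent=2))
```

The resulting JSON can be sent as the body of a `_search` request against the `testdata` index.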
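Answer 1 (score: 0)
@Satej S Careful! The first aggregation won't work, because the terms aggregation only works on the keyword data type. The second aggregation won't work either: its syntax is wrong!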
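@Prashant Patel You can use term vectors (see the term vectors documentation) to get word counts for the documents in your collection. A simpler approach is to change the mapping and create a copy of the text field, indexed with the keyword data type, which allows a terms aggregation (see the keyword documentation). Then try an aggregation like this: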
{
  "query":{
    "term":{
      "_id":"Dbo5qmMBSUBLqBARJmBG"
    }
  },
  "aggs" : {
    "text_aggregation" : {
      "terms" : {
        "field" : "text.keyword"
      }
    }
  }
}
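A minimal Python sketch of the term vectors route mentioned in the answer above, assuming a 6.x-style request such as `GET /testdata/tweet/Dbo5qmMBSUBLqBARJmBG/_termvectors` with the request body shown (newer versions drop the type from the URL). Unlike a terms aggregation, the term vectors response reports `term_freq`, the number of times a term occurs in that one document, which is what the question asks for. The sample response is hypothetical and trimmed to two terms, and the helper name `termvector_counts` is made up for illustration.

```python
from typing import Dict


def termvector_counts(response: dict, field: str = "text") -> Dict[str, int]:
    """Return {term: term_freq} for one field of a _termvectors response."""
    terms = response["term_vectors"][field]["terms"]
    # term_freq = occurrences of the term within this single document
    return {term: info["term_freq"] for term, info in terms.items()}


# Request body: ask only for the "text" field's term vector.
request_body = {"fields": ["text"], "term_statistics": True}

# Hypothetical, trimmed response for the sample document
# (a real one also carries doc_freq/ttf per term and field statistics).
sample_response = {
    "_index": "testdata",
    "_type": "tweet",
    "_id": "Dbo5qmMBSUBLqBARJmBG",
    "found": True,
    "term_vectors": {
        "text": {
            "terms": {
                "machine": {"term_freq": 3},
                "machines": {"term_freq": 1},
            }
        }
    },
}

counts = termvector_counts(sample_response)
print(counts["machine"])  # 3
```

To cover several documents in one round trip, the multi term vectors API (`_mtermvectors`) returns one such entry per requested document.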