Searching for duplicate documents on a single field works. The index is test_4, the type is test_4, and the field is date.
curl -XGET 'http://ip:9200/test_4/test_4/_search?pretty=true' -H 'Content-Type: application/json' -d'{
  "size": 0,
  "aggs": {
    "duplicateCount": {
      "terms": {
        "field": "date.keyword",
        "min_doc_count": 2
      },
      "aggs": {
        "duplicateDocuments": {
          "top_hits": {}
        }
      }
    }
  }
}'
Searching for duplicate documents on multiple fields does not work. The index is test_4, the type is test_4, and the fields are date and EventType.
curl -XGET 'http://ip:9200/test_4/test_4/_search?pretty=true' -H 'Content-Type: application/json' -d'{
  "size": 0,
  "aggs": {
    "duplicateCount": {
      "terms": {
        "script": "doc['"'"'date'"'"'].values + doc['"'"'EventType'"'"'].values",
        "min_doc_count": 2
      },
      "aggs": {
        "duplicateDocuments": {
          "top_hits": {}
        }
      }
    }
  }
}'
This is the error:
curl: (52) Empty reply from server
A second attempt at the same multi-field search also does not work:
curl -XGET 'http://ip:9200/test_4/test_4/_search?pretty=true' -H 'Content-Type: application/json' -d'{
  "size": 0,
  "aggs": {
    "duplicateCount": {
      "terms": {
        "script": "def l = []; l.addAll(doc['date']); l.addAll(doc['EventType'].values); l",
        "min_doc_count": 2
      },
      "aggs": {
        "duplicateDocuments": {
          "top_hits": {}
        }
      }
    }
  }
}'
This is the error:
curl: (52) Empty reply from server
A third attempt also does not work:
curl -XGET 'http://ip:9200/test_4/test_4/_search?pretty=true' -H 'Content-Type: application/json' -d'{
  "size": 0,
  "aggs": {
    "duplicateCount": {
      "terms": {
        "script": "def l = []; l.addAll(doc['"'"'date'"'"']); l.addAll(doc['"'"'EventType'"'"'].values); l",
        "min_doc_count": 2
      },
      "aggs": {
        "duplicateDocuments": {
          "top_hits": {}
        }
      }
    }
  }
}'
This is the error:
curl: (52) Empty reply from server
A fourth attempt, without the quote escaping, also does not work:
curl -XGET 'http://ip:9200/test_4/test_4/_search?pretty=true' -H 'Content-Type: application/json' -d'{
  "size": 0,
  "aggs": {
    "duplicateCount": {
      "terms": {
        "script": "doc['date'].values + doc['EventType'].values",
        "min_doc_count": 2
      },
      "aggs": {
        "duplicateDocuments": {
          "top_hits": {}
        }
      }
    }
  }
}'
This is the error. The reason given is "Variable [date] is not defined":
{
"error" : {
"root_cause" : [
{
"type" : "script_exception",
"reason" : "compile error",
"script_stack" : [
"doc[date].values + doc[EventT ...",
" ^---- HERE"
],
"script" : "doc[date].values + doc[EventType].values",
"lang" : "painless"
}
],
"type" : "search_phase_execution_exception",
"reason" : "all shards failed",
"phase" : "query",
"grouped" : true,
"failed_shards" : [
{
"shard" : 0,
"index" : "test_4",
"node" : "dhB-H0_yRROhoP6W-FhOyA",
"reason" : {
"type" : "script_exception",
"reason" : "compile error",
"script_stack" : [
"doc[date].values + doc[EventT ...",
" ^---- HERE"
],
"script" : "doc[date].values + doc[EventType].values",
"lang" : "painless",
"caused_by" : {
"type" : "illegal_argument_exception",
"reason" : "Variable [date] is not defined."
}
}
}
]
},
"status" : 500
}
Here is a sample document:
{
"_index" : "test_4",
"_type" : "test_4",
"_id" : "IMQcWGEBOC31Kjf9gyWS",
"_score" : 18.249443,
"_source" : {
"date" : "18-02-02",
"path" : "/mnt/elk/logstash/data/from/nifi/dev/logs/nifi/nifi-app_2018-02-02_11.0.log",
"@timestamp" : "2018-02-02T20:01:59.159Z",
"EventType" : "ERROR",
"EventText" : "[Timer-Driven Process Thread-7] o.a.n.p.a.storage.PutAzureBlobStorage PutAzureBlobStorage[id=117f16f0-113c-1fcd-6a48-d9d99d3cd288] PutAzureBlobStorage[id=117f16f0-113c-1fcd-6a48-d9d99d3cd288] failed to process due to org.apache.nifi.processor.exception.ProcessException: IOException thrown from PutAzureBlobStorage[id=117f16f0-113c-1fcd-6a48-d9d99d3cd288]: java.io.IOException; rolling back session: {}",
"@version" : "1",
"host" : "hostname",
"time" : "11:31:36,978",
"message" : "2018-02-02 11:31:36,978 ERROR [Timer-Driven Process Thread-7] o.a.n.p.a.storage.PutAzureBlobStorage PutAzureBlobStorage[id=117f16f0-113c-1fcd-6a48-d9d99d3cd288] PutAzureBlobStorage[id=117f16f0-113c-1fcd-6a48-d9d99d3cd288] failed to process due to org.apache.nifi.processor.exception.ProcessException: IOException thrown from PutAzureBlobStorage[id=117f16f0-113c-1fcd-6a48-d9d99d3cd288]: java.io.IOException; rolling back session: {}",
"type" : "test_4"
}
},
Answer (score: 0)
The problem is that your interpreter (most likely bash) is stripping the ' characters from the query. Indeed, ES never receives them:
"script_stack" : [
"doc[date].values + doc[EventT ...",
" ^---- HERE"
],
"script" : "doc[date].values + doc[EventType].values",
If you echo the command, you can see that the needed ' characters get removed:
$ echo curl -XGET 'http://ip:9200/test_4/test_4/_search?pretty=true' -H 'Content-Type: application/json' -d'{
...
        "script": "doc['date'].values + doc['EventType'].values",
...
}'
curl -XGET http://ip:9200/test_4/test_4/_search?pretty=true -H Content-Type: application/json -d{
  "size": 0,
  "aggs": {
    "duplicateCount": {
      "terms": {
        "script": "doc[date].values + doc[EventType].values",
        "min_doc_count": 2
      },
...
You should escape the ' character using the '"'"' syntax. Here, we close the first ', start a new string with ", put the ' inside it, then close the ", and finally open another '. Here is how it looks:
$ echo curl -XGET 'http://ip:9200/test_4/test_4/_search?pretty=true' -H 'Content-Type: application/json' -d'{
  "size": 0,
  "aggs": {
    "duplicateCount": {
      "terms": {
        "script": "doc['"'"'date'"'"'].values + doc['"'"'EventType'"'"'].values",
        "min_doc_count": 2
      },
      "aggs": {
        "duplicateDocuments": {
          "top_hits": {}
        }
      }
    }
  }
}'
curl -XGET http://ip:9200/test_4/test_4/_search?pretty=true -H Content-Type: application/json -d{
  "size": 0,
  "aggs": {
    "duplicateCount": {
      "terms": {
        "script": "doc['date'].values + doc['EventType'].values",
        "min_doc_count": 2
      },
      "aggs": {
        "duplicateDocuments": {
          "top_hits": {}
        }
      }
    }
  }
}
You can find more information about bash quoting in this SO question, for example.
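As an aside (my own suggestion, not part of the original answer), you can sidestep the shell-quoting problem entirely by saving the request body to a file and letting curl read it, so bash never sees the single quotes. A minimal sketch, assuming the body is saved as query.json (the script itself still needs the fix described below):

# the quoted 'EOF' delimiter keeps bash from interpreting anything inside the heredoc
cat > query.json <<'EOF'
{
  "size": 0,
  "aggs": {
    "duplicateCount": {
      "terms": {
        "script": "doc['date'].values + doc['EventType'].values",
        "min_doc_count": 2
      },
      "aggs": {
        "duplicateDocuments": {
          "top_hits": {}
        }
      }
    }
  }
}
EOF
# --data-binary sends the file content as-is (plain -d would strip the newlines)
curl -XGET 'http://ip:9200/test_4/test_4/_search?pretty=true' -H 'Content-Type: application/json' --data-binary @query.json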
I understand that you want to find duplicates using a terms aggregation over the values of two fields. The script you are trying to build does not work because the fielddata arrays are immutable. Here is a script you could use instead:
"script": "def l = []; l.addAll(doc['date']); l.addAll(doc['EventType'].values); l",
"min_doc_count": 2
},
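Unescaped and spread over several lines with comments (my annotations, not part of the original answer), the script simply builds a new, mutable list, fills it with the values of both fields, and returns it so the terms aggregation can bucket on those values:

// Painless script used in the terms aggregation above
def l = [];                          // a new list that we are allowed to modify
l.addAll(doc['date']);               // add every value of the date field
l.addAll(doc['EventType'].values);   // add every value of the EventType field
return l;                            // explicit return; the one-liner above just ends with the expression l

Depending on how date and EventType are mapped, you may need the .keyword sub-fields here, as in your single-field query.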
You could also consider using copy_to to copy the values of several fields into a single field and then run a regular terms aggregation on that one field (which should be faster than the scripted aggregation).
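For completeness, here is a rough sketch of what that copy_to approach could look like; the index name test_4_bis and the combined field name date_event are made-up examples, and you would need to reindex your documents into the new index:

curl -XPUT 'http://ip:9200/test_4_bis' -H 'Content-Type: application/json' -d'{
  "mappings": {
    "test_4": {
      "properties": {
        "date":       { "type": "keyword", "copy_to": "date_event" },
        "EventType":  { "type": "keyword", "copy_to": "date_event" },
        "date_event": { "type": "keyword" }
      }
    }
  }
}'

curl -XGET 'http://ip:9200/test_4_bis/test_4/_search?pretty=true' -H 'Content-Type: application/json' -d'{
  "size": 0,
  "aggs": {
    "duplicateCount": {
      "terms": {
        "field": "date_event",
        "min_doc_count": 2
      },
      "aggs": {
        "duplicateDocuments": {
          "top_hits": {}
        }
      }
    }
  }
}'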
Hope this helps!