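Elasticsearch: error when searching for duplicate documents on multiple fields

Date: 2018-02-03 04:52:15

Tags: elasticsearch

Searching for duplicate documents on a single field works. The index is test_4, the type is test_4, and the field is date.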

curl -XGET 'http://ip:9200/test_4/test_4/_search?pretty=true' -H 'Content-Type: application/json' -d'{
  "size": 0,
  "aggs": {
    "duplicateCount": {
      "terms": {
      "field": "date.keyword",
        "min_doc_count": 2
      },
      "aggs": {
        "duplicateDocuments": {
          "top_hits": {}
        }
      }
    }
  }
}'

Searching for duplicate documents on multiple fields does not work. The index is test_4, the type is test_4, and the fields are date and EventType.

curl -XGET 'http://ip:9200/test_4/test_4/_search?pretty=true' -H 'Content-Type: application/json' -d'{
  "size": 0,
  "aggs": {
    "duplicateCount": {
      "terms": {
      "script": "doc['"'"'date'"'"'].values + doc['"'"'EventType'"'"'].values",
        "min_doc_count": 2
      },
      "aggs": {
        "duplicateDocuments": {
          "top_hits": {}
        }
      }
    }
  }
}'

This is the error.

curl: (52) Empty reply from server

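Another attempt at finding duplicates on date and EventType, this time building a list in the script, does not work either. The index and type are again test_4.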

curl -XGET 'http://ip:9200/test_4/test_4/_search?pretty=true' -H 'Content-Type: application/json' -d'{
  "size": 0,
  "aggs": {
    "duplicateCount": {
      "terms": {
      "script": "def l = []; l.addAll(doc['date']); l.addAll(doc['EventType'].values); l",
        "min_doc_count": 2
      },
      "aggs": {
        "duplicateDocuments": {
          "top_hits": {}
        }
      }
    }
  }
}'

This is the error.

curl: (52) Empty reply from server

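The list-building script with its single quotes escaped for the shell does not work either. The index and type are test_4, and the fields are date and EventType.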

curl -XGET 'http://ip:9200/test_4/test_4/_search?pretty=true' -H 'Content-Type: application/json' -d'{
  "size": 0,
  "aggs": {
    "duplicateCount": {
      "terms": {
      "script": "def l = []; l.addAll(doc['"'"'date'"'"']); l.addAll(doc['"'"'EventType'"'"'].values); l",
        "min_doc_count": 2
      },
      "aggs": {
        "duplicateDocuments": {
          "top_hits": {}
        }
      }
    }
  }
}'

This is the error.

curl: (52) Empty reply from server

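The concatenating script with plain, unescaped single quotes does not work either. The index and type are test_4, and the fields are date and EventType.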

curl -XGET 'http://ip:9200/test_4/test_4/_search?pretty=true' -H 'Content-Type: application/json' -d'{
  "size": 0,
  "aggs": {
    "duplicateCount": {
      "terms": {
      "script": "doc['date'].values + doc['EventType'].values",
        "min_doc_count": 2
      },
      "aggs": {
        "duplicateDocuments": {
          "top_hits": {}
        }
      }
    }
  }
}'

This is the error. The reported cause is "Variable [date] is not defined."

{
  "error" : {
    "root_cause" : [
      {
        "type" : "script_exception",
        "reason" : "compile error",
        "script_stack" : [
          "doc[date].values + doc[EventT ...",
          "    ^---- HERE"
        ],
        "script" : "doc[date].values + doc[EventType].values",
        "lang" : "painless"
      }
    ],
    "type" : "search_phase_execution_exception",
    "reason" : "all shards failed",
    "phase" : "query",
    "grouped" : true,
    "failed_shards" : [
      {
        "shard" : 0,
        "index" : "test_4",
        "node" : "dhB-H0_yRROhoP6W-FhOyA",
        "reason" : {
          "type" : "script_exception",
          "reason" : "compile error",
          "script_stack" : [
            "doc[date].values + doc[EventT ...",
            "    ^---- HERE"
          ],
          "script" : "doc[date].values + doc[EventType].values",
          "lang" : "painless",
          "caused_by" : {
            "type" : "illegal_argument_exception",
            "reason" : "Variable [date] is not defined."
          }
        }
      }
    ]
  },
  "status" : 500
}

Here is a sample document.

{
  "_index" : "test_4",
  "_type" : "test_4",
  "_id" : "IMQcWGEBOC31Kjf9gyWS",
  "_score" : 18.249443,
  "_source" : {
    "date" : "18-02-02",
    "path" : "/mnt/elk/logstash/data/from/nifi/dev/logs/nifi/nifi-app_2018-02-02_11.0.log",
    "@timestamp" : "2018-02-02T20:01:59.159Z",
    "EventType" : "ERROR",
    "EventText" : "[Timer-Driven Process Thread-7] o.a.n.p.a.storage.PutAzureBlobStorage PutAzureBlobStorage[id=117f16f0-113c-1fcd-6a48-d9d99d3cd288] PutAzureBlobStorage[id=117f16f0-113c-1fcd-6a48-d9d99d3cd288] failed to process due to org.apache.nifi.processor.exception.ProcessException: IOException thrown from PutAzureBlobStorage[id=117f16f0-113c-1fcd-6a48-d9d99d3cd288]: java.io.IOException; rolling back session: {}",
    "@version" : "1",
    "host" : "hostname",
    "time" : "11:31:36,978",
    "message" : "2018-02-02 11:31:36,978 ERROR [Timer-Driven Process Thread-7] o.a.n.p.a.storage.PutAzureBlobStorage PutAzureBlobStorage[id=117f16f0-113c-1fcd-6a48-d9d99d3cd288] PutAzureBlobStorage[id=117f16f0-113c-1fcd-6a48-d9d99d3cd288] failed to process due to org.apache.nifi.processor.exception.ProcessException: IOException thrown from PutAzureBlobStorage[id=117f16f0-113c-1fcd-6a48-d9d99d3cd288]: java.io.IOException; rolling back session: {}",
    "type" : "test_4"
  }
},

1 answer:

Answer 0 (score: 0)

The problem is that your shell (most likely bash) is removing the single quotes (') from the query, so ES never receives them:

      "script_stack" : [
        "doc[date].values + doc[EventT ...",
        "    ^---- HERE"
      ],
      "script" : "doc[date].values + doc[EventType].values",

If you echo the command, you can see that the required ' characters have been removed:

$ echo curl -XGET 'http://ip:9200/test_4/test_4/_search?pretty=true' -H 'Content-Type: application/json' -d'{
  ...
      "script": "doc['date'].values + doc['EventType'].values",
  ...
}'
curl -XGET http://ip:9200/test_4/test_4/_search?pretty=true -H Content-Type: application/json -d{
  "size": 0,
  "aggs": {
    "duplicateCount": {
      "terms": {
      "script": "doc[date].values + doc[EventType].values",
        "min_doc_count": 2
      },
      ...

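You should escape each ' character with the sequence '"'"'. This closes the first single-quoted string with ', opens a new string with ", puts a literal ' in it, closes it with ", and finally opens another single-quoted string with '. Here is how it looks: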

$ echo curl -XGET 'http://ip:9200/test_4/test_4/_search?pretty=true' -H 'Content-Type: application/json' -d'{
  "size": 0,
  "aggs": {
    "duplicateCount": {
      "terms": {
      "script": "doc['"'"'date'"'"'].values + doc['"'"'EventType'"'"'].values",
        "min_doc_count": 2
      },
      "aggs": {
        "duplicateDocuments": {
          "top_hits": {}
        }
      }
    }
  }
}'
curl -XGET http://ip:9200/test_4/test_4/_search?pretty=true -H Content-Type: application/json -d{
  "size": 0,
  "aggs": {
    "duplicateCount": {
      "terms": {
      "script": "doc['date'].values + doc['EventType'].values",
        "min_doc_count": 2
      },
      "aggs": {
        "duplicateDocuments": {
          "top_hits": {}
        }
      }
    }
  }
}
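
The effect can be reproduced without curl or a server. The following is a minimal sketch that assumes bash or another POSIX shell; the variable names `stripped` and `kept` are only illustrative.

```shell
# Adjacent single-quoted pieces are concatenated, so the inner quotes vanish.
stripped='doc['date'].values'

# Close the string ('), add a double-quoted single quote ("'"), reopen (').
kept='doc['"'"'date'"'"'].values'

printf '%s\n' "$stripped"   # prints doc[date].values
printf '%s\n' "$kept"       # prints doc['date'].values
```

The first form is what the failing request sent to ES, which is why Painless saw `date` as an undefined variable.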

You can find more about bash quoting in, for example, this SO question.
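
One way to avoid the quote juggling altogether is a here-document with a quoted delimiter, which passes its content through verbatim. This is only a sketch of that alternative, not part of the original answer: the variable name `body` is illustrative, and the curl call is left commented out so the snippet runs without a server.

```shell
# A quoted delimiter ('EOF') disables all shell expansion in the here-document,
# so the single quotes inside the script survive unchanged.
body=$(cat <<'EOF'
{
  "size": 0,
  "aggs": {
    "duplicateCount": {
      "terms": {
        "script": "def l = []; l.addAll(doc['date']); l.addAll(doc['EventType'].values); l",
        "min_doc_count": 2
      }
    }
  }
}
EOF
)

# Inspect what would be sent.
printf '%s\n' "$body"

# To send it, using the URL from the question:
# curl -XGET 'http://ip:9200/test_4/test_4/_search?pretty=true' -H 'Content-Type: application/json' -d "$body"
```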

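I understand that you want to find duplicates using a terms aggregation over the values of two fields. The script you tried does not work because the fielddata arrays are immutable. Here is a script you might use instead: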

"script": "def l = []; l.addAll(doc['date']); l.addAll(doc['EventType'].values); l",
  "min_doc_count": 2
},
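
A script that returns a list should make the terms aggregation treat each value as a separate term, so the buckets would be per value rather than per (date, EventType) pair. If the goal is one bucket per pair, a possible variant is to join the two values into a single key. The sketch below rests on assumptions: that both fields are single-valued, that `EventType.keyword` exists the way `date.keyword` does in the single-field query, and that `|` never occurs in the data.

```shell
# Request body kept in a quoted here-document so the single quotes are preserved.
pair_body=$(cat <<'EOF'
{
  "size": 0,
  "aggs": {
    "duplicateCount": {
      "terms": {
        "script": "doc['date.keyword'].value + '|' + doc['EventType.keyword'].value",
        "min_doc_count": 2
      },
      "aggs": {
        "duplicateDocuments": {
          "top_hits": {}
        }
      }
    }
  }
}
EOF
)
printf '%s\n' "$pair_body"

# To send it, using the URL from the question:
# curl -XGET 'http://ip:9200/test_4/test_4/_search?pretty=true' -H 'Content-Type: application/json' -d "$pair_body"
```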

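You could also consider using copy_to to copy the values from several fields into a single field, and then run a regular terms aggregation on that one field (this should be faster than a scripted aggregation).

Hope that helps!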
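
A rough sketch of the copy_to idea follows. It assumes a 6.x-style typed mapping, as the URLs in the question suggest. The index name `test_4_v2` and the field name `date_event` are hypothetical, and existing documents would need to be reindexed into the new index. The curl calls are commented out so the snippet runs without a server.

```shell
# Mapping for a new index: both fields copy their values into date_event.
mapping=$(cat <<'EOF'
{
  "mappings": {
    "test_4": {
      "properties": {
        "date":       { "type": "keyword", "copy_to": "date_event" },
        "EventType":  { "type": "keyword", "copy_to": "date_event" },
        "date_event": { "type": "keyword" }
      }
    }
  }
}
EOF
)

# Plain terms aggregation on the combined field, with no script involved.
agg=$(cat <<'EOF'
{
  "size": 0,
  "aggs": {
    "duplicateCount": {
      "terms": { "field": "date_event", "min_doc_count": 2 }
    }
  }
}
EOF
)
printf '%s\n' "$mapping" "$agg"

# curl -XPUT 'http://ip:9200/test_4_v2' -H 'Content-Type: application/json' -d "$mapping"
# curl -XGET 'http://ip:9200/test_4_v2/test_4/_search?pretty=true' -H 'Content-Type: application/json' -d "$agg"
```

Note that copy_to stores the copied values as separate terms in `date_event`, so these buckets are per value, not per (date, EventType) pair.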