I'm trying to import a large JSON document into Elasticsearch 5.1. A small sample of the data looks like this:
[
{
"id": 1,
"region": "ca-central-1",
"eventName": "CreateRole",
"eventTime": "2016-02-04T03:41:19.000Z",
"userName": "email@group.com"
},
{
"id": 2,
"region": "ca-central-1",
"eventName": "AddRoleToInstanceProfile",
"eventTime": "2016-02-04T03:41:19.000Z",
"userName": "email@group.com"
},
{
"id": 3,
"region": "ca-central-1",
"eventName": "CreateInstanceProfile",
"eventTime": "2016-02-04T03:41:19.000Z",
"userName": "email@group.com"
},
{
"id": 4,
"region": "ca-central-1",
"eventName": "AttachGroupPolicy",
"eventTime": "2016-02-04T01:42:36.000Z",
"userName": "email@group.com"
},
{
"id": 5,
"region": "ca-central-1",
"eventName": "AttachGroupPolicy",
"eventTime": "2016-02-04T01:39:20.000Z",
"userName": "email@group.com"
}
]
If possible, I'd like to import the data without making any changes to the source file, so I assumed that ruled out the _bulk command, since it would require adding extra details for every entry.
I've tried a few different approaches without any luck. Am I wasting my time trying to import this document as-is?
I tried:
curl -XPOST 'demo.ap-southeast-2.es.amazonaws.com/rea/test' --data-binary @Records.json
but it fails with this error:
{"error":{"root_cause":[{"type":"mapper_parsing_exception","reason":"failed to parse"}],"type":"mapper_parsing_exception","reason":"failed to parse","caused_by":{"type":"not_x_content_exception","reason":"Compressor detection can only be called on some xcontent bytes or compressed xcontent bytes"}},"status":400}
Thanks!
Answer 0 (Score: 1)
The bulk API won't work if you don't want to modify the file, because it needs an action line in front of every document.
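For reference, _bulk expects newline-delimited JSON shaped roughly like this, with the index and type names from your curl attempt filled in:

{ "index": { "_index": "rea", "_type": "test" } }
{ "id": 1, "region": "ca-central-1", "eventName": "CreateRole", "eventTime": "2016-02-04T03:41:19.000Z", "userName": "email@group.com" }
{ "index": { "_index": "rea", "_type": "test" } }
{ "id": 2, "region": "ca-central-1", "eventName": "AddRoleToInstanceProfile", "eventTime": "2016-02-04T03:41:19.000Z", "userName": "email@group.com" }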
You could take a look at jq, a command-line JSON processor. It can help you generate the documents you need in order to call the bulk API.
cat Records.json | jq -c '
.[] |
{ index: { _index: "index_name", _type: "type_name" } },
.'
You can try something like this and pipe the output to the bulk API. Hope this helps.
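Against the sample data above, the first record should come out as two compact lines like these (index_name and type_name are placeholders to replace with your own values):

{"index":{"_index":"index_name","_type":"type_name"}}
{"id":1,"region":"ca-central-1","eventName":"CreateRole","eventTime":"2016-02-04T03:41:19.000Z","userName":"email@group.com"}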
You could also try a curl call along the same lines:
cat Records.json | jq -c '
.[] |
{ index: { _index: "index_name", _type: "type_name" } },
.' | curl -XPOST demo.ap-southeast-2.es.amazonaws.com/_bulk --data-binary @-
I haven't tried the second part, but it should work.
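One thing to watch for: strict content-type checking is mandatory from Elasticsearch 6.0 (and can be enabled on 5.x), so if the cluster rejects the request, adding the NDJSON header to the curl command above should help:

curl -XPOST demo.ap-southeast-2.es.amazonaws.com/_bulk -H 'Content-Type: application/x-ndjson' --data-binary @-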
Answer 1 (Score: 0)
You may want to look at stream2es - it's a useful tool for sending documents to Elasticsearch, and I think it might do what you need.
Once it's installed, you should be able to use it like this:
cat Records.json | ./stream2es stdin --target 'http://demo.ap-southeast-2.es.amazonaws.com/rea/test'
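If it turns out that stream2es expects one JSON document per line on stdin rather than a top-level array, you could flatten the file with jq first (untested; same placeholder endpoint as above):

jq -c '.[]' Records.json | ./stream2es stdin --target 'http://demo.ap-southeast-2.es.amazonaws.com/rea/test'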