I'm trying to import a large JSON document into Elasticsearch 5.1. A small sample of the data looks like this:
[
{
"id": 1,
"region": "ca-central-1",
"eventName": "CreateRole",
"eventTime": "2016-02-04T03:41:19.000Z",
"userName": "email@group.com"
},
{
"id": 2,
"region": "ca-central-1",
"eventName": "AddRoleToInstanceProfile",
"eventTime": "2016-02-04T03:41:19.000Z",
"userName": "email@group.com"
},
{
"id": 3,
"region": "ca-central-1",
"eventName": "CreateInstanceProfile",
"eventTime": "2016-02-04T03:41:19.000Z",
"userName": "email@group.com"
},
{
"id": 4,
"region": "ca-central-1",
"eventName": "AttachGroupPolicy",
"eventTime": "2016-02-04T01:42:36.000Z",
"userName": "email@group.com"
},
{
"id": 5,
"region": "ca-central-1",
"eventName": "AttachGroupPolicy",
"eventTime": "2016-02-04T01:39:20.000Z",
"userName": "email@group.com"
}
]
If possible, I'd like to import the data without making any changes to the source file, so I assumed that ruled out the _bulk command, since it would require adding extra details for every entry.
I've tried a few different approaches without any luck. Am I wasting my time trying to import this document as-is?
I tried:
curl -XPOST 'demo.ap-southeast-2.es.amazonaws.com/rea/test' --data-binary @Records.json
but it fails with this error:
{"error":{"root_cause":[{"type":"mapper_parsing_exception","reason":"failed to parse"}],"type":"mapper_parsing_exception","reason":"failed to parse","caused_by":{"type":"not_x_content_exception","reason":"Compressor detection can only be called on some xcontent bytes or compressed xcontent bytes"}},"status":400}
Thanks!
Answer 0 (Score: 1)
The bulk API won't work if you don't want to modify the file, because it needs an action line in front of every document.
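For reference, _bulk expects newline-delimited JSON shaped roughly like this, with the index and type names from your curl attempt filled in:

{ "index": { "_index": "rea", "_type": "test" } }
{ "id": 1, "region": "ca-central-1", "eventName": "CreateRole", "eventTime": "2016-02-04T03:41:19.000Z", "userName": "email@group.com" }
{ "index": { "_index": "rea", "_type": "test" } }
{ "id": 2, "region": "ca-central-1", "eventName": "AddRoleToInstanceProfile", "eventTime": "2016-02-04T03:41:19.000Z", "userName": "email@group.com" }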
You could take a look at jq, a command-line JSON processor. It can help you generate the documents you need in order to call the bulk API.
cat Records.json | jq -c '
.[] |
{ index: { _index: "index_name", _type: "type_name" } },
.'
You can try something like this and pipe the output to the bulk API. Hope this helps.
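Against the sample data above, the first record should come out as two compact lines like these (index_name and type_name are placeholders to replace with your own values):

{"index":{"_index":"index_name","_type":"type_name"}}
{"id":1,"region":"ca-central-1","eventName":"CreateRole","eventTime":"2016-02-04T03:41:19.000Z","userName":"email@group.com"}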
You could also try a curl call along the same lines:
cat Records.json | jq -c '
.[] |
{ index: { _index: "index_name", _type: "type_name" } },
.' | curl -XPOST demo.ap-southeast-2.es.amazonaws.com/_bulk --data-binary @-
I haven't tried the second part, but it should work.
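One thing to watch for: strict content-type checking is mandatory from Elasticsearch 6.0 (and can be enabled on 5.x), so if the cluster rejects the request, adding the NDJSON header to the curl command above should help:

curl -XPOST demo.ap-southeast-2.es.amazonaws.com/_bulk -H 'Content-Type: application/x-ndjson' --data-binary @-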
Answer 1 (Score: 0)
You may want to look at stream2es - it's a useful tool for sending documents to Elasticsearch, and I think it might do what you need.
Once it's installed, you should be able to use it like this:
cat Records.json | ./stream2es stdin --target 'http://demo.ap-southeast-2.es.amazonaws.com/rea/test'
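If it turns out that stream2es expects one JSON document per line on stdin rather than a top-level array, you could flatten the file with jq first (untested; same placeholder endpoint as above):

jq -c '.[]' Records.json | ./stream2es stdin --target 'http://demo.ap-southeast-2.es.amazonaws.com/rea/test'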