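I'm trying to add a JSON file of about 30,000 lines to Elasticsearch, but it isn't in the right format. I'm trying to upload it through the Bulk API, but I can't find an efficient way to format it. I'm using Ubuntu 16.04 LTS.
This is the format of the JSON: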
{
"rt": "2018-11-20T12:57:32.292Z",
"source_info": { "ip": "0.0.60.50" },
"end": "2018-11-20T12:57:32.284Z",
"severity": "low",
"duid": "5b8d0a48ba59941314e8a97f",
"dhost": "004678",
"endpoint_type": "computer",
"endpoint_id": "8e7e2806-eaee-9436-6ab5-078361576290",
"suser": "Katerina",
"group": "PERIPHERALS",
"customer_id": "a263f4c8-942f-d4f4-5938-7c37013c03be",
"type": "Event::Endpoint::Device::AlertedOnly",
"id": "83d63d48-f040-2485-49b9-b4ff2ac4fad4",
"name": "Peripheral allowed: Samsung Galaxy S7 edge"
}
I do know that the Bulk API format requires {"index":{"_id":*}}
before every JSON object in the file, so that it looks like this:
{"index":{"_id":1}}
{
"rt": "2018-11-20T12:57:32.292Z",
"source_info": { "ip": "0.0.60.50" },
"end": "2018-11-20T12:57:32.284Z",
"severity": "low",
"duid": "5b8d0a48ba59941314e8a97f",
"dhost": "004678",
"endpoint_type": "computer",
"endpoint_id": "8e7e2806-eaee-9436-6ab5-078361576290",
"suser": "Katerina",
"group": "PERIPHERALS",
"customer_id": "a263f4c8-942f-d4f4-5938-7c37013c03be",
"type": "Event::Endpoint::Device::AlertedOnly",
"id": "83d63d48-f040-2485-49b9-b4ff2ac4fad4",
"name": "Peripheral allowed: Samsung Galaxy S7 edge"
}
If I insert the index IDs manually and then run this command, it uploads without errors:
curl -s -H "Content-Type: application/x-ndjson" -XPOST localhost:9200/ivc/default/_bulk?pretty --data-binary @results.json
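My question is: how do I add the index action {"index":{"_id":*}}
before each JSON object so the file is ready to upload? Obviously the _id has to increase by 1 for each document. Can this be done from the CLI?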
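Assuming the file has already been turned into one compact JSON document per line (the first answer below explains why that is needed), a minimal sketch of the incrementing _id is a one-line awk program: for every non-blank input line it prints an action line with a running counter as the _id, followed by the document itself. The function name add_bulk_ids is hypothetical.

```shell
# Hypothetical helper: reads NDJSON (one document per line) from the given
# files, or from stdin if none are given, and writes Bulk API input.
# Blank lines are skipped; _id values are numbered 1, 2, 3, ... in order.
add_bulk_ids() {
    awk 'NF { n++; printf "{\"index\":{\"_id\":%d}}\n%s\n", n, $0 }' "$@"
}
```

For example, `add_bulk_ids results.ndjson > bulk.json` and then the curl command above with `--data-binary @bulk.json`. Every output line, including the last one, ends with a newline, which the Bulk API requires.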
Sorry if this post doesn't look right. I've read millions of posts on Stack Overflow, but this is my first one! #desperate
Thank you very much!
Answer 0 (score: 0)
Your problem is that Elasticsearch expects each document to be valid JSON on a single line, for example:
{"index":{"_id":1}}
{"rt":"2018-11-20T12:57:32.292Z","source_info":{"ip":"0.0.60.50"},"end":"2018-11-20T12:57:32.284Z","severity":"low","duid":"5b8d0a48ba59941314e8a97f","dhost":"004678","endpoint_type":"computer","endpoint_id":"8e7e2806-eaee-9436-6ab5-078361576290","suser":"Katerina","group":"PERIPHERALS","customer_id":"a263f4c8-942f-d4f4-5938-7c37013c03be","type":"Event::Endpoint::Device::AlertedOnly","id":"83d63d48-f040-2485-49b9-b4ff2ac4fad4","name":"Peripheral allowed: Samsung Galaxy S7 edge"}
You have to find a way to transform your input file so that there is one document per line; then Val's solution will work.
Answer 1 (score: 0)
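One way to do that conversion is sketched below. It assumes the input is simply a sequence of top-level JSON objects one after another, like the sample in the question (not wrapped in an array), and that python3 is available, which Ubuntu 16.04 normally includes. Python's json.JSONDecoder.raw_decode parses one object at a time out of the concatenated text, and json.dumps with compact separators writes each object back on its own line. The function name to_ndjson is hypothetical.

```shell
# Hypothetical helper: converts a file of several pretty-printed JSON objects
# (concatenated, not in an array) into NDJSON: one compact object per line.
# Requires python3. Key order is kept via OrderedDict.
to_ndjson() {
    python3 -c '
from collections import OrderedDict
import json
import sys

text = open(sys.argv[1], encoding="utf-8").read()
decoder = json.JSONDecoder(object_pairs_hook=OrderedDict)
pos = 0
while True:
    # raw_decode does not skip leading whitespace, so skip it here.
    while pos < len(text) and text[pos].isspace():
        pos += 1
    if pos >= len(text):
        break
    obj, pos = decoder.raw_decode(text, pos)
    print(json.dumps(obj, separators=(",", ":")))
' "$1"
}
```

Usage: `to_ndjson results.json > results.ndjson`. If jq is installed (`sudo apt-get install jq`), `jq -c . results.json` does the same job: it reads the stream of objects and prints each one compactly on its own line.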
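Thanks for all the answers, they really pointed me in the right direction.
I wrote a bash script that automates downloading the logs, formatting them, and uploading them to Elasticsearch: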
#!/bin/bash
echo "Downloading logs from Sophos Central. Please wait."
cd /home/user/ELK/Sophos-Central-SIEM-Integration/log || exit 1
#This deletes the last batch of results (-f: no error if there is none yet)
rm -f result.json
cd .. || exit 1
#This triggers the script to download a new batch of logs from Sophos
./siem.py
cd /home/user/ELK/Sophos-Central-SIEM-Integration/log || exit 1
#Inserts the action line {"index":{}} before line 1
sed -i '1 i\{"index":{}}' result.json
#Prepends {"index":{}} to lines 3, 5, 7, ... (on the same line, no newline is added)
sed -i '3~2s/^/{"index":{}}/' result.json
#Sends the file to the Bulk API ({"index":{}} without an _id lets Elasticsearch generate the IDs)
curl -s -H "Content-Type: application/x-ndjson" -XPOST "localhost:9200/ivc/default/_bulk?pretty" --data-binary @result.json
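A note on the two sed commands above: `3~2s/^/.../` prepends the action text to lines 3, 5, 7, ... without adding a newline, so the result has a separate action line before every event only if result.json has a particular layout, for example a blank line after each event (those blank lines then become the action lines). With exactly one event per line and no blank lines, every second event would end up glued onto an action line as `{"index":{}}{...event...}`. A sketch that does not depend on the layout, assuming GNU sed (the `\n` in the replacement is a GNU extension) and one event per line, is a single command that drops blank lines and puts an action line in front of every remaining line. The function name prepare_bulk_file is hypothetical.

```shell
# Hypothetical helper (GNU sed): rewrites an NDJSON file in place into Bulk
# API input. Blank or whitespace-only lines are deleted; every remaining
# line (one event) gets an {"index":{}} action line in front of it.
prepare_bulk_file() {
    sed -i -e '/^[[:space:]]*$/d' -e 's/^/{"index":{}}\n/' "$1"
}
```

In the script, `prepare_bulk_file result.json` would replace both sed lines. The Bulk API also needs the body to end with a newline, which this keeps as long as result.json already ends with one.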
That's how I did it. There may be simpler options, but this one works for me. Hope it's useful to someone else!
Thanks again, everyone! :D