I have a JSON file that I need to index on an Elasticsearch server.
The JSON file looks like this:
{
"sku": "1",
"vbid": "1",
"created": "Sun, 05 Oct 2014 03:35:58 +0000",
"updated": "Sun, 06 Mar 2016 12:44:48 +0000",
"type": "Single",
"downloadable-duration": "perpetual",
"online-duration": "365 days",
"book-format": "ePub",
"build-status": "In Inventory",
"description": "On 7 August 1914, a week before the Battle of Tannenburg and two weeks before the Battle of the Marne, the French army attacked the Germans at Mulhouse in Alsace. Their objective was to recapture territory which had been lost after the Franco-Prussian War of 1870-71, which made it a matter of pride for the French. However, after initial success in capturing Mulhouse, the Germans were able to reinforce more quickly, and drove them back within three days. After forty-three years of peace, this was the first test of strength between France and Germany. In 1929 Karl Deuringer wrote the official history of the battle for the Bavarian Army, an immensely detailed work of 890 pages; First World War expert and former army officer Terence Zuber has translated this study and edited it down to more accessible length, to produce the first account in English of the first major battle of the First World War.",
"publication-date": "07/2014",
"author": "Deuringer, Karl",
"title": "The First Battle of the First World War: Alsace-Lorraine",
"sort-title": "First Battle of the First World War: Alsace-Lorraine",
"edition": "0",
"sampleable": "false",
"page-count": "0",
"print-drm-text": "This title will only allow printing of 2 consecutive pages at a time.",
"copy-drm-text": "This title will only allow copying of 2 consecutive pages at a time.",
"kind": "book",
"fro": "false",
"distributable": "true",
"subjects": {
"subject": [
{
"-schema": "bisac",
"-code": "HIS027090",
"#text": "World War I"
},
{
"-schema": "coursesmart",
"-code": "cs.soc_sci.hist.milit_hist",
"#text": "Social Sciences -> History -> Military History"
}
]
},
"pricelist": {
"publisher-list-price": "0.0",
"digital-list-price": "7.28"
},
"publisher": {
"publisher-name": "The History Press",
"imprint-name": "The History Press Ireland"
},
"aliases": {
"eisbn-canonical": "1",
"isbn-canonical": "1",
"print-isbn-canonical": "9780752460864",
"isbn13": "1",
"isbn10": "0750951796",
"additional-isbns": {
"isbn": [
{
"-type": "print-isbn-10",
"#text": "0752460862"
},
{
"-type": "print-isbn-13",
"#text": "97807524608"
}
]
}
},
"owner": {
"company": {
"id": "1893",
"name": "The History Press"
}
},
"distributor": {
"company": {
"id": "3658",
"name": "asc"
}
}
}
However, when I try to index this JSON file with the command
curl -XPOST 'http://localhost:9200/_bulk' -d @1.json
I get this error:
{"error":{"root_cause":[{"type":"action_request_validation_exception","reason":"Validation Failed: 1: no requests added;"}],"type":"action_request_validation_exception","reason":"Validation Failed: 1: no requests added;"},"status":400}
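I can't figure out where I went wrong.
Answer 0 (score: 28)
Elasticsearch's bulk API uses a special syntax that consists of JSON documents, each written on a single line. See the documentation.
The syntax is simple. For index, create, and update you need two single-line JSON documents: the first line names the action, and the second supplies the document to index, create, or update. To delete a document, only the action line is needed. For example (from the documentation):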
{ "index" : { "_index" : "test", "_type" : "type1", "_id" : "1" } }
{ "field1" : "value1" }
{ "create" : { "_index" : "test", "_type" : "type1", "_id" : "3" } }
{ "field1" : "value3" }
{ "update" : {"_id" : "1", "_type" : "type1", "_index" : "index1"} }
{ "doc" : {"field2" : "value2"} }
{ "delete" : { "_index" : "test", "_type" : "type1", "_id" : "2" } }
Don't forget to end the file with a newline. Then, to call the bulk API, use the following command:
curl -s -XPOST localhost:9200/_bulk --data-binary "@requests"
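From the documentation:
If you're providing text file input to curl, you must use the --data-binary flag instead of plain -d.
Plain -d strips the newlines from the file when curl reads it, so the line-oriented bulk body no longer arrives intact.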
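Applied to the question's file, the pretty-printed document first has to be collapsed onto one line and preceded by an action line. The following is a minimal Python sketch of that conversion, not an official tool. The function names, the index name "books", and the choice of the "sku" field as the document ID are all assumptions made for illustration. The optional doc_type argument only matters on older Elasticsearch versions that still accept "_type", as in the examples above.

```python
import json


def to_bulk_lines(doc, index, doc_id=None, doc_type=None):
    """Return the bulk payload for one document: action line + source line.

    Hypothetical helper; index, doc_id and doc_type are supplied by the caller.
    """
    meta = {"_index": index}
    if doc_type is not None:
        # Only for older Elasticsearch versions that still use mapping types.
        meta["_type"] = doc_type
    if doc_id is not None:
        meta["_id"] = doc_id
    action = {"index": meta}
    # json.dumps without indent emits a single line; newlines inside string
    # values are escaped, so each JSON document stays on its own line.
    return json.dumps(action) + "\n" + json.dumps(doc) + "\n"


def convert_file(src_path, dst_path, index):
    """Read one pretty-printed JSON document and write it in bulk format."""
    with open(src_path, encoding="utf-8") as src:
        doc = json.load(src)
    with open(dst_path, "w", encoding="utf-8", newline="\n") as dst:
        # Assumption: the "sku" field is a suitable document ID.
        dst.write(to_bulk_lines(doc, index, doc_id=doc.get("sku")))
```

Calling convert_file("1.json", "requests", "books") would write a file named requests that ends with a newline and can be sent with the --data-binary command shown above.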
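Answer 1 (score: 0)
I had a similar problem: I wanted to delete specific documents of a specific type, and with the help of the answer above I finally got my simple bash script to work!
I have a file with one document_id per line (document_id.txt), and with the bash script below I can delete the documents of a given type whose IDs are listed in it.
This is what the file looks like: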
c476ce18803d7ed3708f6340fdfa34525b20ee90
5131a30a6316f221fe420d2d3c0017a76643bccd
08ebca52025ad1c81581a018febbe57b1e3ca3cd
496ff829c736aa311e2e749cec0df49b5a37f796
87c4101cb10d3404028f83af1ce470a58744b75c
37f0daf7be27cf081e491dd445558719e4dedba1
The bash script looks like this:
#!/bin/bash
es_cluster="http://localhost:9200"
index="some-index"
doc_type="some-document-type"
for doc_id in `cat document_id.txt`
do
request_string="{\"delete\" : { \"_type\" : \"${doc_type}\", \"_id\" : \"${doc_id}\" } }"
echo -e "${request_string}\r\n\r\n" | curl -s -XPOST "${es_cluster}/${index}/${doc_type}/_bulk" --data-binary @-
echo
done
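After much frustration, the trick was to pass the -e option to echo and append \n\n to echo's output before piping it into curl.
Then in curl I set the --data-binary option to stop it from stripping the \n\n that the _bulk endpoint requires, followed by the @- option to make it read from stdin!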
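The script above sends one request per ID. Since a bulk body may carry many action lines, the same deletes could also be collected into a single body. The following is a rough Python sketch of that idea under stated assumptions: build_delete_body is a hypothetical helper, and doc_type is optional because only older Elasticsearch versions accept "_type".

```python
import json


def build_delete_body(doc_ids, index, doc_type=None):
    """Build one bulk body with a delete action line per document ID.

    Hypothetical helper: blank IDs are skipped, and every line, including
    the last one, is terminated with a newline.
    """
    lines = []
    for doc_id in doc_ids:
        doc_id = doc_id.strip()
        if not doc_id:
            continue
        meta = {"_index": index}
        if doc_type is not None:
            # Only for older Elasticsearch versions that still use types.
            meta["_type"] = doc_type
        meta["_id"] = doc_id
        lines.append(json.dumps({"delete": meta}))
    return "".join(line + "\n" for line in lines)
```

For example, the result of build_delete_body(open("document_id.txt"), "some-index") could be written to a file and posted once with curl and --data-binary, as in the first answer.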
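Answer 2 (score: 0)
In my case it was a strange mistake: I was creating the bulkRequest object and then clearing it before sending it to Elasticsearch.
This is the line that caused the problem: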
bulkRequest.requests().clear();
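Answer 3 (score: 0)
Adding a trailing newline (pressing Enter at the end of the body in Postman, or appending "\n" if you build the JSON body in a client API) did the job for me.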
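A missing final newline, or a document spread over several lines, is easy to overlook, so a quick check before sending can help. The following is a minimal Python sketch of such a check. check_bulk_body is a hypothetical name, and the checks cover only the two problems discussed in this thread, not everything the server validates.

```python
import json


def check_bulk_body(body):
    """Return a list of problems found in a bulk request body (as a string)."""
    if not body.strip():
        return ["body is empty"]
    problems = []
    if not body.endswith("\n"):
        problems.append("body does not end with a newline")
    # Split on "\n" only, since that is the line separator the bulk format uses.
    for number, line in enumerate(body.split("\n"), start=1):
        if not line.strip():
            continue
        try:
            parsed = json.loads(line)
        except json.JSONDecodeError:
            problems.append(f"line {number} is not a complete JSON document")
            continue
        if not isinstance(parsed, dict):
            problems.append(f"line {number} is not a JSON object")
    return problems
```

Run against the pretty-printed file from the question, a check like this would flag almost every line, which points to the same cause as the "no requests added" error.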