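Bulk indexing JSON data in Elasticsearch

Asked: 2015-10-26 07:04:56

Tags: json elasticsearch

I am trying to bulk index a JSON file into a new Elasticsearch index, but I cannot get it to work. My JSON contains the following sample data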

[{"Amount": "480", "Quantity": "2", "Id": "975463711", "Client_Store_sk": "1109"},
{"Amount": "2105", "Quantity": "2", "Id": "975463943", "Client_Store_sk": "1109"},
{"Amount": "2107", "Quantity": "3", "Id": "974920111", "Client_Store_sk": "1109"},
{"Amount": "2115", "Quantity": "2", "Id": "975463798", "Client_Store_sk": "1109"},
{"Amount": "2116", "Quantity": "1", "Id": "975463827", "Client_Store_sk": "1109"},
{"Amount": "648", "Quantity": "3", "Id": "975464139", "Client_Store_sk": "1109"},
{"Amount": "2126", "Quantity": "2", "Id": "975464805", "Client_Store_sk": "1109"},
{"Amount": "2133", "Quantity": "1", "Id": "975464061", "Client_Store_sk": "1109"},
{"Amount": "1339", "Quantity": "4", "Id": "974919458", "Client_Store_sk": "1109"},
{"Amount": "1196", "Quantity": "5", "Id": "974920538", "Client_Store_sk": "1109"},
{"Amount": "1198", "Quantity": "4", "Id": "975463638", "Client_Store_sk": "1109"},
{"Amount": "1345", "Quantity": "4", "Id": "974919522", "Client_Store_sk": "1109"},
{"Amount": "1347", "Quantity": "2", "Id": "974919563", "Client_Store_sk": "1109"},
{"Amount": "673", "Quantity": "2", "Id": "975464359", "Client_Store_sk": "1109"},
{"Amount": "2153", "Quantity": "1", "Id": "975464511", "Client_Store_sk": "1109"},
{"Amount": "3896", "Quantity": "4", "Id": "977289342", "Client_Store_sk": "1109"},
{"Amount": "3897", "Quantity": "4", "Id": "974920602", "Client_Store_sk": "1109"}]

I am using

 curl -XPOST localhost:9200/index_local/my_doc_type/_bulk --data-binary --data @/home/data1.json 

When I try this with Elasticsearch's standard bulk index API, I get this error

 error: {"message":"ActionRequestValidationException[Validation Failed: 1: no requests added;]"}

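Can anyone help with indexing this type of JSON?

3 answers:

Answer 0 (score: 30)

What you need to do is read that JSON file and then build a bulk request in the format the _bulk endpoint expects: one line for the command and one line for the document, separated by a newline character. Rinse and repeat for each document: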

curl -XPOST localhost:9200/your_index/_bulk -d '
{"index": {"_index": "your_index", "_type": "your_type", "_id": "975463711"}}
{"Amount": "480", "Quantity": "2", "Id": "975463711", "Client_Store_sk": "1109"}
{"index": {"_index": "your_index", "_type": "your_type", "_id": "975463943"}}
{"Amount": "2105", "Quantity": "2", "Id": "975463943", "Client_Store_sk": "1109"}
... etc for all your documents
'

Just make sure to replace your_index and your_type with the actual index and type names you are using.
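The conversion described above can be sketched in Python. This is only a minimal illustration, not part of the original answer: the function name build_bulk_body and its parameters are hypothetical, and it assumes the whole array fits in memory. The _type entry is optional here because newer Elasticsearch versions no longer accept mapping types.

```python
import json


def build_bulk_body(docs, index_name, doc_type=None, id_field=None):
    """Turn a list of dicts into a newline-delimited _bulk request body.

    Hypothetical helper: names and parameters are illustrative only.
    """
    out = []
    for doc in docs:
        meta = {"_index": index_name}
        if doc_type is not None:
            meta["_type"] = doc_type  # only for older versions that still use types
        if id_field is not None:
            meta["_id"] = doc[id_field]  # reuse a document field as the _id
        # One action line, then one source line; json.dumps never emits raw newlines.
        out.append(json.dumps({"index": meta}) + "\n")
        out.append(json.dumps(doc) + "\n")
    # Every line, including the last one, ends with a newline.
    return "".join(out)


if __name__ == "__main__":
    sample = (
        '[{"Amount": "480", "Quantity": "2", "Id": "975463711", "Client_Store_sk": "1109"},'
        ' {"Amount": "2105", "Quantity": "2", "Id": "975463943", "Client_Store_sk": "1109"}]'
    )
    print(build_bulk_body(json.loads(sample), "your_index", "your_type", "Id"), end="")
```

Writing the returned string to a file gives something that can be passed to curl with --data-binary @file.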

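Update

Note that you can shorten the command lines if you specify _index and _type in your URL. You can also drop _id if you specify the path to your id field in your mapping (note that this feature is deprecated in ES 2.0). At the very least, the command line for every document can look like {"index":{}}, but it is always required, because it specifies which kind of operation you want to perform (in this case, index the document).

Update 2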

curl -XPOST localhost:9200/index_local/my_doc_type/_bulk --data-binary  @/home/data1.json

/home/data1.json should look like this:

{"index":{}}
{"Amount": "480", "Quantity": "2", "Id": "975463711", "Client_Store_sk": "1109"}
{"index":{}}
{"Amount": "2105", "Quantity": "2", "Id": "975463943", "Client_Store_sk": "1109"}
{"index":{}}
{"Amount": "2107", "Quantity": "3", "Id": "974920111", "Client_Store_sk": "1109"}

Answer 1 (score: 3)

As of today, 6.1.2 is the latest version of Elasticsearch, and the curl command that works for me on Windows (x64) is

curl -s -XPOST localhost:9200/my_index/my_index_type/_bulk -H "Content-Type: application/x-ndjson" --data-binary @D:\data\mydata.json

The data in mydata.json should be in the same format as shown in @Val's answer.
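For readers who prefer not to use curl, the same request can be sketched with Python's standard library. This is an illustrative assumption rather than something from the answer: make_bulk_request is a hypothetical helper, and the URL and index name passed to it are placeholders.

```python
import urllib.request


def make_bulk_request(base_url, index_name, body, doc_type=None):
    """Build (but do not send) a POST request to the _bulk endpoint.

    Hypothetical helper; base_url and index_name are supplied by the caller.
    """
    parts = [base_url.rstrip("/"), index_name]
    if doc_type is not None:
        parts.append(doc_type)  # only for older versions that still use types
    parts.append("_bulk")
    return urllib.request.Request(
        "/".join(parts),
        data=body.encode("utf-8"),  # send the bytes unchanged, newlines included
        headers={"Content-Type": "application/x-ndjson"},
        method="POST",
    )
```

Passing the returned object to urllib.request.urlopen would send it; the body argument is expected to be the newline-delimited text read from the data file.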

Answer 2 (score: 0)

A valid Elasticsearch bulk API request looks like this (ending with a newline):

POST http://localhost:9200/products_slo_development_temp_2/productModel/_bulk

{ "index":{ } } 
{"RequestedCountry":"slo","Id":1860,"Title":"Stol"} 
{ "index":{ } } 
{"RequestedCountry":"slo","Id":1860,"Title":"Miza"} 

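Elasticsearch bulk API documentation: https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-bulk.html

This is how I did it

I send an HTTP POST request, where uri is the URI/URL of the request and the elasticsearchJson variable is the JSON sent in the body of the HTTP request, formatted for the Elasticsearch bulk API: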

var uri = @"/" + indexName + "/productModel/_bulk";
var json = JsonConvert.SerializeObject(sqlResult);
var elasticsearchJson = GetElasticsearchBulkJsonFromJson(json, "RequestedCountry");

Helper method that generates the JSON format required by the Elasticsearch bulk API:

public string GetElasticsearchBulkJsonFromJson(string jsonStringWithArrayOfObjects, string firstParameterNameOfObjectInJsonStringArrayOfObjects)
{
  return @"{ ""index"":{ } } 
" + jsonStringWithArrayOfObjects.Substring(1, jsonStringWithArrayOfObjects.Length - 2).Replace(@",{""" + firstParameterNameOfObjectInJsonStringArrayOfObjects + @"""", @" 
{ ""index"":{ } } 
{""" + firstParameterNameOfObjectInJsonStringArrayOfObjects + @"""") + @"
";
}

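The first property/field in my JSON objects is RequestedCountry, which is why I use it in this example: the helper relies on it to find where each object in the array starts.

productModel is my Elasticsearch document type. sqlResult is a C# generic list of products.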