I have a simple CSV file with four fields (serial_num, post_code, lat, lon), for example:
serial_num,post_code,LAT,LON
06AA209365,PE10 2AZ,532342,168459
98A819621,PE10 1AA,532342,168459
07FD490906,PE12 1VV,497882,157983
I need to bulk insert this into Elasticsearch. The lat and lon fields need to be combined into a single geo_point field, so I created the mapping shown below. The type is widget.
PUT /serial_data
{
  "mappings": {
    "widget": {
      "properties": {
        "serial_number": {
          "type": "string"
        },
        "post_code": {
          "type": "string"
        },
        "location": {
          "type": "geo_point"
        }
      }
    }
  }
}
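For reference, here is a minimal sketch of what a bulk request body for this mapping could look like, written in Python rather than embulk. It assumes the bulk API's newline-delimited format (an action line followed by a source line per document) and a typed index as in the mapping above. The function name `rows_to_bulk_ndjson` is hypothetical, and mapping the CSV column `serial_num` onto the mapping's `serial_number` field is an assumption. Note that geo_point expects latitude in [-90, 90] and longitude in [-180, 180] degrees, so the sample values would presumably need rescaling first; the sketch passes them through unchanged because the unit is not stated.

```python
import csv
import io
import json

# Sample rows copied from the question.
SAMPLE_CSV = """serial_num,post_code,LAT,LON
06AA209365,PE10 2AZ,532342,168459
98A819621,PE10 1AA,532342,168459
07FD490906,PE12 1VV,497882,157983
"""


def rows_to_bulk_ndjson(csv_text, index="serial_data", doc_type="widget"):
    """Convert CSV text into a newline-delimited bulk request body."""
    lines = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Action line: names the index and type for the document that follows.
        lines.append(json.dumps({"index": {"_index": index, "_type": doc_type}}))
        # Source line: LAT and LON are merged into one "location" object.
        # Assumption: values are passed through unscaled.
        lines.append(json.dumps({
            "serial_number": row["serial_num"],
            "post_code": row["post_code"],
            "location": {"lat": int(row["LAT"]), "lon": int(row["LON"])},
        }))
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(rows_to_bulk_ndjson(SAMPLE_CSV), end="")
```

The resulting text could be saved to a file and posted to the `_bulk` endpoint, for example with curl's `--data-binary` option.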
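I tried to use embulk to insert the data. I assumed that, with the mapping already defined, declaring lat and lon as double or long would be enough for embulk to combine them into the single location field. It did not; I was being overly optimistic.
I also thought embulk had a bulk input-json plugin, but I have not been able to find it.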
Question
Any help with bulk loading this data would be much appreciated.
Answer 0 (score: 0)
I used three filter plugins: insert, ruby_proc, and column.
data.csv
serial_num,post_code,LAT,LON
06AA209365,PE10 2AZ,532342,168459
98A819621,PE10 1AA,532342,168459
07FD490906,PE12 1VV,497882,157983
conf.yml
in:
  type: file
  path_prefix: data.csv
  parser:
    charset: UTF-8
    newline: CRLF
    type: csv
    delimiter: ','
    quote: '"'
    escape: '"'
    trim_if_not_quoted: false
    skip_header_lines: 1
    allow_extra_columns: false
    allow_optional_columns: false
    columns:
      - {name: serial_num, type: string}
      - {name: post_code, type: string}
      - {name: lat, type: long}
      - {name: lon, type: long}
filters:
  - type: insert
    column:
      location:
  - type: ruby_proc
    requires:
      - json
    columns:
      - name: location
        proc: |
          ->(_,record) do
            return { lat: record["lat"], lon: record["lon"] }.to_json.to_s
          end
        skip_nil: false
  - type: column
    columns:
      - {name: serial_num}
      - {name: post_code}
      - {name: location}
out: {type: stdout}
Output
+-------------------+------------------+-----------------------------+
| serial_num:string | post_code:string |             location:string |
+-------------------+------------------+-----------------------------+
|        06AA209365 |         PE10 2AZ | {"lat":532342,"lon":168459} |
|         98A819621 |         PE10 1AA | {"lat":532342,"lon":168459} |
|        07FD490906 |         PE12 1VV | {"lat":497882,"lon":157983} |
+-------------------+------------------+-----------------------------+
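To make the ruby_proc step easier to follow, here is a rough Python equivalent of what the lambda does to each record: it reads the lat and lon columns and serializes them as one compact JSON string. The function name `location_json` is hypothetical and exists only for illustration; it is not part of embulk.

```python
import json


def location_json(record):
    """Mirror the ruby_proc lambda: serialize lat/lon as one compact JSON string."""
    # separators=(",", ":") drops the spaces json.dumps adds by default,
    # matching the compact form shown in the output table.
    return json.dumps(
        {"lat": record["lat"], "lon": record["lon"]},
        separators=(",", ":"),
    )


# First row of data.csv after the CSV parser has typed lat and lon as long.
record = {"serial_num": "06AA209365", "post_code": "PE10 2AZ",
          "lat": 532342, "lon": 168459}
print(location_json(record))  # {"lat":532342,"lon":168459}
```

As in the table above, the location value is a string that contains JSON, not a nested object. The config writes to stdout, so how an Elasticsearch output would treat that string is not shown here.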