Question

I have a simple CSV file with 4 fields, serial_num, post_code, lat and lon, for example:

serial_num,post_code,LAT,LON
06AA209365,PE10 2AZ,532342,168459
98A819621,PE10 1AA,532342,168459
07FD490906,PE12 1VV,497882,157983

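I need to bulk insert this data into Elasticsearch. The lat and lon fields need to be defined as a single geo_point field, so I created a mapping as follows:

The index is serial_data

The type is widget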

PUT /serial_data
{
  "mappings": {
    "widget": {
      "properties": {
        "serial_number": {
          "type": "string"
        },
        "post_code": {
          "type": "string"
        },
        "location": {
          "type": "geo_point"
        }
      }
    }
  }
}

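I tried to use embulk to insert the data. I thought that, since I had defined the mapping, embulk would combine lat and lon into a single location field if I declared them as double or long. It did not; I was overly optimistic.

I also thought embulk had a bulk input-json plugin, but I could not find it.

Question

Any help with bulk loading this data would be much appreciated.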

Answer 1

I used three filter plugins.

embulk-filter-insert: inserts the location column
embulk-filter-ruby_proc: combines the LAT and LON columns
embulk-filter-column: removes the LAT and LON columns
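
To make the three steps concrete, here is a minimal plain-Ruby sketch of what they do to a single record. It does not use embulk itself; the helper name apply_filters is hypothetical, and the record hash only assumes the column names used in conf.yml below.

```ruby
require "json"

# One parsed CSV row, keyed by the column names declared in conf.yml.
record = {
  "serial_num" => "06AA209365",
  "post_code"  => "PE10 2AZ",
  "lat"        => 532342,
  "lon"        => 168459
}

# Hypothetical helper that mimics the three filters on a plain Hash.
def apply_filters(record)
  # Step 1 (insert): add an empty location column.
  row = record.merge("location" => nil)

  # Step 2 (ruby_proc): build a JSON string from lat and lon.
  combine = ->(_, rec) { { lat: rec["lat"], lon: rec["lon"] }.to_json }
  row["location"] = combine.call(row["location"], row)

  # Step 3 (column): keep only the listed columns, dropping lat and lon.
  row.slice("serial_num", "post_code", "location")
end

result = apply_filters(record)
puts result.to_json
```

Note that location ends up as a string containing JSON, not a nested object, which is also what the embulk output further down shows.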

data.csv

serial_num,post_code,LAT,LON
06AA209365,PE10 2AZ,532342,168459
98A819621,PE10 1AA,532342,168459
07FD490906,PE12 1VV,497882,157983

conf.yml

in:
  type: file
  path_prefix: data.csv
  parser:
    charset: UTF-8
    newline: CRLF
    type: csv
    delimiter: ','
    quote: '"'
    escape: '"'
    trim_if_not_quoted: false
    skip_header_lines: 1
    allow_extra_columns: false
    allow_optional_columns: false
    columns:
    - {name: serial_num, type: string}
    - {name: post_code, type: string}
    - {name: lat, type: long}
    - {name: lon, type: long}
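filters:
  # Step 1: add an empty location column to every record.
  - type: insert
    column: 
      location: 
  # Step 2: fill location with a JSON string built from lat and lon.
  - type: ruby_proc
    requires:
      - json
    columns:
      - name: location
        proc: |
          ->(_,record) do 
            return { lat: record["lat"], lon: record["lon"] }.to_json.to_s
          end
        skip_nil: false

  # Step 3: keep only these columns, which drops lat and lon.
  - type: column
    columns:
      - {name: serial_num}
      - {name: post_code}
      - {name: location}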


out: {type: stdout}

Output

+-------------------+------------------+-----------------------------+
| serial_num:string | post_code:string |             location:string |
+-------------------+------------------+-----------------------------+
|        06AA209365 |         PE10 2AZ | {"lat":532342,"lon":168459} |
|         98A819621 |         PE10 1AA | {"lat":532342,"lon":168459} |
|        07FD490906 |         PE12 1VV | {"lat":497882,"lon":157983} |
+-------------------+------------------+-----------------------------+
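
If you would rather send these rows to Elasticsearch yourself, the following is a rough sketch of how they could be turned into a body for the _bulk endpoint. It is not part of the embulk setup above: bulk_body is a hypothetical helper, the index and type names are taken from the question, and the rows are copied from the output table. The field names come from the CSV, so documents carry serial_num rather than the serial_number declared in the mapping. The LAT and LON values are passed through unchanged; geo_point expects degrees, so real data may need converting first, which this sketch does not attempt.

```ruby
require "json"

INDEX = "serial_data" # index name from the question
TYPE  = "widget"      # type name from the question

# Rows as shown in the output table; location is a JSON string.
rows = [
  { "serial_num" => "06AA209365", "post_code" => "PE10 2AZ", "location" => '{"lat":532342,"lon":168459}' },
  { "serial_num" => "98A819621",  "post_code" => "PE10 1AA", "location" => '{"lat":532342,"lon":168459}' },
  { "serial_num" => "07FD490906", "post_code" => "PE12 1VV", "location" => '{"lat":497882,"lon":157983}' }
]

# Hypothetical helper: builds a newline-delimited bulk body with one
# action line and one source line per row.
def bulk_body(rows, index:, type:)
  lines = rows.flat_map do |row|
    # Turn the JSON string back into an object with lat and lon keys,
    # which is one of the forms a geo_point field accepts.
    doc = row.merge("location" => JSON.parse(row["location"]))
    action = { "index" => { "_index" => index, "_type" => type } }
    [action.to_json, doc.to_json]
  end
  # The bulk format requires a trailing newline after the last line.
  lines.join("\n") + "\n"
end

body = bulk_body(rows, index: INDEX, type: TYPE)
puts body
```

The resulting string can then be POSTed to the _bulk endpoint of the cluster with any HTTP client.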
