Elasticsearch: convert lat and lon to a geo_point and bulk insert

Date: 2016-06-13 10:24:14

Tags: elasticsearch elasticsearch-plugin

I have a simple CSV file with four fields, serial_num, post_code, lat, and lon, for example:

serial_num,post_code,LAT,LON
06AA209365,PE10 2AZ,532342,168459
98A819621,PE10 1AA,532342,168459
07FD490906,PE12 1VV,497882,157983

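I need to bulk insert this data into Elasticsearch. The lat and lon fields have to be stored in a single geo_point field, so I created the mapping shown below:

  • The index is serial_data
  • The type is widget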

    PUT /serial_data
    {
      "mappings": {
        "widget": {
          "properties": {
            "serial_number": {
              "type": "string"
            },
            "post_code": {
              "type": "string"
            },
            "location": {
              "type": "geo_point"
            }
          }
        }
      }
    }
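
As a rough illustration of what the geo_point field in the mapping above expects, the sketch below builds three of the value forms Elasticsearch accepts for a geo_point (an object with lat and lon, a "lat,lon" string, and a [lon, lat] array) and checks the usual range of -90..90 for latitude and -180..180 for longitude. The helper names and the decimal coordinates are made up for this example. The sample CSV values (532342, 168459) fall outside that range, so they are presumably in some other unit or encoding; the question does not say which.

```python
def geo_point_forms(lat, lon):
    """Return three equivalent ways to write one geo_point value."""
    return {
        "object": {"lat": lat, "lon": lon},  # {"lat": ..., "lon": ...}
        "string": f"{lat},{lon}",            # "lat,lon" (latitude first)
        "array": [lon, lat],                 # [lon, lat] (longitude first)
    }


def in_geo_range(lat, lon):
    """True if the pair is inside the valid latitude/longitude range."""
    return -90 <= lat <= 90 and -180 <= lon <= 180


# Illustrative decimal-degree values, not taken from the question.
forms = geo_point_forms(52.77, -0.37)
print(forms)

# The raw CSV values from the question do not pass the range check.
print(in_geo_range(532342, 168459))
```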

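I tried to use embulk to insert the data. Because I had already defined the mapping, I assumed that if I declared lat and lon as double or long, embulk would combine them into a single location field. It did not; I was too optimistic.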

I also thought embulk had a bulk input JSON plugin, but I have not been able to find it.

Question

Any help with bulk loading this data would be greatly appreciated.

1 answer:

Answer 0 (score: 0)

I used three filter plugins.

  • embulk-filter-insert: inserts the location column
  • embulk-filter-ruby_proc: combines the LAT and LON columns
  • embulk-filter-column: removes the LAT and LON columns
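
To make the effect of the three filters listed above easier to follow, here is a minimal Python sketch that applies the same three steps to a single record. It only mimics the behaviour described in this answer and is not how embulk works internally; the function names are invented for the example.

```python
import json


def insert_location(record):
    """Step 1 (like embulk-filter-insert): add an empty location column."""
    out = dict(record)
    out["location"] = None
    return out


def combine_lat_lon(record):
    """Step 2 (like embulk-filter-ruby_proc): store lat/lon as a JSON string."""
    out = dict(record)
    out["location"] = json.dumps(
        {"lat": record["lat"], "lon": record["lon"]},
        separators=(",", ":"),  # compact form, no spaces
    )
    return out


def keep_columns(record, names=("serial_num", "post_code", "location")):
    """Step 3 (like embulk-filter-column): keep only the listed columns."""
    return {name: record[name] for name in names}


row = {"serial_num": "06AA209365", "post_code": "PE10 2AZ",
       "lat": 532342, "lon": 168459}
result = keep_columns(combine_lat_lon(insert_location(row)))
print(result)
```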

data.csv

serial_num,post_code,LAT,LON
06AA209365,PE10 2AZ,532342,168459
98A819621,PE10 1AA,532342,168459
07FD490906,PE12 1VV,497882,157983

conf.yml

in:
  type: file
  path_prefix: data.csv
  parser:
    charset: UTF-8
    newline: CRLF
    type: csv
    delimiter: ','
    quote: '"'
    escape: '"'
    trim_if_not_quoted: false
    skip_header_lines: 1
    allow_extra_columns: false
    allow_optional_columns: false
    columns:
    - {name: serial_num, type: string}
    - {name: post_code, type: string}
    - {name: lat, type: long}
    - {name: lon, type: long}
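filters:
  # 1. Add an empty location column to every record.
  - type: insert
    column:
      location:
  # 2. Fill location with a JSON string built from lat and lon.
  - type: ruby_proc
    requires:
      - json
    columns:
      - name: location
        proc: |
          ->(_, record) do
            { lat: record["lat"], lon: record["lon"] }.to_json
          end
        skip_nil: false
  # 3. Keep only these columns, which drops lat and lon.
  - type: column
    columns:
      - {name: serial_num}
      - {name: post_code}
      - {name: location}

# Write the records to stdout.
out: {type: stdout}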

Output

+-------------------+------------------+-----------------------------+
| serial_num:string | post_code:string |             location:string |
+-------------------+------------------+-----------------------------+
|        06AA209365 |         PE10 2AZ | {"lat":532342,"lon":168459} |
|         98A819621 |         PE10 1AA | {"lat":532342,"lon":168459} |
|        07FD490906 |         PE12 1VV | {"lat":497882,"lon":157983} |
+-------------------+------------------+-----------------------------+
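
The configuration above only prints to stdout. As a sketch of the remaining step, the Python below shows how the same rows could be turned into a request body for the Elasticsearch bulk API, which takes newline-delimited JSON: one action line followed by one document line per record, with a trailing newline. It assumes the serial_data index and widget type from the question and uses the mapping's field name serial_number rather than the CSV header serial_num. The function name is made up, and the sketch builds the body only; it does not send it. Mapping types such as widget were removed in later Elasticsearch versions, so the _type key applies to older clusters like the one in the question.

```python
import csv
import io
import json

CSV_TEXT = """serial_num,post_code,LAT,LON
06AA209365,PE10 2AZ,532342,168459
98A819621,PE10 1AA,532342,168459
07FD490906,PE12 1VV,497882,157983
"""


def build_bulk_body(csv_text, index="serial_data", doc_type="widget"):
    """Build a newline-delimited JSON body for the _bulk endpoint."""
    lines = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Action line: index this document into the given index and type.
        action = {"index": {"_index": index, "_type": doc_type}}
        # Source line: lat and lon merged into one geo_point-style object.
        doc = {
            "serial_number": row["serial_num"],
            "post_code": row["post_code"],
            "location": {"lat": int(row["LAT"]), "lon": int(row["LON"])},
        }
        lines.append(json.dumps(action))
        lines.append(json.dumps(doc))
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


body = build_bulk_body(CSV_TEXT)
print(body)
```

The resulting text could then be sent as the body of a POST to the _bulk endpoint by whatever HTTP client is at hand.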