How do I index a GeoJSON file in Elasticsearch?

Asked: 2016-10-19 11:03:55

Tags: elasticsearch geojson shapefile

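I am trying to use Python to store spatial data in Elasticsearch, where the data comes as GeoJSON files, CSV files, and shapefiles. I am new to Elasticsearch, and even after following the documentation I have not been able to index the data successfully. Any help would be appreciated.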

Sample GeoJSON file:

{
  "type": "FeatureCollection",
  "features": [
    {
      "type": "Feature",
      "properties": {
        "ID_0": 105,
        "ISO": "IND",
        "NAME_0": "India",
        "ID_1": 1288,
        "NAME_1": "Telangana",
        "ID_2": 15715,
        "NAME_2": "Telangana",
        "VARNAME_2": null,
        "NL_NAME_2": null,
        "HASC_2": "IN.TS.AD",
        "CC_2": null,
        "TYPE_2": "State",
        "ENGTYPE_2": "State",
        "VALIDFR_2": "Unknown",
        "VALIDTO_2": "Present",
        "REMARKS_2": null,
        "Shape_Leng": 8.103535,
        "Shape_Area": 127258717496
      },
      "geometry": {
        "type": "Polygon",
        "coordinates": [
          [
            [
              79.14429367552918,
              19.500257885106404
            ],
            [
              79.14582245808431,
              19.498859172536427
            ],
            [
              79.14600496956801,
              19.498823981691853
            ],
            [
              79.14966523737327,
              19.495821705263914
            ]
          ]
        ]
      }
    }
  ]
}

2 Answers:

Answer 0: (score: 1)

Code

import geojson
from datetime import datetime
from elasticsearch import Elasticsearch, helpers


def geojson_to_es(gj):

    for feature in gj['features']:

        date = datetime.strptime("-".join(feature["properties"]["event_date"].split('-')[0:2]) + "-" + feature["properties"]["year"], "%d-%b-%Y")
        feature["properties"]["timestamp"] = int(date.timestamp())
        feature["properties"]["event_date"] = date.strftime('%Y-%m-%d')
        yield feature


with open("GeoObs.json") as f:
    gj = geojson.load(f)

    es = Elasticsearch(hosts=[{'host': 'localhost', 'port': 9200}])

    k = ({
        "_index": "YOUR_INDEX",
        "_source": feature,
    } for feature in geojson_to_es(gj))

    helpers.bulk(es, k)

Explanation

with open("GeoObs.json") as f:
    gj = geojson.load(f)

    es = Elasticsearch(hosts=[{'host': 'localhost', 'port': 9200}])

This part of the code loads an external GeoJSON file and then connects to Elasticsearch.

    k = ({
        "_index": "YOUR_INDEX",
        "_source": feature,
    } for feature in geojson_to_es(gj))

    helpers.bulk(es, k)

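The () here creates a generator, which we pass to helpers.bulk(es, k). Remember that _source is the original data, as Elasticsearch calls it, i.e. our raw JSON. _index is simply the index we want to put the data into. You will see other examples that also use _doc here. That is part of mapping types, which were deprecated in Elasticsearch 7.x and removed in 8.0.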
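As a minimal sketch of what that generator expression produces, the following builds the same kind of action dictionaries without contacting a cluster. The helper name build_actions, the index name my-index, and the sample feature are illustrative assumptions, not part of the original answer.

```python
def build_actions(features, index_name):
    """Lazily yield one bulk action per GeoJSON feature."""
    for feature in features:
        yield {
            "_index": index_name,  # target index for this document
            "_source": feature,    # document body: the raw feature
        }


# Hypothetical sample feature, for illustration only.
sample_features = [
    {
        "type": "Feature",
        "properties": {"NAME_1": "Telangana"},
        "geometry": {"type": "Point", "coordinates": [79.14, 19.50]},
    }
]

actions = build_actions(sample_features, "my-index")
first_action = next(actions)  # nothing is built until the generator is consumed
print(first_action["_index"])
```

Because the actions are produced lazily, a generator like this can be handed to helpers.bulk without first materializing every feature as a list of actions.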

def geojson_to_es(gj):

    for feature in gj['features']:

        date = datetime.strptime("-".join(feature["properties"]["event_date"].split('-')[0:2]) + "-" + feature["properties"]["year"], "%d-%b-%Y")
        feature["properties"]["timestamp"] = int(date.timestamp())
        feature["properties"]["event_date"] = date.strftime('%Y-%m-%d')
        yield feature

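The function geojson_to_es uses a generator to yield events. Instead of returning and finishing, a generator function resumes at the keyword yield after each call. In this case, we are yielding GeoJSON features. In my code you will also see: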

date = datetime.strptime("-".join(feature["properties"]["event_date"].split('-')[0:2]) + "-" + feature["properties"]["year"], "%d-%b-%Y")
feature["properties"]["timestamp"] = int(date.timestamp())
feature["properties"]["event_date"] = date.strftime('%Y-%m-%d')

This is just an example of processing the data in the JSON before sending it to Elasticsearch.
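To make the date handling concrete, here is a small sketch of the same transformation applied to one properties dictionary. The function name normalize_event_date and the sample values ("12-Jan-18" and "2018") are assumptions about what the data might look like; the real GeoObs.json format is not shown in the answer.

```python
from datetime import datetime


def normalize_event_date(properties):
    """Return a copy of properties with an ISO date and an epoch timestamp."""
    # Keep only day and month from event_date, e.g. "12-Jan-18" -> "12-Jan".
    day_month = "-".join(properties["event_date"].split("-")[0:2])
    # Append the four-digit year field and parse, e.g. "12-Jan-2018".
    date = datetime.strptime(day_month + "-" + properties["year"], "%d-%b-%Y")
    updated = dict(properties)  # copy so the input is left unchanged
    updated["timestamp"] = int(date.timestamp())  # naive datetime, so local time
    updated["event_date"] = date.strftime("%Y-%m-%d")
    return updated


# Hypothetical input values, for illustration only.
sample = {"event_date": "12-Jan-18", "year": "2018"}
result = normalize_event_date(sample)
print(result["event_date"])
```

Note that %b parses month abbreviations according to the current locale, so this sketch assumes English month names.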

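The key is in the mapping file: you must mark the field as geo_point or geo_shape. These data types are how Elasticsearch recognizes geographic data. An example from my mapping file: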

...
{
  "properties": {
    "geometry": {
      "properties": {
        "coordinates": {
          "type": "geo_point"
        },
        "type": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        }
      }
    },
...

That is, before uploading your GeoJSON data with Python, you need to create the index and then apply a mapping file that contains geo_shape or geo_point, for example:

curl -X PUT "localhost:9200/YOUR_INDEX?pretty"
curl -X PUT "localhost:9200/YOUR_INDEX/_mapping?pretty" -H "Content-Type: application/json" -d @mapping.json
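If you would rather generate the mapping file than write it by hand, one possible sketch is shown below. It assumes the geometry lives in a top-level field named geometry and should be typed as geo_shape, which suits polygons like the one in the question; the function name build_geo_shape_mapping is hypothetical.

```python
import json


def build_geo_shape_mapping(field_name="geometry"):
    """Build a mapping body that types one field as geo_shape."""
    return {"properties": {field_name: {"type": "geo_shape"}}}


mapping = build_geo_shape_mapping()

# Write the file that the _mapping request reads via -d @mapping.json.
with open("mapping.json", "w") as f:
    json.dump(mapping, f, indent=2)

print(json.dumps(mapping))
```

Any other fields you want typed explicitly would be added under the same properties key before the file is written.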

Answer 1: (score: 0)

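You have to split each GeoJSON feature into (1) its geometry and (2) its properties/attributes. You cannot index GeoJSON features and feature collections directly (see documentation); only the geometry part is supported as a field type.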

So your final indexable document ends up looking somewhat flattened:

{
    "ID_0": 105,
    "ISO": "IND",
    "NAME_0": "India",
    "ID_1": 1288,
    "NAME_1": "Telangana",
    "ID_2": 15715,
    "NAME_2": "Telangana",
    "VARNAME_2": null,
    "NL_NAME_2": null,
    "HASC_2": "IN.TS.AD",
    "CC_2": null,
    "TYPE_2": "State",
    "ENGTYPE_2": "State",
    "VALIDFR_2": "Unknown",
    "VALIDTO_2": "Present",
    "REMARKS_2": null,
    "Shape_Leng": 8.103535,
    "Shape_Area": 127258717496,
    "geometry": {
        "type": "Polygon",
        "coordinates": [
            [
                [
                    79.14429367552918,
                    19.500257885106404
                ],
                [
                    79.14582245808431,
                    19.498859172536427
                ],
                [
                    79.14600496956801,
                    19.498823981691853
                ],
                [
                    79.14966523737327,
                    19.495821705263914
                ]
            ]
        ]
    }
}
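A minimal sketch of that flattening step in Python might look like the following. The function names flatten_feature and flatten_collection are hypothetical, and the sample collection is a trimmed-down stand-in for the file in the question.

```python
def flatten_feature(feature):
    """Merge a feature's properties and geometry into one flat document."""
    # Copy the properties; GeoJSON allows "properties" to be null.
    document = dict(feature.get("properties") or {})
    # A property that is itself named "geometry" would be overwritten here.
    document["geometry"] = feature["geometry"]
    return document


def flatten_collection(feature_collection):
    """Yield one flat document per feature in a FeatureCollection."""
    for feature in feature_collection["features"]:
        yield flatten_feature(feature)


# Trimmed-down, hypothetical sample for illustration only.
collection = {
    "type": "FeatureCollection",
    "features": [
        {
            "type": "Feature",
            "properties": {"ISO": "IND", "NAME_1": "Telangana"},
            "geometry": {"type": "Point", "coordinates": [79.14, 19.50]},
        }
    ],
}

documents = list(flatten_collection(collection))
print(documents[0]["NAME_1"])
```

Each resulting dictionary can then be used as the document body when indexing, with the geometry field mapped as a geographic type.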