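I am trying to store spatial data, in the form of GeoJSON, CSV files and shapefiles, in Elasticsearch using Python. I am new to Elasticsearch, and even after following the documentation I have not been able to index the data successfully. Any help would be appreciated.
Sample GeoJSON file: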
{
  "type": "FeatureCollection",
  "features": [
    {
      "type": "Feature",
      "properties": {
        "ID_0": 105,
        "ISO": "IND",
        "NAME_0": "India",
        "ID_1": 1288,
        "NAME_1": "Telangana",
        "ID_2": 15715,
        "NAME_2": "Telangana",
        "VARNAME_2": null,
        "NL_NAME_2": null,
        "HASC_2": "IN.TS.AD",
        "CC_2": null,
        "TYPE_2": "State",
        "ENGTYPE_2": "State",
        "VALIDFR_2": "Unknown",
        "VALIDTO_2": "Present",
        "REMARKS_2": null,
        "Shape_Leng": 8.103535,
        "Shape_Area": 127258717496
      },
      "geometry": {
        "type": "Polygon",
        "coordinates": [
          [
            [79.14429367552918, 19.500257885106404],
            [79.14582245808431, 19.498859172536427],
            [79.14600496956801, 19.498823981691853],
            [79.14966523737327, 19.495821705263914]
          ]
        ]
      }
    }
  ]
}
Answer 0 (score: 1)
import geojson
from datetime import datetime
from elasticsearch import Elasticsearch, helpers


def geojson_to_es(gj):
    """Yield each feature after normalizing its date fields."""
    for feature in gj['features']:
        # Combine the day and month of event_date with the separate year field
        date = datetime.strptime("-".join(feature["properties"]["event_date"].split('-')[0:2]) + "-" + feature["properties"]["year"], "%d-%b-%Y")
        feature["properties"]["timestamp"] = int(date.timestamp())
        feature["properties"]["event_date"] = date.strftime('%Y-%m-%d')
        yield feature


with open("GeoObs.json") as f:
    gj = geojson.load(f)

# With elasticsearch-py 8.x, pass a URL such as "http://localhost:9200" instead
es = Elasticsearch(hosts=[{'host': 'localhost', 'port': 9200}])

# One bulk action per feature, built lazily
k = ({
    "_index": "YOUR_INDEX",
    "_source": feature,
} for feature in geojson_to_es(gj))

helpers.bulk(es, k)
with open("GeoObs.json") as f:
    gj = geojson.load(f)

es = Elasticsearch(hosts=[{'host': 'localhost', 'port': 9200}])
This part of the code loads an external GeoJSON file and then connects to Elasticsearch.
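If the `geojson` package is not installed, the standard `json` module can read the same file. A minimal sketch, assuming the file holds a single FeatureCollection as in the question; `load_feature_collection` is a hypothetical helper name, not part of any library:

```python
import json


def load_feature_collection(path):
    """Load a GeoJSON file and return its list of features."""
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    # Assumption: the file is a FeatureCollection, as in the question
    if data.get("type") != "FeatureCollection":
        raise ValueError("expected a GeoJSON FeatureCollection")
    return data["features"]
```

The returned list can be iterated in the same way as `gj['features']`.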
k = ({
    "_index": "conflict-data",
    "_source": feature,
} for feature in geojson_to_es(gj))

helpers.bulk(es, k)
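The `()` here creates a generator expression, which we hand to `helpers.bulk(es, k)`. Remember that `_source` is the original data, as Elasticsearch puts it, i.e. our raw JSON. `_index` is simply the index we want to put the data into. You will see other examples here that include `_doc`. That is part of mapping types, which were deprecated in Elasticsearch 7.x and removed in 8.0.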
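To see what `helpers.bulk` receives without a running cluster, the actions can be built and inspected on their own. A minimal sketch; `make_actions` and the sample feature are illustrative and not part of the elasticsearch library:

```python
def make_actions(features, index_name):
    """Lazily yield one bulk action per feature."""
    for feature in features:
        yield {
            "_index": index_name,  # target index
            "_source": feature,    # document body, stored as-is
        }


sample = [{"type": "Feature", "properties": {"NAME_1": "Telangana"}}]
actions = make_actions(sample, "conflict-data")
first = next(actions)  # nothing is built until the generator is consumed
print(first["_index"], first["_source"]["properties"]["NAME_1"])
```

Passing `make_actions(...)` to `helpers.bulk` should behave the same as passing the generator expression `k`.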
def geojson_to_es(gj):
    for feature in gj['features']:
        date = datetime.strptime("-".join(feature["properties"]["event_date"].split('-')[0:2]) + "-" + feature["properties"]["year"], "%d-%b-%Y")
        feature["properties"]["timestamp"] = int(date.timestamp())
        feature["properties"]["event_date"] = date.strftime('%Y-%m-%d')
        yield feature
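The function `geojson_to_es` uses a generator to yield the features. Instead of returning and finishing, a generator function will "resume at the keyword `yield`" after each call. In this case we are yielding the GeoJSON features. In my code you will also see: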
date = datetime.strptime("-".join(feature["properties"]["event_date"].split('-')[0:2]) + "-" + feature["properties"]["year"], "%d-%b-%Y")
feature["properties"]["timestamp"] = int(date.timestamp())
feature["properties"]["event_date"] = date.strftime('%Y-%m-%d')
This is just an example of processing the data in the JSON before sending it to Elasticsearch.
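The date handling can be tried on its own. A minimal sketch, assuming `event_date` looks like `15-Jan-18` and `year` is `2018`; the real field formats are not shown in this answer, so the sample values and the name `normalize_date` are hypothetical:

```python
from datetime import datetime


def normalize_date(properties):
    """Rewrite event_date as YYYY-MM-DD and add a Unix timestamp."""
    day_month = "-".join(properties["event_date"].split("-")[0:2])  # e.g. "15-Jan"
    # str() lets this sketch accept the year as either a string or an integer
    date = datetime.strptime(day_month + "-" + str(properties["year"]), "%d-%b-%Y")
    properties["timestamp"] = int(date.timestamp())  # naive datetime, local time zone
    properties["event_date"] = date.strftime("%Y-%m-%d")
    return properties


props = normalize_date({"event_date": "15-Jan-18", "year": "2018"})
print(props["event_date"])  # 2018-01-15
```

Note that `%b` parses month abbreviations according to the current locale, so the sample assumes English month names.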
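The key is the mapping file: the geographic field must be mapped as `geo_point` or `geo_shape`. These data types are how Elasticsearch recognizes geographic data. `geo_point` holds a single latitude/longitude pair, while `geo_shape` holds GeoJSON geometries such as the Polygon in the question. An example from my mapping file: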
...
{
  "properties": {
    "geometry": {
      "properties": {
        "coordinates": {
          "type": "geo_point"
        },
        "type": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        }
      }
    },
...
In other words, before uploading your GeoJSON data with Python, you need to create the index and then apply a mapping file that contains `geo_shape` or `geo_point`, for example:
curl -X PUT "localhost:9200/YOUR_INDEX?pretty"
curl -X PUT "localhost:9200/YOUR_INDEX/_mapping?pretty" -H "Content-Type: application/json" -d @mapping.json
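The `mapping.json` file sent by the second command can also be generated from Python. A minimal sketch, assuming the geometry sits in a top-level field called `geometry` and should be a `geo_shape` (suitable for polygons); `build_geo_mapping` is a hypothetical helper, not an Elasticsearch API:

```python
import json


def build_geo_mapping(field="geometry", geo_type="geo_shape"):
    """Return a mapping body that marks one field as geographic."""
    if geo_type not in ("geo_point", "geo_shape"):
        raise ValueError("geo_type must be 'geo_point' or 'geo_shape'")
    return {"properties": {field: {"type": geo_type}}}


mapping = build_geo_mapping()
# Write the file that the curl command sends with -d @mapping.json
with open("mapping.json", "w", encoding="utf-8") as f:
    json.dump(mapping, f, indent=2)
```

The mapping has to be in place before the first document is indexed, since the type of an existing field cannot be changed afterwards.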
Answer 1 (score: 0)
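You have to split each GeoJSON feature into (1) the geometry and (2) the attributes/properties. You cannot index GeoJSON Features and FeatureCollections directly (see documentation); only the geometry part is supported as a field type.
So your final indexable document ends up looking somewhat flattened: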
{
  "ID_0": 105,
  "ISO": "IND",
  "NAME_0": "India",
  "ID_1": 1288,
  "NAME_1": "Telangana",
  "ID_2": 15715,
  "NAME_2": "Telangana",
  "VARNAME_2": null,
  "NL_NAME_2": null,
  "HASC_2": "IN.TS.AD",
  "CC_2": null,
  "TYPE_2": "State",
  "ENGTYPE_2": "State",
  "VALIDFR_2": "Unknown",
  "VALIDTO_2": "Present",
  "REMARKS_2": null,
  "Shape_Leng": 8.103535,
  "Shape_Area": 127258717496,
  "geometry": {
    "type": "Polygon",
    "coordinates": [
      [
        [79.14429367552918, 19.500257885106404],
        [79.14582245808431, 19.498859172536427],
        [79.14600496956801, 19.498823981691853],
        [79.14966523737327, 19.495821705263914]
      ]
    ]
  }
}
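The split described in this answer can be sketched as a small function. This is an illustration that assumes no property is itself named `geometry`; `flatten_feature` is a hypothetical name:

```python
def flatten_feature(feature):
    """Merge a Feature's properties and geometry into one flat document."""
    doc = dict(feature.get("properties") or {})  # copy; properties may be null
    doc["geometry"] = feature["geometry"]        # the field to map as geo_shape
    return doc


feature = {
    "type": "Feature",
    "properties": {"ISO": "IND", "NAME_1": "Telangana"},
    "geometry": {"type": "Point", "coordinates": [79.14, 19.50]},
}
print(flatten_feature(feature))
```

Each flattened document can then serve as the `_source` of a bulk action. Note that a polygon ring is expected to be closed, with the last coordinate pair repeating the first. The sample shown here looks truncated and would likely need that closing point before Elasticsearch accepts it.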