How do I save a geo_point to Elasticsearch from PySpark?

Time: 2018-09-06 09:15:43

Tags: python elasticsearch pyspark

I want to create a DataFrame in PySpark and save it to Elasticsearch:

from pyspark.sql import SparkSession
from pyspark.sql.types import *
from pyspark.sql import Row

spark = SparkSession \
    .builder \
    .appName("Test") \
    .master("local[4]") \
    .config("es.nodes","localhost") \
    .config("es.port",9200) \
    .config("es.nodes.wan.only","true") \
    .getOrCreate()

schema = StructType([StructField('id', StringType()), \
                     StructField('timestamp', LongType()), \
                     StructField('coordinates', ArrayType(DoubleType()))])
rows = [Row(id="11", timestamp=1523975430000, coordinates = [41.5555, 2.1522])]

df = spark.createDataFrame(rows, schema)

df.write \
        .format("org.elasticsearch.spark.sql") \
        .mode('append') \
        .option("es.resource", "myindex/intensity") \
        .save()
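
For reference (not part of the original question), one quick way to see the document shape this DataFrame produces, which approximates what the connector sends, is to inspect its JSON representation:

# Debugging aid only: print one row as JSON to see how the coordinates
# column is serialized. Expected to print something like:
# {"id":"11","timestamp":1523975430000,"coordinates":[41.5555,2.1522]}
print(df.toJSON().first())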

Here is the Elasticsearch index myindex I want to save the data to, together with its mapping intensity:

{
  "mappings": {
    "intensity": {
      "properties": {
        "id": {
          "type": "keyword"
        },
        "timestamp": {
          "type": "date"
        },
        "coordinates": {
          "type": "geo_point"
        }
      }
    }
  }
}
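
As a side note, a mapping like this is normally applied before any data is written, so that coordinates is already typed as geo_point. A minimal sketch using the Python requests library (an illustration, not taken from the original post) might look like:

import requests

mapping = {
    "mappings": {
        "intensity": {
            "properties": {
                "id": {"type": "keyword"},
                "timestamp": {"type": "date"},
                "coordinates": {"type": "geo_point"}
            }
        }
    }
}

# Create the index with an explicit geo_point mapping before writing from Spark.
# Host and port are taken from the question; using requests here is an assumption.
resp = requests.put("http://localhost:9200/myindex", json=mapping)
print(resp.json())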

The problem is with the geo_point. It should be saved like this:

"coordinates": {
  "lat": 41.5555,
  "lon": 2.1522
}

But in my case it is saved like this:

"coordinates": [
  41.5555,
  2.1522
]
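
One hedged sketch (an assumption, not a confirmed answer): the elasticsearch-hadoop connector serializes a Spark struct column as a JSON object, so rebuilding coordinates as a struct with named lat/lon fields should produce the object form shown above:

from pyspark.sql import functions as F

# Sketch only: turn the array column into a struct of named fields so it is
# sent as {"lat": ..., "lon": ...} instead of a bare array. The assumption is
# that coordinates[0] holds the latitude and coordinates[1] the longitude.
df_geo = df.withColumn(
    "coordinates",
    F.struct(
        F.col("coordinates")[0].alias("lat"),
        F.col("coordinates")[1].alias("lon"),
    ),
)

df_geo.write \
    .format("org.elasticsearch.spark.sql") \
    .mode("append") \
    .option("es.resource", "myindex/intensity") \
    .save()

(Note that Elasticsearch does accept an array for a geo_point, but it interprets it as [lon, lat], so the array shown above would be read with latitude and longitude swapped.)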

0 Answers:

There are no answers yet.