Writing to Elasticsearch from Spark: wrong timestamp type

Asked: 2018-01-26 21:24:52

Tags: elasticsearch pyspark elasticsearch-hadoop

I have a Spark DataFrame with a single column:

<class 'pyspark.sql.dataframe.DataFrame'>
StructType(List(StructField(updateDate,TimestampType,true)))

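When I write it to Elasticsearch with Spark, the updateDate field is not treated as a date; it is written as a Unix timestamp in milliseconds and mapped as a long.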

def write_to_elastic(table, destination):
    table.write \
      .format("org.elasticsearch.spark.sql") \
      .option("es.mapping.date.rich", "true") \
      .mode("overwrite") \
      .option("es.index.auto.create", "true") \
      .option("es.resource", destination + "/table") \
      .option("es.nodes", ce.es_nodes) \
      .option("es.net.ssl.protocol", "true") \
      .option("es.nodes.wan.only", "true") \
      .option("es.net.http.auth.user", ce.es_user) \
      .option("es.field.read.empty.as.null", "yes") \
      .option("es.net.http.auth.pass", ce.es_password) \
      .save()

Here is the resulting index definition:

{
  "test-date": {
    "aliases": {},
    "mappings": {
      "table": {
        "properties": {
          "updateDate": {
            "type": "long"
          }
        }
      }
    },
    "settings": {
      "index": {
        "creation_date": "1517000418516",
        "number_of_shards": "5",
        "number_of_replicas": "1",
        "uuid": "DMYyE1NPTpyE9HuKI29BqA",
        "version": {
          "created": "6010299"
        },
        "provided_name": "test-date"
      }
    }
  }
}
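
For comparison, here is a minimal sketch of the mapping body I would have expected for this field, using the same typed layout as the output above. The helper name `date_mapping` is hypothetical and only for illustration. As far as I know, the default date format in Elasticsearch 6.x also accepts epoch milliseconds, but I have not verified whether creating the index with such a mapping beforehand survives `mode("overwrite")`.

```python
import json


def date_mapping(doc_type, field):
    """Build an index-creation body that maps `field` as a date.

    Hypothetical helper: it mirrors the Elasticsearch 6.x layout shown
    in the index definition above, with "date" in place of "long".
    """
    return {
        "mappings": {
            doc_type: {
                "properties": {
                    field: {"type": "date"}
                }
            }
        }
    }


# Same document type and field name as in the question.
body = date_mapping("table", "updateDate")
print(json.dumps(body, indent=2))
```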

If I write the Spark DataFrame to a file instead, the date field is written as: 2017-10-27T00:00:00.000Z
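
To make clear that both representations describe the same instant, here is a small standard-library sketch that converts between the ISO-8601 string and epoch milliseconds. The function names `iso_to_epoch_ms` and `epoch_ms_to_iso` are my own and not part of Spark or elasticsearch-hadoop; the sketch assumes UTC values with exactly three fractional digits, as in the example above.

```python
from datetime import datetime, timedelta, timezone

# Reference point for Unix time, in UTC.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def iso_to_epoch_ms(text):
    """Parse a UTC string like 2017-10-27T00:00:00.000Z into epoch milliseconds."""
    dt = datetime.strptime(text, "%Y-%m-%dT%H:%M:%S.%fZ").replace(tzinfo=timezone.utc)
    # Integer division of timedeltas avoids float rounding.
    return (dt - EPOCH) // timedelta(milliseconds=1)


def epoch_ms_to_iso(ms):
    """Format epoch milliseconds as a UTC ISO-8601 string with millisecond precision."""
    dt = EPOCH + timedelta(milliseconds=ms)
    return dt.strftime("%Y-%m-%dT%H:%M:%S.") + f"{dt.microsecond // 1000:03d}Z"


print(iso_to_epoch_ms("2017-10-27T00:00:00.000Z"))  # 1509062400000
print(epoch_ms_to_iso(1509062400000))               # 2017-10-27T00:00:00.000Z
```

So the value stored in Elasticsearch is not wrong as a point in time; the problem is only that the field is typed as a long rather than a date.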

What could be causing this behavior?

0 answers:

No answers yet