I'm inserting documents from Apache Spark into Elasticsearch via Structured Streaming.
Unfortunately, there is an open bug in the Spark-ES connector (https://github.com/elastic/elasticsearch-hadoop/issues/1173) whose side effect is that date fields on the source side (Spark) are sent to the sink (ES) as Unix timestamps / longs.
I figured an index template that does the conversion on the ES side might be a good workaround to get the proper type (date) in ES.
My index template is:
{
  "index_patterns": "my_index_*",
  "mappings": {
    "peerType_count": {
      "dynamic_templates": [
        {
          "timestamps": {
            "path_match": "*.window.*",
            "match_mapping_type": "long",
            "mapping": {
              "type": "date",
              "format": "epoch_millis"
            }
          }
        }
      ]
    }
  }
}
But the documents in ES still contain Unix timestamps :-/
{
  "_index": "my_index",
  "_type": "peerType_count",
  "_id": "kUGWNmcBtkL7EG0gS280",
  "_version": 1,
  "_score": 1,
  "_source": {
    "window": {
      "start": 1535958000000,
      "end": 1535958300000
    },
    "input__source_peerType": "peer2",
    "count": 1
  }
}
Does anyone have an idea what could be going wrong?
PS: Is there an es-mapping-debugger available somewhere?
Answer 0 (score: 0)
I'd like to share my workaround: simply create an ingest pipeline in ES with the following HTTP request:
PUT _ingest/pipeline/fix_date_1173
{
  "description": "converts from unix ms to date, workaround for https://github.com/elastic/elasticsearch-hadoop/issues/1173",
  "processors": [
    {
      "date": {
        "field": "window.start",
        "formats": ["UNIX_MS"],
        "target_field": "window.start"
      }
    },
    {
      "date": {
        "field": "window.end",
        "formats": ["UNIX_MS"],
        "target_field": "window.end"
      }
    }
  ]
}
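Before wiring the pipeline into Spark, you can dry-run it with ES's Simulate Pipeline API; the sample document below just reuses the timestamp values from the question:

POST _ingest/pipeline/fix_date_1173/_simulate
{
  "docs": [
    {
      "_source": {
        "window": {
          "start": 1535958000000,
          "end": 1535958300000
        }
      }
    }
  ]
}

The response shows each processor's output, so you can confirm the two fields are parsed before any real documents flow through.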
and enable it in your Spark code via:
.option("es.ingest.pipeline", "fix_date_1173")
Thanks to @val for the hint!