I'm inserting documents from Apache Spark into Elasticsearch via Structured Streaming.
Unfortunately, there is an open bug in the Spark-ES connector (https://github.com/elastic/elasticsearch-hadoop/issues/1173) whose side effect is that date fields on the source side (Spark) are sent to the sink (ES) as Unix timestamps / longs.
I figured an index template that does the conversion on the ES side might be a good workaround to get the proper type (date) in ES.
My index template is:
{
  "index_patterns": "my_index_*",
  "mappings": {
    "peerType_count": {
      "dynamic_templates": [
        {
          "timestamps": {
            "path_match": "*.window.*",
            "match_mapping_type": "long",
            "mapping": {
              "type": "date",
              "format": "epoch_millis"
            }
          }
        }
      ]
    }
  }
}
But the documents in ES still contain Unix timestamps :-/
{
  "_index": "my_index",
  "_type": "peerType_count",
  "_id": "kUGWNmcBtkL7EG0gS280",
  "_version": 1,
  "_score": 1,
  "_source": {
    "window": {
      "start": 1535958000000,
      "end": 1535958300000
    },
    "input__source_peerType": "peer2",
    "count": 1
  }
}
Does anyone have an idea what could be going wrong?
PS: Is there an es-mapping-debugger available somewhere?
Answer 0 (score: 0)
I'd like to share my workaround: simply create an ingest pipeline in ES with the following HTTP request:
PUT _ingest/pipeline/fix_date_1173
{
  "description": "converts from unix ms to date, workaround for https://github.com/elastic/elasticsearch-hadoop/issues/1173",
  "processors": [
    {
      "date": {
        "field": "window.start",
        "formats": ["UNIX_MS"],
        "target_field": "window.start"
      }
    },
    {
      "date": {
        "field": "window.end",
        "formats": ["UNIX_MS"],
        "target_field": "window.end"
      }
    }
  ]
}
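Before wiring the pipeline into Spark, you can dry-run it with ES's Simulate Pipeline API; the sample document below just reuses the timestamp values from the question:

POST _ingest/pipeline/fix_date_1173/_simulate
{
  "docs": [
    {
      "_source": {
        "window": {
          "start": 1535958000000,
          "end": 1535958300000
        }
      }
    }
  ]
}

The response shows each processor's output, so you can confirm the two fields are parsed before any real documents flow through.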
and enable it in your Spark code via:
.option("es.ingest.pipeline", "fix_date_1173")
Thanks to @val for the hint!