Question

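I am reading data from Elasticsearch. When I visualize the date field in Kibana, it shows Aug 5, 2020 @ 23:00:00.000, which is correct. But when I read the same data from Elasticsearch to do some machine learning, the date comes back in the wrong format: 1.596665e+12.

I am using pyspark to collect the contents of the index into a dataframe. If there is a solution in Scala, I can do it in Scala as well.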

from elasticsearch import Elasticsearch
from pandasticsearch import Select

es = Elasticsearch(['http://localhost:9200'], timeout=600)
documents = es.search(index='sub1', body={})

pandas_df = Select.from_dict(documents).to_pandas()
print(pandas_df)

It shows the wrong date format, so how can I fix it? Any help? Thank you.

Answer 1

1.596665e+12 is equal to 1596665000000, which is the Unix timestamp in milliseconds corresponding to Wednesday, August 5, 2020 10:03:20 PM GMT.
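The arithmetic can be checked with the standard library alone. This is a minimal sketch; note that 1.596665e+12 is the value as printed in scientific notation, so the trailing digits of the stored timestamp may have been rounded away in the display.

```python
from datetime import datetime, timezone

# Value as printed in the question (scientific notation, possibly rounded).
ts_millis = 1.596665e+12

# Unix timestamps count seconds, so divide the millisecond value by 1000.
as_utc = datetime.fromtimestamp(ts_millis / 1000, tz=timezone.utc)

print(int(ts_millis))       # 1596665000000
print(as_utc.isoformat())   # 2020-08-05T22:03:20+00:00
```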

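You basically have 3 options:

1. Use a script_field to parse/convert the timestamp to a human-readable date. Note that you will need to extract the script fields from the response because they are not part of the _source.
2. Convert the timestamps after you fetch the documents but before you load them into the df (preferably in a loop / list comprehension / map).
3. Reindex your data with the converted timestamps. This can be done from inside of an _update script, so you do not need to drop everything.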
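For the first option, the extraction step is the part that is easy to miss. It might look roughly like the sketch below. The names `date_field` and `date_field_as_date`, the Painless one-liner, and the sample hit are all assumptions for illustration; the script in particular presumes `date_field` is mapped as a date and has not been tested against a live cluster.

```python
# Hypothetical request body; field names are assumed, not taken from the question.
SEARCH_BODY = {
    # Ask for _source explicitly; it can be left out when script_fields are used.
    "_source": True,
    "script_fields": {
        "date_field_as_date": {
            "script": {
                "lang": "painless",
                # Assumes `date_field` is mapped as a date type.
                "source": "doc['date_field'].value.toString()",
            }
        }
    },
}


def merge_script_fields(hit):
    """Copy script field values from hit['fields'] into a copy of hit['_source']."""
    row = dict(hit.get("_source", {}))
    for name, values in hit.get("fields", {}).items():
        # Script field values come back wrapped in a list.
        row[name] = values[0] if len(values) == 1 else values
    return row


# Hand-written hit in the shape of a search response entry; not real output.
sample_hit = {
    "_source": {"date_field": 1596665000000},
    "fields": {"date_field_as_date": ["2020-08-05T22:03:20.000Z"]},
}
print(merge_script_fields(sample_hit))
```

With a real cluster, the hits would come from `es.search(index='sub1', body=SEARCH_BODY)['hits']['hits']` and each one would be passed through `merge_script_fields` before building the dataframe.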

Update

Implementation of point 2

from elasticsearch import Elasticsearch
from datetime import datetime as dt


def convert_ts(hit):
    hit = hit['_source']

    try:
        ts_from_doc = hit.get('date_field', None)

        if not ts_from_doc:
            raise ValueError('`date_field` not found')

        # the value arrives in milliseconds, so convert it to seconds;
        # fromtimestamp() returns a datetime in the machine's local timezone
        as_date = dt.fromtimestamp(
            int(ts_from_doc / 1000.0)
        ).strftime('%Y-%m-%d %H:%M:%S')

        hit['date_field_as_date'] = as_date

    except Exception as e:
        print(e)

    return hit


es = Elasticsearch(['http://localhost:9200'], timeout=600)
documents = es.search(index='sub1', body={})['hits']['hits']
documents = [convert_ts(doc) for doc in documents]

print(documents)

# pandas etc ...
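Since the data ends up in pandas anyway, the same conversion could also be applied to the whole column after loading, instead of per hit. The following is a minimal sketch, assuming the column is called `date_field` and holds millisecond timestamps; the two sample values are made up for illustration.

```python
import pandas as pd

# Stand-in for the DataFrame built from the search hits (assumed column name).
pandas_df = pd.DataFrame({"date_field": [1596665000000, 1596668400000]})

# unit="ms" tells pandas the numbers are milliseconds since the Unix epoch (UTC).
pandas_df["date_field_as_date"] = pd.to_datetime(pandas_df["date_field"], unit="ms")

print(pandas_df)
```

The resulting column is timezone-naive and expressed in UTC, so it can differ from what Kibana displays if Kibana is rendering dates in the browser's timezone.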
