pandas to_json output with double quotes, to feed into Elasticsearch

Posted: 2018-01-24 11:33:40

Tags: python json pandas elasticsearch dataframe

The to_json method of a pandas DataFrame returns the data correctly, but I can't process it in the next step. For example:

try:
    from StringIO import StringIO
except ImportError:
    from io import StringIO

myst="""
20-01-17    pizza   90
21-01-17    pizza   120
22-01-17    pizza   239
23-01-17    pizza   200
20-01-17    fried-rice  100
21-01-17    fried-rice  120
22-01-17    fried-rice  110
23-01-17    fried-rice  190
20-01-17    ice-cream   8
21-01-17    ice-cream   23
22-01-17    ice-cream   21
23-01-17    ice-cream   100
"""
u_cols=['date', 'product', 'sales']

import pandas as pd

# Parse the whitespace-separated text into a DataFrame (raw string avoids an invalid-escape warning)
df = pd.read_csv(StringIO(myst), sep=r'\s+', names=u_cols)

下一步是将数据导出到JSON,以便在Elasticsearch中导入。

tmp=df.to_json(orient="records")
import json
json.loads(tmp)
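A small check, using a hypothetical two-row subset of the data (the name small is only for illustration), shows where the quotes change: to_json already returns a str containing double-quoted JSON text, and it is json.loads that turns that text into Python lists and dicts, whose printed repr uses single quotes.

```python
import json

import pandas as pd

# Hypothetical two-row subset of the sample data, enough to show the behaviour
small = pd.DataFrame(
    {"date": ["20-01-17", "21-01-17"], "product": ["pizza", "pizza"], "sales": [90, 120]}
)

tmp = small.to_json(orient="records")
print(type(tmp).__name__)  # str: this is already JSON text
print(tmp)                 # keys and string values are double-quoted

records = json.loads(tmp)
print(type(records).__name__)  # list: parsed into Python objects
print(records)                  # Python's repr of dicts uses single quotes
```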

This returns the following output, which is not valid JSON:

[{'date': '20-01-17', 'product': 'pizza', 'sales': 90},
 {'date': '21-01-17', 'product': 'pizza', 'sales': 120},
 {'date': '22-01-17', 'product': 'pizza', 'sales': 239},
 {'date': '23-01-17', 'product': 'pizza', 'sales': 200},
 {'date': '20-01-17', 'product': 'fried-rice', 'sales': 100},
 {'date': '21-01-17', 'product': 'fried-rice', 'sales': 120},
 {'date': '22-01-17', 'product': 'fried-rice', 'sales': 110},
 {'date': '23-01-17', 'product': 'fried-rice', 'sales': 190},
 {'date': '20-01-17', 'product': 'ice-cream', 'sales': 8},
 {'date': '21-01-17', 'product': 'ice-cream', 'sales': 23},
 {'date': '22-01-17', 'product': 'ice-cream', 'sales': 21},
 {'date': '23-01-17', 'product': 'ice-cream', 'sales': 100}]

It seems Elasticsearch doesn't like single quotes. How can I get the same output as above, but with double quotes?
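If the goal is JSON text rather than Python objects, here is a minimal sketch on the same kind of hypothetical two-row subset: either keep the string returned by to_json, or serialize the parsed records back with json.dumps, which always emits double quotes. With orient="records", to_json also accepts lines=True to produce newline-delimited JSON, one record per line.

```python
import json

import pandas as pd

# Hypothetical two-row subset of the sample data
small = pd.DataFrame(
    {"date": ["20-01-17", "21-01-17"], "product": ["pizza", "pizza"], "sales": [90, 120]}
)

records = json.loads(small.to_json(orient="records"))

# Serializing the Python objects back gives standard double-quoted JSON
as_json = json.dumps(records)
print(as_json)

# Newline-delimited JSON: one object per line
ndjson = small.to_json(orient="records", lines=True)
print(ndjson)
```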

1 answer:

Answer 0 (score: 1)

Not sure how much it will help, but try adding the following after your code:

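from elasticsearch import Elasticsearch
from elasticsearch.helpers import bulk

# Connect to a local node; change the URL for your cluster
es = Elasticsearch("http://localhost:9200")

# One bulk action per record (json and tmp come from the question's code).
# The parsed record goes into '_source' unchanged, so the document fields
# are date, product and sales.
# On Elasticsearch 6.x and older, also add a mapping type, e.g. '_type': 'content'.
actions = [
    {
        '_index': 'transactions',
        '_source': rec,
    }
    for rec in json.loads(tmp)
]

bulk(es, actions)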

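This should let Elasticsearch create the index and load the records. The bulk helper serializes each dict to JSON itself, so the single quotes you see when printing the Python objects never reach Elasticsearch.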
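For a closer look at what gets sent, here is a standard-library-only sketch of a request body for Elasticsearch's _bulk endpoint, which takes newline-delimited JSON: an action line followed by the document line for each record. The function name build_bulk_body is hypothetical, the index name transactions is taken from the answer, and the action line uses the typeless form of Elasticsearch 7 and later. No cluster is needed to run it.

```python
import json


def build_bulk_body(records, index):
    """Hypothetical helper: build an NDJSON body for the Elasticsearch _bulk API."""
    lines = []
    for rec in records:
        # Action/metadata line: index this document into `index`
        lines.append(json.dumps({"index": {"_index": index}}))
        # Source line: the document itself, serialized with double quotes
        lines.append(json.dumps(rec))
    # The _bulk API expects the body to end with a newline
    return "\n".join(lines) + "\n"


records = [
    {"date": "20-01-17", "product": "pizza", "sales": 90},
    {"date": "21-01-17", "product": "pizza", "sales": 120},
]
body = build_bulk_body(records, "transactions")
print(body)
```

A body like this would be POSTed to /_bulk with Content-Type: application/x-ndjson; in practice the bulk helper builds and sends it for you.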