Python and Elasticsearch: converting CSV to JSON with an index

Date: 2019-03-28 07:07:44

Tags: python elasticsearch

I want to use Python to convert a bunch of CSV files into a specific JSON file format.

Here is my sample CSV file:

L1-CR109 Security Counter,has been forced,2019-02-26
L1-CR109 Security Counter,has been forced,2019-02-26
L1-CR109 Security Counter,has been forced,2019-02-26
L1-CR109 Security Counter,has been forced,2019-02-26

...and this is the JSON output I want:

{ "index" : { "_index" : "test", "_type" : "_doc", "_id" : "1" } }
{ "location" : "L1-CR109 Security Counter", "door_activity": "has been forced", "2019-02-26"}
{ "index" : { "_index" : "test", "_type" : "_doc", "_id" : "1" } }
{ "location" : "L1-CR109 Security Counter", "door_activity": "has been forced", "2019-02-26"}
{ "index" : { "_index" : "test", "_type" : "_doc", "_id" : "1" } }
{ "location" : "L1-CR109 Security Counter", "door_activity": "has been forced", "2019-02-26"}
{ "index" : { "_index" : "test", "_type" : "_doc", "_id" : "1" } }
{ "location" : "L1-CR109 Security Counter", "door_activity": "has been forced", "2019-02-26"}

Currently, I am able to produce the following JSON:

[{"location": "L1-CR109 Security Counter", "door_status": "has been forced", "date": "2019-02-21"},
{"location": "L1-CR109 Security Counter", "door_status": "has been forced", "date": "2019-02-21"},
{"location": "L1-CR109 Security Counter", "door_status": "has been forced", "date": "2019-02-21"},
{"location": "L1-CR109 Security Counter", "door_status": "has been forced", "date": "2019-02-21"}

...and this is my Python code:

[code not preserved in this copy]

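I have looked for a solution, but to no avail. Could someone tell me what is missing in my code? Also, why does Elasticsearch only accept the JSON format I want, with the index lines, rather than the plain format that Python produces?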

2 answers:

Answer 0 (score: 1)

Here is one way to do it. Note: you did not give the date field a name, so I added one to make the output valid JSON.

import csv
import json
from collections import OrderedDict

# Action line written before every document line.
index_line = {"index": {"_index": "test", "_type": "_doc", "_id": "1"}}

# newline='' lets the csv module handle line endings itself.
with open('input.csv', 'r', newline='') as infile, open('outfile.json', 'w') as outfile:
    inreader = csv.reader(infile, delimiter=',', quotechar='"')

    for line in inreader:
        # Map the three CSV columns to named fields, keeping their order.
        document = OrderedDict()
        document['location'] = line[0]
        document['door_activity'] = line[1]
        document['date'] = line[2]
        # One action line, then one document line.
        json.dump(index_line, outfile)
        outfile.write("\n")
        json.dump(document, outfile)
        outfile.write("\n")
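One thing to watch in the snippet above: every action line carries the same "_id" of "1", which matches the output requested in the question, but when such a file is sent to the bulk endpoint each document would replace the previous one under that id. The following is only a sketch, assuming you want one stored document per CSV row. The function name csv_to_bulk_lines and the index_name parameter are made up for illustration, and the field names follow the answer above.

```python
import csv
import io
import json


def csv_to_bulk_lines(infile, index_name="test"):
    """Yield action/document lines for the bulk format, one pair per CSV row."""
    reader = csv.reader(infile, delimiter=",", quotechar='"')
    # start=1 so that the first row gets _id "1", the second "2", and so on.
    for doc_id, row in enumerate(reader, start=1):
        action = {"index": {"_index": index_name, "_type": "_doc", "_id": str(doc_id)}}
        document = {"location": row[0], "door_activity": row[1], "date": row[2]}
        yield json.dumps(action)
        yield json.dumps(document)


# Sample data taken from the question; a real run would pass an open file instead.
sample = io.StringIO("L1-CR109 Security Counter,has been forced,2019-02-26\n" * 4)
bulk_lines = list(csv_to_bulk_lines(sample))
print("\n".join(bulk_lines))
```

To write a file, join the lines with "\n" and add one more "\n" at the end, since the bulk body is expected to end with a newline.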

Answer 1 (score: 1)

Here is a version that uses the Python pandas package:

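import json
from io import StringIO

import pandas as pd

in_file = '/Elastic Search/Converted Detection/Converted CSV'
out_file = '/Elastic Search/Converted Detection/Converted JSON'
index_line = '{"index": {"_index": "test", "_type": "_doc", "_id": "1"}}\n'
# Field names for the three header-less CSV columns.
title = ['location', 'door_activity', 'date']

Read the file (header=None because the CSV has no header row):

df = pd.read_csv(in_file, header=None)

Or read directly from a string:

text = "L1-CR109 Security Counter,has been forced,2019-02-26\n" * 4
df = pd.read_csv(StringIO(text), header=None)

Now write the desired format (note that I have added "date" so that it is valid JSON):

with open('outfile.json', 'w+') as outfile:
    for row in df.to_dict('records'):
        # Pair each field name with the value from the matching column.
        data = json.dumps(dict(zip(title, row.values())))
        # The action line already ends with a newline; the document line needs one too.
        outfile.write(index_line + data + '\n')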
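On the second part of the question, why Elasticsearch wants the index lines: the bulk endpoint reads newline-delimited JSON, where each document line is preceded by an action line that says what to do with the document and where to put it. A plain JSON array does not have that shape, so it is not accepted there. Below is a rough sanity check for a generated file, assuming it contains only "index" actions. The helper name check_bulk_text is hypothetical, and this only checks the shape of the text; it does not talk to Elasticsearch.

```python
import json


def check_bulk_text(text):
    """Return the number of action/document pairs, or raise ValueError if malformed."""
    lines = text.splitlines()
    if len(lines) % 2 != 0:
        raise ValueError("expected an even number of lines (action + document pairs)")
    for i in range(0, len(lines), 2):
        # json.loads raises json.JSONDecodeError (a ValueError) on invalid JSON.
        action = json.loads(lines[i])
        document = json.loads(lines[i + 1])
        if not isinstance(action, dict) or "index" not in action:
            raise ValueError(f"line {i + 1} is not an index action")
        if not isinstance(document, dict):
            raise ValueError(f"line {i + 2} is not a JSON object")
    return len(lines) // 2


good = (
    '{"index": {"_index": "test", "_type": "_doc", "_id": "1"}}\n'
    '{"location": "L1-CR109 Security Counter", "door_activity": "has been forced", "date": "2019-02-26"}\n'
)
pairs = check_bulk_text(good)
print(pairs)
```

Running the same check on a single-line JSON array, like the output the question currently produces, raises ValueError because there is no action line to pair with.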