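I am trying to iterate over a CSV file (about 91 million records) and, using a Python dict, create a new JSON/text file based on the sample records below. The file is sorted by id, then type.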
id,type,value
4678367,1,1001
4678367,2,1007
4678367,2,1008
5678945,1,9000
5678945,2,8000
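The code should append the value when both id and type match, and otherwise create a new record, as shown below. I want to write the result to a target file.
How can I do this in Python?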
{'id':4678367,
'id_1':[1001],
'id_2':[1007,1008]
},
{'id':5678945,
'id_1':[9000],
'id_2':[8000]
}
Answer 0 (score: 0)
Here is one way to collect the items. I've left writing the file as an exercise:
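    import csv

    with open('test.csv', newline='') as f:
        reader = csv.reader(f)
        columns = next(reader)  # skip the header row
        results = []
        record = {}
        current_type = None
        items = []
        for id_, type_, value in reader:
            # Flush the collected values when the type or the id changes,
            # so that two ids sharing the same type are not merged
            if items and (type_ != current_type or id_ != record.get('id')):
                record['id_{}'.format(current_type)] = items
                items = []
            if id_ != record.get('id'):
                if record:
                    results.append(record)
                record = dict(id=id_)
            current_type = type_
            items.append(value)
        # Flush the last record once the input is exhausted
        if record:
            record['id_{}'.format(current_type)] = items
            results.append(record)
    print(results)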
[
{'id': '4678367', 'id_1': ['1001'], 'id_2': ['1007', '1008']},
{'id': '5678945', 'id_1': ['9000'], 'id_2': ['8000']}
]
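For the file-writing step, one option that avoids holding all 91 million rows in memory is to rely on the input already being sorted by id and type, and to write each finished record out as one JSON object per line. This is only a minimal sketch under that assumption: the function name `convert` and the output file name are placeholders of mine, not from the question, and the `int()` conversions assume every field is numeric (they are there to match the integers in the desired output).

```python
import csv
import json
from itertools import groupby
from operator import itemgetter


def convert(src_path, dst_path):
    """Stream a CSV sorted by (id, type) into one JSON object per line.

    Hypothetical helper: assumes the header is id,type,value and that rows
    sharing an id are adjacent, as the question states.
    """
    with open(src_path, newline='') as src, open(dst_path, 'w') as dst:
        reader = csv.DictReader(src)
        # groupby only merges adjacent rows, which is why the sort order matters
        for id_, rows in groupby(reader, key=itemgetter('id')):
            record = {'id': int(id_)}
            for row in rows:
                # setdefault creates the list the first time a type is seen
                key = 'id_' + row['type']
                record.setdefault(key, []).append(int(row['value']))
            dst.write(json.dumps(record) + '\n')
```

Calling `convert('test.csv', 'out.jsonl')` would then produce one line per id. Only one record is held in memory at a time, but if the input turned out not to be sorted by id, the same id would appear on several lines.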
Answer 1 (score: 0)
    import csv
    from collections import namedtuple

    with open("data.csv", "r", newline="") as f:
        reader = csv.reader(f)
        header = next(reader)
        # Build a row type from the header so columns can be read by name
        col = namedtuple('col', header)
        dictionary = {}
        for values in reader:
            data = col(*values)
            type_ = 'id_' + str(data.type)
            if data.id in dictionary:
                local_dict = dictionary[data.id]
                if type_ in local_dict:
                    local_dict[type_].append(data.value)
                else:
                    local_dict[type_] = [data.value]
            else:
                # First time this id is seen: start a new record
                dictionary[data.id] = {'id': data.id, type_: [data.value]}
    print(*dictionary.values(), sep="\n")
>>>{'id': '4678367', 'id_1': ['1001'], 'id_2': ['1007', '1008']}
{'id': '5678945', 'id_1': ['9000'], 'id_2': ['8000']}
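The same in-memory grouping can probably be written more compactly with `defaultdict` and `setdefault`, and then dumped to a file with `json.dump`. The sketch below does not depend on the sort order, but every record stays in memory until the end, which may be a problem at 91 million rows. The names `group_rows` and `write_grouped` are hypothetical, and values are kept as strings, as in the output above.

```python
import csv
import json
from collections import defaultdict


def group_rows(rows):
    """Group (id, type, value) triples into one dict per id."""
    grouped = defaultdict(dict)
    for id_, type_, value in rows:
        record = grouped[id_]          # created empty on first access
        record.setdefault('id', id_)
        record.setdefault('id_' + type_, []).append(value)
    return list(grouped.values())


def write_grouped(src_path, dst_path):
    """Read the CSV at src_path and write the grouped records as a JSON array."""
    with open(src_path, newline='') as src:
        reader = csv.reader(src)
        next(reader)  # skip the id,type,value header
        records = group_rows(reader)
    with open(dst_path, 'w') as dst:
        json.dump(records, dst, indent=1)
```

For example, `write_grouped('data.csv', 'out.json')` would write a single JSON array containing one object per id.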