Create a JSON/text file from the contents of a CSV file

Time: 2018-01-22 04:47:13

Tags: python json python-3.x csv

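I'm trying to iterate over a CSV file (about 91 million records) and use a Python dict to build a new JSON/text file from records like the sample below (the file is sorted by id, then type).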

id,type,value
4678367,1,1001
4678367,2,1007
4678367,2,1008
5678945,1,9000
5678945,2,8000

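The code should append the value when the id and type match an existing entry, and otherwise create a new record, as shown below. I want to write the result to a target file.

How can I do this in Python?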

{'id':4678367,
 'id_1':[1001],
 'id_2':[1007,1008]
},
{'id':5678945,
 'id_1':[9000],
 'id_2':[8000]
}

2 answers:

Answer 0: (score: 0)

Here is one way to collect the items. I leave writing the file as an exercise:

Code:

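import csv

with open('test.csv', newline='') as f:
    reader = csv.reader(f)
    columns = next(reader)  # skip the header row
    results = []
    record = {}
    current_type = None
    items = []
    for id_, type_, value in reader:
        new_id = id_ != record.get('id')
        # Flush the collected values when either the id or the type changes
        if record and (new_id or type_ != current_type):
            record['id_{}'.format(current_type)] = items
            items = []
        # Start a new record when the id changes (input is sorted by id)
        if new_id:
            if record:
                results.append(record)
            record = dict(id=id_)
        current_type = type_
        items.append(value)

    # Flush the last record once the file is exhausted
    if record:
        record['id_{}'.format(current_type)] = items
        results.append(record)

print(results)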

Result:

[
    {'id': '4678367', 'id_1': ['1001'], 'id_2': ['1007', '1008']}, 
    {'id': '5678945', 'id_1': ['9000'], 'id_2': ['8000']}
]
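
The answer above leaves writing the file as an exercise. A minimal sketch of one way to do it follows, assuming the input is sorted by id as the question states. It streams the rows with itertools.groupby and writes one JSON object per line, so the full result list never has to sit in memory. The helper names grouped_records and write_json_lines, and the one-object-per-line layout, are assumptions made for illustration and are not part of the original answer.

```python
import csv
import io
import json
from itertools import groupby
from operator import itemgetter


def grouped_records(rows):
    """Yield one dict per id from (id, type, value) rows sorted by id."""
    for id_, id_rows in groupby(rows, key=itemgetter(0)):
        record = {'id': id_}
        for _, type_, value in id_rows:
            # Create the list for this type on first use, then append.
            record.setdefault('id_{}'.format(type_), []).append(value)
        yield record


def write_json_lines(src, dst):
    """Read CSV text from src and write one JSON object per line to dst."""
    reader = csv.reader(src)
    next(reader)  # skip the header row
    count = 0
    for record in grouped_records(reader):
        dst.write(json.dumps(record) + '\n')
        count += 1
    return count


# Demonstration on the sample rows from the question, using in-memory files.
sample = io.StringIO(
    "id,type,value\n"
    "4678367,1,1001\n"
    "4678367,2,1007\n"
    "4678367,2,1008\n"
    "5678945,1,9000\n"
    "5678945,2,8000\n"
)
out = io.StringIO()
written = write_json_lines(sample, out)
print(out.getvalue(), end='')
```

With real files, the same function could be called inside `with open('test.csv', newline='') as src, open('output.jsonl', 'w') as dst:`, where output.jsonl is a hypothetical target name.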

Answer 1: (score: 0)

import csv
from collections import namedtuple

with open("data.csv","r") as f:
    read = csv.reader(f)
    header = next(read)
    col = namedtuple('col',header)
    dictionary = {}
    for values in read:
        data = col(*values)
        type_ = 'id_' + str(data.type)
        if data.id in dictionary:
            local_dict = dictionary[data.id]                
            if type_ in local_dict:
                local_dict[type_].append(data.value)
            else:
                local_dict[type_] = [data.value]
        else:
            dictionary.setdefault(data.id,{'id':data.id,type_:[data.value]})
print(*dictionary.values(), sep="\n")

Result:

{'id': '4678367', 'id_1': ['1001'], 'id_2': ['1007', '1008']}
{'id': '5678945', 'id_1': ['9000'], 'id_2': ['8000']}
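
Both answers produce strings, while the output requested in the question uses integers. The following is a rough sketch of a dict-based variant that converts id and value with int() and serializes the result with the json module. It does not rely on the rows being sorted, but it keeps every record in memory, which may be costly for a file of about 91 million rows. The function name collect is hypothetical, and the sketch assumes every id and value is a valid integer.

```python
import csv
import io
import json


def collect(src):
    """Group CSV rows by id; works even if the rows are not sorted."""
    records = {}
    for row in csv.DictReader(src):
        id_ = int(row['id'])
        # Reuse the record for this id, or create it on first sight.
        record = records.setdefault(id_, {'id': id_})
        key = 'id_{}'.format(row['type'])
        record.setdefault(key, []).append(int(row['value']))
    # Dicts keep insertion order (Python 3.7+), so first-seen id order is kept.
    return list(records.values())


# Demonstration on the sample rows from the question.
sample = io.StringIO(
    "id,type,value\n"
    "4678367,1,1001\n"
    "4678367,2,1007\n"
    "4678367,2,1008\n"
    "5678945,1,9000\n"
    "5678945,2,8000\n"
)
records = collect(sample)
print(json.dumps(records, indent=1))
```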