Question

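I haven't done much Python programming. I am trying to read in a basic CSV and then create a nested dictionary from it. This is what I have so far; I seem to have some problem with my looping or with overwriting my dictionary. I know it isn't very efficient.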

import csv

reader = csv.DictReader(open("fruit.csv"))

fruit_dict = {}
color_dict = {}
for row in reader:
    info_list = []
    count = row.pop('count')
    info_list.append(count)
    year = row.pop('year')
    info_list.append(year)
    info = row.pop('info')
    info_list.append(info)
    if row['color'] not in color_dict:
        #print row['color']
        color_dict['color'] = row['color']
            #print fruit_dict  
        if row['fruit'] not in fruit_dict:
            fruit_dict['name'] = row['fruit']
            #print fruit_dict
            #print info_list
            list_of_info_lists =[]              
            list_of_info_lists.append(info_list)
            fruit_dict['fruitInfo'] = list_of_info_lists
            color_dict['fruit'] = fruit_dict
            #print color_dict
        else:
            list_of_info_lists.append(info_list)
            fruit_dict['fruitInfo'] = list_of_info_lists
            color_dict['fruit'] = fruit_dict
            #print color_dict
    else:
        if row['color'] in color_dict:
            if row['fruit'] not in fruit_dict:
                fruit_dict['name'] = row['fruit']
                #print fruit_dict
                #print info_list
                list_of_info_lists =[]              
                list_of_info_lists.append(info_list)
                fruit_dict['fruitInfo'] = list_of_info_lists
                color_dict['fruit'] = fruit_dict
                #print color_dict
            else:
                list_of_info_lists.append(info_list)
                fruit_dict['fruitInfo'] = list_of_info_lists
                color_dict['fruit'] = fruit_dict
                #print color_dict

#print color_dict

Here is the CSV:

color,fruit,year,count,info
red,apple,1970,3,good
red,apple,1922,5,okay
orange,orange,1935,2,okay
green,celery,2001,22,marginal
red,cherries,1999,5,outstanding
orange,carrot,1952,7,okay
green,celery,2014,2,good
green,grapes,2001,12,good

What I get back is:

{'color': 'green', 'fruit': {'name': 'grapes', 'fruitInfo': [['12', '2001', 'good']]}}

Which is lovely, except I am expecting more than one row, and expecting a list of lists wherever the name already exists, for example:

{'color': 'red', 'fruit': {'name': 'apple', 'fruitInfo': [['5', '1922', 'okay'],['3', '1970', 'good']]}}

Any suggestions would be appreciated. The ultimate goal is to generate a JSON file.

Thanks, Susan

This is the format I want in the end:

[{'color': 'red', 'fruit': {'name': 'apple', 'fruitInfo': [['5', '1922', 'okay'],['3', '1970', 'good']]}},
{'color': 'red', 'fruit': {'name': 'cherries', 'fruitInfo': [['5', '1999', 'outstanding']]}},
{'color': 'orange', 'fruit': {'name': 'orange', 'fruitInfo': [['2', '1935', 'okay']]}},
{'color': 'orange', 'fruit': {'name': 'carrot', 'fruitInfo': [['7', '1952', 'okay']]}},
{'color': 'green', 'fruit': {'name': 'celery', 'fruitInfo': [['2', '2014', 'good'],['22', '2001', 'marginal']]}},
{'color': 'green', 'fruit': {'name': 'grapes', 'fruitInfo': [['12', '2001', 'good']]}}]

Answer 1

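Jon Clements' answer is the best solution. If you want to see how what you originally started with could be made to work, to help you understand what went wrong, take a look at the following: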

import csv

results_list = []
colorFruitTuple_set = set()  # (color, fruit) pairs already seen
with open("fruit.csv") as f:
    reader = csv.DictReader(f)
    for row in reader:
        info_list = [row['count'], row['year'], row['info']]
        if (row['color'], row['fruit']) not in colorFruitTuple_set:
            # First time this pair appears: build fresh dicts for it.
            color_dict = {}
            fruit_dict = {}
            color_dict['color'] = row['color']
            fruit_dict['name'] = row['fruit']

            list_of_info_lists = [info_list]

            fruit_dict['fruitInfo'] = list_of_info_lists
            color_dict['fruit'] = fruit_dict
            results_list.append(color_dict)
            colorFruitTuple_set.add((row['color'], row['fruit']))
        else:
            # Pair already seen: find its entry and append the new info.
            for color_dict in results_list:
                if color_dict["color"] == row['color'] and color_dict["fruit"]["name"] == row["fruit"]:
                    color_dict["fruit"]["fruitInfo"].append(info_list)

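I think this lines up with what you were going for. You were trying to reuse the same color_dict and fruit_dict when you needed to create several of them, which also meant you could not use them to track duplicates. This is purely for learning purposes; Jon's way is the right way to do it.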
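To make the overwrite concrete, here is a minimal sketch. It assumes just two rows taken from the CSV in the question, and the names `rows`, `shared` and `results` are mine, not from the original code. It contrasts one shared dictionary with a fresh dictionary per row, and shows why a membership test such as `row['color'] not in color_dict` never finds a color: `in` checks a dict's keys, and the keys here are always 'color' and 'fruit'.

```python
rows = [("red", "apple"), ("green", "grapes")]

# One shared dict: every iteration writes to the same keys,
# so only the last row survives.
shared = {}
for color, fruit in rows:
    shared["color"] = color
    shared["fruit"] = {"name": fruit}

# A fresh dict per row: every row is kept.
results = []
for color, fruit in rows:
    results.append({"color": color, "fruit": {"name": fruit}})

# `in` tests keys, not values, so a color value is never "in" the dict.
color_seen = "green" in shared

print(shared)
print(len(results))
print(color_seen)
```

With the shared dict only the last row is left, which matches the single 'green'/'grapes' entry reported in the question.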

Hope this helps!

Answer 2

You can use a defaultdict of lists here, keyed on a 2-tuple of (color, fruit) with the fruitInfo lists as the values, and then reformat it, e.g.:

import csv
from collections import defaultdict

dd = defaultdict(list)
with open('yourfile.csv') as fin:
    csvin = csv.DictReader(fin)
    for row in csvin:
        dd[row['color'], row['fruit']].append([row['count'], row['year'], row['info']])

Then reformat dd slightly:

reformatted = [{'color': c, 'fruit': {'name': f, 'fruitInfo': v}} for (c, f), v in dd.items()]

Which gives you:

[{'color': 'orange',
  'fruit': {'fruitInfo': [['7', '1952', 'okay']], 'name': 'carrot'}},
 {'color': 'green',
  'fruit': {'fruitInfo': [['12', '2001', 'good']], 'name': 'grapes'}},
 {'color': 'orange',
  'fruit': {'fruitInfo': [['2', '1935', 'okay']], 'name': 'orange'}},
 {'color': 'red',
  'fruit': {'fruitInfo': [['3', '1970', 'good'], ['5', '1922', 'okay']],
            'name': 'apple'}},
 {'color': 'red',
  'fruit': {'fruitInfo': [['5', '1999', 'outstanding']], 'name': 'cherries'}},
 {'color': 'green',
  'fruit': {'fruitInfo': [['22', '2001', 'marginal'], ['2', '2014', 'good']],
            'name': 'celery'}}]
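Since the question's end goal is a JSON file, the reformatted list could be serialized with the standard `json` module. The following is only a sketch under a few assumptions: three sample rows from the question are inlined through `io.StringIO` so it runs without a file, and the names `CSV_TEXT` and `json_text` are illustrative.

```python
import csv
import io
import json
from collections import defaultdict

# Sample rows from the question, inlined so the sketch needs no file on disk.
CSV_TEXT = """color,fruit,year,count,info
red,apple,1970,3,good
red,apple,1922,5,okay
green,grapes,2001,12,good
"""

# Group the info lists by (color, fruit), as in the answer above.
dd = defaultdict(list)
for row in csv.DictReader(io.StringIO(CSV_TEXT)):
    dd[row['color'], row['fruit']].append([row['count'], row['year'], row['info']])

reformatted = [{'color': c, 'fruit': {'name': f, 'fruitInfo': v}}
               for (c, f), v in dd.items()]

# json.dumps returns a string; to write a file instead, call
# json.dump(reformatted, fh, indent=2) on a file opened for writing.
json_text = json.dumps(reformatted, indent=2)
print(json_text)
```

On Python 3.7 and later, dictionaries keep insertion order, so the entries come out in the order each (color, fruit) pair first appears in the CSV rather than in the arbitrary order shown above.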

Answer 3

When working with a dictionary of dictionaries, my pattern goes like this:

sub_dict = main_dict.get(key, {})
sub_dict[sub_key] = sub_value
main_dict[key] = sub_dict

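This fetches the sub-dictionary, or {} if it doesn't exist yet. It then assigns a value into the sub-dictionary and puts the sub-dictionary back into the main dictionary.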
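As a small runnable illustration of that pattern (the keys and values below are made up for the example), the three steps can be compared with `dict.setdefault`, which fetches the sub-dictionary or creates and stores an empty one in a single call. This is an alternative spelling, not something the answer itself uses.

```python
main_dict = {}

# The three-step pattern: fetch (or create), modify, store back.
sub_dict = main_dict.get('red', {})
sub_dict['apple'] = 1
main_dict['red'] = sub_dict

# dict.setdefault returns the existing sub-dict, or inserts and
# returns the default, so the store-back step is not needed.
main_dict.setdefault('red', {})['cherries'] = 2
main_dict.setdefault('green', {})['celery'] = 3

print(main_dict)
```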

import csv

fruit_dict = {}
with open("fruit.csv") as f:
    reader = csv.DictReader(f)
    for row in reader:
        # make the info_list
        info_list = [row['count'], row['year'], row['info']]
        # extract color and fruit into variables
        color = row['color']
        fruit = row['fruit']
        # unpack the dictionaries and list
        colors = fruit_dict.get(color, {})
        fruits = colors.get(fruit, {})
        info = fruits.get('info', [])
        # reassemble the list and dictionaries
        info.append(info_list)
        fruits['info'] = info
        colors[fruit] = fruits
        fruit_dict[color] = colors

The result is slightly different from your example, because it had to change in order to use color and fruit as keys.

{'orange': {'orange': {'info': [['2', '1935', 'okay']]}, 'carrot': {'info': [['7', '1952', 'okay']]}}, 'green': {'celery': {'info': [['22', '2001', 'marginal'], ['2', '2014', 'good']]}, 'grapes': {'info': [['12', '2001', 'good']]}}, 'red': {'cherries': {'info': [['5', '1999', 'outstanding']]}, 'apple': {'info': [['3', '1970', 'good'], ['5', '1922', 'okay']]}}}
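If the list layout from the question is still needed, a nested dictionary of this color -> fruit -> {'info': [...]} shape can be flattened afterwards. This is a sketch only: a small sample of the structure is hard-coded so it runs on its own, and the name `flattened` is mine.

```python
# A small sample in the color -> fruit -> {'info': [...]} shape.
fruit_dict = {
    'red': {'apple': {'info': [['3', '1970', 'good'], ['5', '1922', 'okay']]},
            'cherries': {'info': [['5', '1999', 'outstanding']]}},
    'green': {'grapes': {'info': [['12', '2001', 'good']]}},
}

# One output entry per (color, fruit) pair, renaming 'info' to 'fruitInfo'.
flattened = [
    {'color': color, 'fruit': {'name': name, 'fruitInfo': data['info']}}
    for color, fruits in fruit_dict.items()
    for name, data in fruits.items()
]

for entry in flattened:
    print(entry)
```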

Need help with Python dictionary looping
