Question

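I have a dataset with a lot of (intentional) duplication. I'd like to collapse (?) it so that it better fits my needs. The data looks like this: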

Header1, Header2,  Header3
Example1, Content1, Stuff1
Example1, Content2, Stuff2
Example1, Content3, Stuff3
Example2, Content1, Stuff1
Example2, Content5, Stuff5
etc...

I would ultimately like to end up with the values of the first column as keys, and a list of dicts as the value for each key, like so:

{Example1 : [{Header2:Content1, Header3:Stuff1}, {Header2:Content2, Header3:Stuff2}, {Header2:Content3, Header3:Stuff3}],
 Example2 : [{Header2:Content1, Header3:Stuff1}, {Header2:Content5, Header3:Stuff5}]}

I'm new to Python and a novice programmer in general, so feel free to ask for clarification if this question is confusing. Thanks!

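UPDATE: I was rightly called out for not posting my example code (thanks for keeping me honest!), so here it is. The code below works, but since I'm new to Python I don't know whether it is well written. Also, the dict ends up with the keys (Example1 and Example2) in reverse order. That doesn't really matter, but I don't understand why.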

import csv
import sys

def gather_csv_info():
    all_csv_data = []
    flattened_data = {}
    reading_csv = csv.DictReader(open(sys.argv[1], 'rb'))

    for row in reading_csv:
        all_csv_data.append(row)

    for row in all_csv_data:
        if row["course_sis_ids"] in flattened_data:
            flattened_data[row["course_sis_ids"]].append({"user_sis_ids":row["user_sis_ids"], "file_ids":row["file_ids"]})
        else:
            flattened_data[row["course_sis_ids"]] = [{"user_sis_ids":row["user_sis_ids"], "file_ids":row["file_ids"]}]

    return flattened_data

Answer 1

Since you changed your question, I've completely changed my answer: I've just tidied up the code from your own answer so that it is more "Pythonic":

import csv
from collections import defaultdict

def gather_csv_info(filename):
    # defaultdict(list) creates an empty list the first time a key is seen
    flattened_data = defaultdict(list)
    # 'rb' is what Python 2's csv module expects; on Python 3 use open(filename, newline='')
    with open(filename, 'rb') as csvfile:
        reader = csv.DictReader(csvfile)
        for row in reader:
            key = row["Header1"]
            flattened_data[key].append({"Header2":row["Header2"], "Header3":row["Header3"]})
    return flattened_data

print(gather_csv_info('data.csv'))

Not sure why you want the data in this format, but that's up to you.
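
For anyone on Python 3, here is a minimal sketch of the same grouping idea. It is an illustration under a few assumptions, not a drop-in replacement: the helper name `group_rows` and its `key_column` parameter are hypothetical, the function accepts any iterable of lines (so it can be tried without a file on disk), and `skipinitialspace=True` is used because the sample data has a space after each comma.

```python
import csv
import io
from collections import defaultdict

def group_rows(lines, key_column):
    """Group CSV rows by key_column (hypothetical helper).

    Each value is a list of dicts holding the remaining columns.
    """
    grouped = defaultdict(list)
    # skipinitialspace=True ignores the blank that follows each comma
    reader = csv.DictReader(lines, skipinitialspace=True)
    for row in reader:
        key = row.pop(key_column)       # take the key column out of the row
        grouped[key].append(dict(row))  # keep whatever columns are left
    return dict(grouped)

# Sample data copied from the question, held in memory instead of a file
sample = io.StringIO(
    "Header1, Header2, Header3\n"
    "Example1, Content1, Stuff1\n"
    "Example1, Content2, Stuff2\n"
    "Example1, Content3, Stuff3\n"
    "Example2, Content1, Stuff1\n"
    "Example2, Content5, Stuff5\n"
)
result = group_rows(sample, "Header1")
```

With a real file, something like `with open(filename, newline='') as f: group_rows(f, "Header1")` should behave the same way, since Python 3's csv module wants a text-mode file opened with `newline=''`.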

Answer 2

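This code works, but I don't know how Pythonic it is, and I don't understand why the keys of the flattened_data dict end up in the reverse of the order in which they appear in the original CSV. The order doesn't strictly matter, but it is odd.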

import csv
import sys

def gather_csv_info():
    all_csv_data = []
    flattened_data = {}
    reading_csv = csv.DictReader(open(sys.argv[1], 'rb'))

    for row in reading_csv:
        all_csv_data.append(row)

    for row in all_csv_data:
        if row["course_sis_ids"] in flattened_data:
            flattened_data[row["course_sis_ids"]].append({"user_sis_ids":row["user_sis_ids"], "file_ids":row["file_ids"]})
        else:
            flattened_data[row["course_sis_ids"]] = [{"user_sis_ids":row["user_sis_ids"], "file_ids":row["file_ids"]}]

    return flattened_data
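
On the key order: a plain dict in Python 2 (and in Python 3 before 3.7) makes no promise about iteration order, so the reversed keys are most likely a side effect of hashing rather than of anything in the loop. If first-seen order matters, one option is `collections.OrderedDict`. A minimal sketch, assuming the rows are already dicts with the column names used above; the helper name `group_in_order` and the sample values are made up for illustration:

```python
from collections import OrderedDict

def group_in_order(rows, key_column):
    """Group row dicts by key_column, keeping keys in first-seen order (hypothetical helper)."""
    grouped = OrderedDict()
    for row in rows:
        # Every column except the grouping column goes into the inner dict
        rest = {k: v for k, v in row.items() if k != key_column}
        # setdefault creates the list the first time a key appears
        grouped.setdefault(row[key_column], []).append(rest)
    return grouped

# Hypothetical rows standing in for what csv.DictReader would yield
rows = [
    {"course_sis_ids": "c1", "user_sis_ids": "u1", "file_ids": "f1"},
    {"course_sis_ids": "c1", "user_sis_ids": "u2", "file_ids": "f2"},
    {"course_sis_ids": "c2", "user_sis_ids": "u1", "file_ids": "f3"},
]
ordered = group_in_order(rows, "course_sis_ids")
```

Passing `csv.DictReader(...)` directly as `rows` should work as well, since the function only iterates over it once.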

Reading a CSV into a dictionary of lists of dicts
