Question

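I have a csv with 3 columns (date, name, number) that is about 20K rows long. I want to create a dictionary keyed by date whose value is a dictionary of name: number for that date. On top of that, if names contain a keyword, I want to add those elements together, so they are listed as keyword: sum of numbers instead of as their individual entries.

E.g., if the csv has four entries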

 6/17/84, Blackcat, 10, 
 6/17/84, Dog, 20, 
 6/17/84, Tabbycat, 12,
 6/17/84, Lizard, 5

and the keyword is cat, the result should be

{6/17/84: {'Dog':20, 'Lizard':5, 'cat':22}}

This is what I came up with. Is there a better way to do it?

import time

def dict_of_csv(file_name, group_labels_with):
    # file_name is iterated as rows of [date, name, number]
    complete_dict = {}
    key_word = [x.lower() for x in group_labels_with]
    for i in file_name:
        key = i[1].lower()
        key_value = int(i[2])
        row_date = time.strptime(i[0], "%m/%d/%y")
        if row_date not in complete_dict:
            # start every new date with a zero total for each keyword
            complete_dict[row_date] = {}
            for name in key_word:
                complete_dict[row_date][name] = 0
        if any(name in key for name in key_word):
            # replace the name with the (last) keyword it contains
            for name in key_word:
                if name in key:
                    key = name
            complete_dict[row_date][key] += key_value
        else:
            complete_dict[row_date][key] = key_value
    return complete_dict

Answer 1

You can simplify the code using dict.setdefault:

keyword = "cat"
my_d = {}

with open("in.csv") as f: # use with to open your file as it will automatically close it.
    for line in f:
        a, b, c = line.rstrip().split(",")[:3] # account for missing delimiter after last element
        c = int(c) 
        my_d.setdefault(a, {keyword:0}) # set default key/value using keyword
        if keyword in b.lower(): # if keyword is in the string b add the value to keyword
            my_d[a][keyword] += c
        else:
            my_d[a][b] = c  # else add new key/value

print(my_d)

{'6/17/84': {' Dog': 20, ' Lizard': 5, 'cat': 22}}
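The keys in the output above keep the leading space that follows each comma in the file. As a minimal sketch of one way around that, the standard csv module can split the rows, with skipinitialspace=True dropping the blank after each delimiter. The helper name group_by_keyword and the in-memory sample are assumptions for illustration, not part of the original code. Two behaviours also differ from the snippet above: repeated non-keyword names on the same date are summed rather than overwritten, and the keyword entry only appears on dates where at least one name matched.

```python
import csv
import io
from collections import defaultdict


def group_by_keyword(lines, keyword):
    """Hypothetical helper: sum numbers per date, merging names that contain keyword."""
    keyword = keyword.lower()
    totals = defaultdict(lambda: defaultdict(int))
    # skipinitialspace=True drops the blank after each comma, so " Dog" becomes "Dog"
    for row in csv.reader(lines, skipinitialspace=True):
        if len(row) < 3:
            continue  # skip blank or malformed lines
        date, name, number = row[:3]
        # names containing the keyword are collapsed onto the keyword itself
        label = keyword if keyword in name.lower() else name
        totals[date][label] += int(number)
    # convert the nested defaultdicts to plain dicts for a tidy result
    return {date: dict(names) for date, names in totals.items()}


# In-memory stand-in for open("in.csv"), using the four rows from the question
sample = io.StringIO(
    "6/17/84, Blackcat, 10, \n"
    "6/17/84, Dog, 20, \n"
    "6/17/84, Tabbycat, 12,\n"
    "6/17/84, Lizard, 5\n"
)
result = group_by_keyword(sample, "cat")
print(result)
```

Passing an open file object instead of the StringIO should work the same way, since csv.reader accepts any iterable of lines.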

I left out the date parsing, row_date = time.strptime(i[0], "%m/%d/%y"), as I don't see any reason you need to do that.

If you want to keep the dates in order, use an OrderedDict:

from collections import OrderedDict
keyword = "cat"
my_d = OrderedDict()
with open("in.csv") as f:
    for line in f:
        a, b, c = line.rstrip().split(",")[:3]
        c = int(c)
        my_d.setdefault(a, {keyword:0})
        if keyword in b.lower():
            my_d[a][keyword] += c
        else:
            my_d[a][b]= c

With the input:

6/17/84, Blackcat, 10, 
6/17/84, Dog, 20,
6/17/84, Tabbycat, 12,
6/17/84, Lizard, 5
6/18/84, Blackcat, 10,
6/18/84, Dog, 20,
6/18/84, Tabbycat, 12,
6/18/84, Lizard, 5

Output:

OrderedDict([('6/17/84', {' Dog': 20, ' Lizard': 5, 'cat': 22}), ('6/18/84', {' Dog': 20, ' Lizard': 5, 'cat': 22})])
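As a side note, plain dicts are guaranteed to preserve insertion order from Python 3.7 onwards, so on a recent interpreter the same ordering can be had without OrderedDict. The sketch below assumes Python 3.7+ and uses a shortened in-memory sample (an assumption for illustration) in place of in.csv.

```python
import io

keyword = "cat"
my_d = {}  # plain dict; insertion order is part of the language from Python 3.7

# Shortened stand-in for open("in.csv"): two dates, two rows each
sample = io.StringIO(
    "6/17/84, Blackcat, 10,\n"
    "6/17/84, Dog, 20,\n"
    "6/18/84, Tabbycat, 12,\n"
    "6/18/84, Lizard, 5\n"
)

for line in sample:
    a, b, c = line.rstrip().split(",")[:3]
    c = int(c)
    my_d.setdefault(a, {keyword: 0})
    if keyword in b.lower():
        my_d[a][keyword] += c
    else:
        my_d[a][b] = c

# Dates come back in the order they were first seen in the input
print(list(my_d))
```

On older interpreters OrderedDict remains the safe choice.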

If you do want to parse the dates, use datetime:

from datetime import datetime
keyword = "cat"
my_d = {}
with open("in.csv") as f:
    for line in f:
        a, b, c = line.rstrip().split(",")[:3]
        c = int(c)
        a = datetime.strptime(a, "%m/%d/%y")
        my_d.setdefault(a, {keyword:0})
        if keyword in b.lower():
            my_d[a][keyword] += c
        else:
            my_d[a][b]= c
print(my_d)

{datetime.datetime(1984, 6, 17, 0, 0): {' Dog': 20, ' Lizard': 5, 'cat': 22}, datetime.datetime(1984, 6, 18, 0, 0): {' Dog': 20, ' Lizard': 5, 'cat': 22}}
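The question passes a list of keywords (group_labels_with) rather than a single one. A rough sketch of how the same idea might extend to several keywords, with parsed dates as keys, is shown below. The function name dict_of_rows, the sample rows, and the rule that the first matching keyword wins are all assumptions made for illustration; the question's own loop lets the last matching keyword win instead. Here the rows are assumed to be already split into (date, name, number) triples.

```python
from collections import defaultdict
from datetime import datetime


def dict_of_rows(rows, group_labels_with):
    """Hypothetical helper: group (date, name, number) rows by date, merging keyword matches."""
    keywords = [k.lower() for k in group_labels_with]
    result = defaultdict(lambda: defaultdict(int))
    for date_text, name, number in rows:
        # %y maps two-digit years 69-99 to 1969-1999, so "84" becomes 1984
        day = datetime.strptime(date_text.strip(), "%m/%d/%y").date()
        name = name.strip()
        # the first keyword found in the name wins; otherwise keep the name itself
        label = next((k for k in keywords if k in name.lower()), name)
        result[day][label] += int(number)
    return {day: dict(names) for day, names in result.items()}


# Illustrative rows, already split into fields
rows = [
    ("6/17/84", "Blackcat", "10"),
    ("6/17/84", "Dog", "20"),
    ("6/17/84", "Tabbycat", "12"),
    ("6/18/84", "Hotdog", "5"),
]
grouped = dict_of_rows(rows, ["cat", "dog"])
print(grouped)
```

Using date objects rather than full datetime values keeps the keys free of the midnight time component seen in the output above.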

Read a 3-column csv and group some elements
