Read a 3-column CSV and group some of the elements

Time: 2014-09-16 21:27:30

Tags: python

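I have a CSV with 3 columns (date, name, number) and about 20K rows. I want to create a dictionary keyed by date whose value is a dictionary of name: number for that date. On top of that, if a name contains a keyword, I want to add those elements together, so they are listed as keyword: sum of numbers rather than as individual entries.

E.g., if the CSV has four entries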

 6/17/84, Blackcat, 10, 
 6/17/84, Dog, 20, 
 6/17/84, Tabbycat, 12,
 6/17/84, Lizard, 5 

and the keyword is cat, the result should be

{6/17/84: {'Dog':20, 'Lizard':5, 'cat':22}}

This is what I came up with. Is there a better way?

import time

def dict_of_csv(file_name, group_labels_with):
    # file_name is iterated as already-split rows, e.g. a csv.reader object
    complete_dict = {}
    key_word = [x.lower() for x in group_labels_with]
    for i in file_name:
        key = i[1].lower()
        key_value = int(i[2])
        row_date = time.strptime(i[0], "%m/%d/%y")
        if row_date not in complete_dict:
            complete_dict[row_date] = {}
            for name in key_word:
                complete_dict[row_date][name] = 0
        if any(name in key for name in key_word):
            for name in key_word:
                if name in key:
                    key = name
            complete_dict[row_date][key] += key_value
        else:
            complete_dict[row_date][key] = key_value
    return complete_dict

1 Answer:

Answer 0: (score: 0)

You can simplify the code by using setdefault:

keyword = "cat"
my_d = {}

with open("in.csv") as f: # use with to open your file as it will automatically close it.
    for line in f:
        a, b, c = line.rstrip().split(",")[:3] # account for missing delimiter after last element
        c = int(c) 
        my_d.setdefault(a, {keyword:0}) # set default key/value using keyword
        if keyword in b.lower(): # if keyword is in the string b add the value to keyword
            my_d[a][keyword] += c
        else:
            my_d[a][b] = c  # else add new key/value

print(my_d)

{'6/17/84': {' Dog': 20, ' Lizard': 5, 'cat': 22}}
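
Note that the version above keeps the leading space from the file in names such as ' Dog' and handles a single keyword only. The following is a minimal sketch of one possible variant, assuming the standard csv module with skipinitialspace=True and a hypothetical helper named group_rows that accepts several keywords (as group_labels_with does in the question). The in-memory sample stands in for in.csv.

```python
import csv
import io
from collections import defaultdict


def group_rows(lines, keywords):
    """Group (date, name, number) rows by date, summing names that contain a keyword.

    group_rows and its parameters are hypothetical names used for this sketch.
    """
    keywords = [k.lower() for k in keywords]
    result = defaultdict(dict)
    # skipinitialspace=True drops the blank that follows each comma
    for row in csv.reader(lines, skipinitialspace=True):
        if len(row) < 3:
            continue  # skip blank or malformed lines
        date, name, number = row[0], row[1].strip(), int(row[2])
        # use the first matching keyword as the key, otherwise the name itself
        key = next((k for k in keywords if k in name.lower()), name)
        result[date][key] = result[date].get(key, 0) + number
    return dict(result)


sample = io.StringIO(
    "6/17/84, Blackcat, 10,\n"
    "6/17/84, Dog, 20,\n"
    "6/17/84, Tabbycat, 12,\n"
    "6/17/84, Lizard, 5\n"
)
grouped = group_rows(sample, ["cat"])
print(grouped)
```

Unlike the setdefault version, this sketch does not pre-create a keyword entry with 0, so a date with no matching names simply has no 'cat' key.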

I left out your date parsing, row_date = time.strptime(i[0], "%m/%d/%y"), because I can't see any reason it is needed.

If you want to keep the order, use an OrderedDict:

from collections import OrderedDict
keyword = "cat"
my_d = OrderedDict()
with open("in.csv") as f:
    for line in f:
        a, b, c = line.rstrip().split(",")[:3]
        c = int(c)
        my_d.setdefault(a, {keyword:0})
        if keyword in b.lower():
            my_d[a][keyword] += c
        else:
            my_d[a][b] = c

From the input:

6/17/84, Blackcat, 10, 
6/17/84, Dog, 20,
6/17/84, Tabbycat, 12,
6/17/84, Lizard, 5
6/18/84, Blackcat, 10,
6/18/84, Dog, 20,
6/18/84, Tabbycat, 12,
6/18/84, Lizard, 5

Output:

OrderedDict([('6/17/84', {' Dog': 20, ' Lizard': 5, 'cat': 22}), ('6/18/84', {' Dog': 20, ' Lizard': 5, 'cat': 22})])
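
On Python 3.7 and later, a plain dict also preserves insertion order as part of the language, so OrderedDict is mainly needed on older interpreters. A small sketch, assuming Python 3.7+ and using a made-up list of lines in place of in.csv:

```python
keyword = "cat"
# illustrative lines standing in for the file; note 6/18/84 comes first
lines = [
    "6/18/84, Dog, 20,",
    "6/17/84, Blackcat, 10,",
    "6/18/84, Tabbycat, 12,",
]

my_d = {}
for line in lines:
    a, b, c = line.rstrip().split(",")[:3]
    my_d.setdefault(a, {keyword: 0})
    if keyword in b.lower():
        my_d[a][keyword] += int(c)
    else:
        my_d[a][b.strip()] = int(c)

# dates come back in the order they were first seen in the input
print(list(my_d))
```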

If you do want to parse the dates, use datetime:

from datetime import datetime
keyword = "cat"
my_d = {}
with open("in.csv") as f:
    for line in f:
        a, b, c = line.rstrip().split(",")[:3]
        c = int(c)
        a = datetime.strptime(a, "%m/%d/%y")
        my_d.setdefault(a, {keyword:0})
        if keyword in b.lower():
            my_d[a][keyword] += c
        else:
            my_d[a][b] = c
print(my_d)
{datetime.datetime(1984, 6, 17, 0, 0): {' Dog': 20, ' Lizard': 5, 'cat': 22}, datetime.datetime(1984, 6, 18, 0, 0): {' Dog': 20, ' Lizard': 5, 'cat': 22}}
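
Parsing may be worthwhile if the keys need to be sorted chronologically: as strings, '6/17/85' sorts before '6/18/84', while datetime keys compare by actual date. A short sketch with made-up keys and values (illustrative only, not taken from the question's data):

```python
from datetime import datetime

# illustrative data keyed by the raw date strings
by_string = {"6/18/84": {"cat": 22}, "6/17/85": {"cat": 3}}

# the same data keyed by parsed dates
by_date = {datetime.strptime(k, "%m/%d/%y"): v for k, v in by_string.items()}

# string keys sort character by character, datetime keys sort by date
string_order = sorted(by_string)
date_order = [d.strftime("%Y-%m-%d") for d in sorted(by_date)]
print(string_order)
print(date_order)
```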