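I have a CSV with 3 columns (date, name, number) and about 20K rows. I want to create a dictionary keyed by date, whose value is a dictionary of name: number for that date. On top of that, if a name contains a keyword, I want to add those elements together, so that they are listed as keyword: sum of numbers rather than as their own individual entries.
E.g., if the CSV has four entries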
6/17/84, Blackcat, 10,
6/17/84, Dog, 20,
6/17/84, Tabbycat, 12,
6/17/84, Lizard, 5
and the keyword is cat, the result should be
{6/17/84: {'Dog':20, 'Lizard':5, 'cat':22}}
This is what I came up with. Is there a better way?
import time

def dict_of_csv(file_name, group_labels_with):
    # file_name is iterated as rows of [date, name, number] (presumably a csv.reader)
    complete_dict = {}
    key_word = [x.lower() for x in group_labels_with]
    for i in file_name:
        key = i[1].lower()
        key_value = int(i[2])
        row_date = time.strptime(i[0], "%m/%d/%y")
        if row_date not in complete_dict:
            complete_dict[row_date] = {}
            for name in key_word:
                complete_dict[row_date][name] = 0
        if any(name in key for name in key_word):
            for name in key_word:
                if name in key:
                    key = name
            complete_dict[row_date][key] += key_value
        else:
            complete_dict[row_date][key] = key_value
    return complete_dict
Answer 0 (score: 0)
You can use dict.setdefault:
keyword = "cat"
my_d = {}
with open("in.csv") as f:  # use with to open your file as it will automatically close it.
    for line in f:
        a, b, c = line.rstrip().split(",")[:3]  # account for missing delimiter after last element
        c = int(c)
        my_d.setdefault(a, {keyword: 0})  # set default key/value using keyword
        if keyword in b.lower():  # if keyword is in the string b add the value to keyword
            my_d[a][keyword] += c
        else:
            my_d[a][b] = c  # else add new key/value
print(my_d)
{'6/17/84': {' Dog': 20, ' Lizard': 5, 'cat': 22}}
I left out parsing your date with row_date = time.strptime(i[0], "%m/%d/%y"), as I don't see any reason you need to do that.
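Note that the names in the output above keep the leading space from the file (' Dog'), and that only a single keyword is handled. A minimal sketch of one way to deal with both, assuming the same in.csv layout and using the csv module with a collections.defaultdict; the names group_rows and group_csv and their parameters are hypothetical, not part of the code above:

```python
import csv
from collections import defaultdict


def group_rows(rows, keywords):
    """Build {date: {name_or_keyword: total}} from rows of (date, name, number)."""
    keywords = [k.lower() for k in keywords]
    result = defaultdict(lambda: defaultdict(int))
    for row in rows:
        if not row:
            continue  # skip blank lines
        # Ignore any empty trailing field and strip surrounding whitespace.
        date, name, number = (field.strip() for field in row[:3])
        # Use the first keyword contained in the name, else the name itself.
        label = next((k for k in keywords if k in name.lower()), name)
        result[date][label] += int(number)
    # Convert to plain dicts for easier printing and comparison.
    return {date: dict(names) for date, names in result.items()}


def group_csv(path, keywords):
    """Read the CSV at path (hypothetical helper) and group its rows."""
    with open(path, newline="") as f:
        return group_rows(csv.reader(f), keywords)
```

Calling group_csv("in.csv", ["cat"]) on the four sample rows should give one entry per date with 'cat', 'Dog' and 'Lizard' as keys. Unlike the setdefault version, this sketch sums repeated names for the same date instead of overwriting them, and it only creates a keyword entry when at least one name matches.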
If you want to preserve the order, use an OrderedDict:
from collections import OrderedDict
keyword = "cat"
my_d = OrderedDict()
with open("in.csv") as f:
    for line in f:
        a, b, c = line.rstrip().split(",")[:3]
        c = int(c)
        my_d.setdefault(a, {keyword: 0})
        if keyword in b.lower():
            my_d[a][keyword] += c
        else:
            my_d[a][b] = c
From the input:
6/17/84, Blackcat, 10,
6/17/84, Dog, 20,
6/17/84, Tabbycat, 12,
6/17/84, Lizard, 5
6/18/84, Blackcat, 10,
6/18/84, Dog, 20,
6/18/84, Tabbycat, 12,
6/18/84, Lizard, 5
Output:
OrderedDict([('6/17/84', {' Dog': 20, ' Lizard': 5, 'cat': 22}), ('6/18/84', {' Dog': 20, ' Lizard': 5, 'cat': 22})])
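On Python 3.7 and later, a plain dict also preserves insertion order as part of the language, so OrderedDict is mainly needed on older versions. A small sketch with made-up rows (the data here is illustrative, not taken from in.csv):

```python
# Dates are deliberately inserted out of chronological order.
rows = [("6/18/84", "Dog", 20), ("6/17/84", "Dog", 20), ("6/18/84", "Lizard", 5)]

my_d = {}
for date, name, number in rows:
    # setdefault returns the inner dict, creating it on first sight of a date.
    my_d.setdefault(date, {})[name] = number

# Keys come back in the order they were first inserted.
dates_in_order = list(my_d)
```

Here dates_in_order follows the order in which each date first appeared in the rows, not chronological order.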
If you do want to parse the dates, use datetime:
from datetime import datetime
keyword = "cat"
my_d = {}
with open("in.csv") as f:
    for line in f:
        a, b, c = line.rstrip().split(",")[:3]
        c = int(c)
        a = datetime.strptime(a, "%m/%d/%y")  # parse e.g. "6/17/84" into a datetime
        my_d.setdefault(a, {keyword: 0})
        if keyword in b.lower():
            my_d[a][keyword] += c
        else:
            my_d[a][b] = c
print(my_d)
{datetime.datetime(1984, 6, 17, 0, 0): {' Dog': 20, ' Lizard': 5, 'cat': 22}, datetime.datetime(1984, 6, 18, 0, 0): {' Dog': 20, ' Lizard': 5, 'cat': 22}}
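One practical reason to parse the dates is chronological sorting: as strings, "10/2/84" sorts before "6/17/84". A minimal sketch, assuming the same "%m/%d/%y" format; parse_date is a hypothetical helper, and it uses .date() to drop the midnight time component:

```python
from datetime import datetime


def parse_date(text):
    """Parse a string like '6/17/84' into a datetime.date (hypothetical helper)."""
    return datetime.strptime(text.strip(), "%m/%d/%y").date()


raw_dates = ["10/2/84", "6/17/84"]

# Plain string comparison puts October before June.
sorted_as_text = sorted(raw_dates)

# Parsed dates compare chronologically.
sorted_as_dates = sorted(parse_date(d) for d in raw_dates)
```

With parsed keys, sorted(my_d) would likewise walk the outer dictionary in calendar order.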