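I have a directory full of CSV files. I use Python's glob to open them and csv.DictReader() to read them, parsing each row into a dictionary keyed by the header fields.
The data in the CSV files looks like this: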
CSVfile1:
Name,A,B,C,D,Date
John,-1,2,4.0,-5.1,3/23/2016
Jacob,0,3,2.0,-2.3,3/23/2016
Jinglehimmer,1,100,5.0,-.1,3/23/2016
CSVfile2:
Name,A,B,C,D,Date
John,5,4,1.0,-1,3/24/2016
Jacob,0,1,7.0,-.1,3/24/2016
Schmidt,10,9,8,7,3/24/2016
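I am trying to sum the data in columns A, B, C and D for each name over a set date range (for example, the last 2 days). For instance, I am trying to get a new list of dictionaries like this: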
{Name: John, A: 4, B: 6, C: 5.0, D: -6.1, Date: 2}
{Name: Jacob, A: 0, B: 4, C: 9.0, D: -2.4, Date: 2}
{Name: Jinglehimmer, etc.}
{Name: Schmidt, etc.}
Here is the code I have so far. It opens each CSV, creates a dictionary for each row, and lets me iterate over those dictionaries:
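import csv
import glob

path = "./*.csv"  # glob needs a pattern; a bare "." matches only the directory itself
newdict = {}
for filename in glob.glob(path):
    with open(filename) as csv_file:
        for row in csv.DictReader(csv_file):
            pass  # each row is a dict keyed by the header fields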
Edit: I tried simply summing all the key values into a new dictionary, but I ran into an int + str error.
for k in row.keys():
    newdict[k] = newdict.get(k, 0) + row[k]
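I am also not sure how to filter on the Date key so that I only get x days of data.
Any help or a pointer in the right direction is much appreciated.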
Answer 0 (score: 1)
The following approach should work:
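import csv
import glob
from datetime import datetime, timedelta, date

days = 2
# Midnight today minus `days` days; rows dated before this are skipped.
since = datetime.combine(date.today(), datetime.min.time()) - timedelta(days=days)
required_fields = ['A', 'B', 'C', 'D']
path = "./*.csv"  # glob needs a pattern; a bare "." matches only the directory itself
output = {}

for filename in glob.glob(path):
    with open(filename, newline='') as csv_file:
        for row in csv.DictReader(csv_file):
            if datetime.strptime(row['Date'], '%m/%d/%Y') >= since:
                name = row['Name']
                try:
                    # Name seen before: add this row to the running totals.
                    cur_entry = output[name]
                    entry = {field: cur_entry[field] + float(row[field]) for field in required_fields}
                except KeyError:
                    # First row for this name: start the totals from this row.
                    entry = {field: float(row[field]) for field in required_fields}
                # Set Date on every entry, since the comprehensions above drop it.
                entry['Date'] = days
                output[name] = entry

for name, entry in output.items():
    print(name, entry)

For the data you provided, run while those dates still fall within the last 2 days, this displays (names appear in the order they are first read):
John {'A': 4.0, 'B': 6.0, 'C': 5.0, 'D': -6.1, 'Date': 2}
Jacob {'A': 0.0, 'B': 4.0, 'C': 9.0, 'D': -2.4, 'Date': 2}
Jinglehimmer {'A': 1.0, 'B': 100.0, 'C': 5.0, 'D': -0.1, 'Date': 2}
Schmidt {'A': 10.0, 'B': 9.0, 'C': 8.0, 'D': 7.0, 'Date': 2}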
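
Two points may be worth spelling out. First, the int + str error in the question arises because csv.DictReader returns every value as a string, so each value has to be converted (here with float()) before it is added. Second, the code above compares against date.today(), so it only yields totals while the sample dates are recent. The sketch below is one possible variant for checking the logic against the sample data at any time. It is not the code from the answer: the names summarize, as_of and FIELDS are made up for this illustration, it reads CSV text from strings instead of files, and it swaps the try/except for collections.defaultdict.

```python
import csv
import io
from collections import defaultdict
from datetime import datetime, timedelta

FIELDS = ["A", "B", "C", "D"]  # hypothetical name: the numeric columns to total


def summarize(csv_texts, as_of, days):
    """Sum FIELDS per Name over rows dated within `days` days before `as_of`.

    Hypothetical helper: `csv_texts` is an iterable of CSV file contents,
    `as_of` is a datetime used as the reference date instead of today.
    """
    since = as_of - timedelta(days=days)
    # Every new name starts with a zero total for each field.
    totals = defaultdict(lambda: dict.fromkeys(FIELDS, 0.0))
    for text in csv_texts:
        for row in csv.DictReader(io.StringIO(text)):
            when = datetime.strptime(row["Date"], "%m/%d/%Y")
            if since <= when <= as_of:
                for field in FIELDS:
                    # DictReader yields strings, so convert before adding.
                    totals[row["Name"]][field] += float(row[field])
    return dict(totals)


file1 = """Name,A,B,C,D,Date
John,-1,2,4.0,-5.1,3/23/2016
Jacob,0,3,2.0,-2.3,3/23/2016
Jinglehimmer,1,100,5.0,-.1,3/23/2016
"""
file2 = """Name,A,B,C,D,Date
John,5,4,1.0,-1,3/24/2016
Jacob,0,1,7.0,-.1,3/24/2016
Schmidt,10,9,8,7,3/24/2016
"""

# Fix the reference date so both sample files fall inside the 2-day window.
result = summarize([file1, file2], as_of=datetime(2016, 3, 24), days=2)
for name, entry in result.items():
    print(name, entry)
```

Passing the reference date in as a parameter keeps the filter testable; to work on real files, the strings could be replaced by the contents of each file that glob returns.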