I have the following sample CSV, named results1111.csv:
Master #,Scrape,Date of Transaction
2C7E4B,6854585658,5/2/2007
2C7E4B,8283876134,5/8/2007
2C7E4B,4258586585,5/18/2007
C585ED,5554541212,5/18/2004
585868,5555551214,8/16/2012
I have the following code, which opens the CSV and then puts the data into several dictionaries:
import csv

with open('c:\\results1111.csv', "r") as f:
    f.next()  # skip the header row
    reader = csv.reader(f)
    result = {}
    for row in reader:
        key = row[0]
        result[key] = row[1:]
        values = row[1:]
        telnumber = row[1]
        transdate = row[2]
        #print key
        #print values
        #print telnumber
        #print transdate
        #print result
        d = {}
        d.setdefault(key, []).append(values)
        print d
The output of the above code is:
{'2C7E4B': [['6854585658', '5/2/2007']]}
{'2C7E4B': [['8283876134', '5/8/2007']]}
{'2C7E4B': [['4258586585', '5/18/2007']]}
{'C585ED': [['5554541212', '5/18/2004']]}
{'585868': [['5555551214', '8/16/2012']]}
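I want to search the dictionaries for any case where the same key has multiple telephone numbers tied to it, such as the first three entries in the output above. When that happens, I want to delete the dictionary with the earliest date. I then want to write all of the remaining dictionaries back to a CSV. The output should look like this: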
2C7E4B,8283876134,5/8/2007
2C7E4B,4258586585,5/18/2007
C585ED,5554541212,5/18/2004
585868,5555551214,8/16/2012
Because there are thousands of keys (in the real input CSV), I am not sure how to write a statement that does this. Any help is appreciated.
Answer 0 (score: 1)
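You need to sort all of the records for a single master by date, which is easier to do with a list than with a dict. Since month/day/year dates don't sort correctly without some kind of conversion, I create a datetime object as the first item of each record. The list then sorts by date (and, if two records have the same date, by telephone number), so it is just a matter of finding, sorting, and deleting items from the list.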
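As a quick illustration of why the conversion matters, here is a minimal Python 3 sketch using the three dates from the sample CSV. The variable names are made up for this example and are not part of the answer's code.

```python
import datetime as dt

dates = ['5/2/2007', '5/8/2007', '5/18/2007']

# Sorting the raw strings compares character by character,
# so '5/18/2007' ends up ahead of '5/2/2007'.
as_strings = sorted(dates)

# Parsing each string first gives true chronological order.
as_dates = sorted(dates, key=lambda s: dt.datetime.strptime(s, '%m/%d/%Y'))

print(as_strings)
print(as_dates)
```

The string sort puts May 18 first, which is why the record needs a real datetime in front before sorting.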
import csv
import collections
import datetime as dt

open('temp.csv', 'w').write("""Master #,Scrape,Date of Transaction
2C7E4B,6854585658,5/2/2007
2C7E4B,8283876134,5/8/2007
2C7E4B,4258586585,5/18/2007
C585ED,5554541212,5/18/2004
585868,5555551214,8/16/2012
""")

with open('temp.csv') as f:
    f.next()
    reader = csv.reader(f)
    # map master to list of transactions
    result = collections.defaultdict(list)
    for row in reader:
        key = row[0]
        # make date sortable
        sortable_date = dt.datetime.strptime(row[2], '%m/%d/%Y')
        result[key].append([sortable_date, row[1], row[2]])

for value in result.values():
    # discard old records
    if len(value) > 1:
        value.sort()
        del value[0]
        # or to delete all but the last one
        # del value[:-1]

keys = result.keys()
keys.sort()
for key in keys:
    transactions = result[key]
    for transaction in transactions:
        print key, transaction[1], transaction[2]
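The code above is Python 2 and prints the surviving rows rather than writing a CSV. A rough Python 3 sketch of the same grouping idea, with the write-back step added, might look like the following. The helper name drop_earliest is hypothetical, and io.StringIO stands in for the real input and output files so the example is self-contained. It also keeps masters in first-seen order instead of sorting the keys, which is an assumption based on the output requested in the question.

```python
import collections
import csv
import datetime as dt
import io


def drop_earliest(rows):
    """Group rows by master and drop the earliest row from groups of 2+."""
    groups = collections.defaultdict(list)
    for master, phone, date in rows:
        # put a real datetime first so each record sorts chronologically
        sortable = dt.datetime.strptime(date, '%m/%d/%Y')
        groups[master].append((sortable, phone, date))
    kept = []
    # dicts keep insertion order in Python 3.7+, so masters come out
    # in the order they first appear in the input
    for master, records in groups.items():
        records.sort()
        if len(records) > 1:
            del records[0]
        kept.extend([master, phone, date] for _, phone, date in records)
    return kept


# stand-in for open('temp.csv', newline='')
sample = io.StringIO("""Master #,Scrape,Date of Transaction
2C7E4B,6854585658,5/2/2007
2C7E4B,8283876134,5/8/2007
2C7E4B,4258586585,5/18/2007
C585ED,5554541212,5/18/2004
585868,5555551214,8/16/2012
""")
reader = csv.reader(sample)
header = next(reader)  # Python 3 replacement for f.next()
kept = drop_earliest(reader)

# stand-in for open('out.csv', 'w', newline='')
out = io.StringIO()
writer = csv.writer(out, lineterminator='\n')
writer.writerow(header)
writer.writerows(kept)
print(out.getvalue())
```

With real files, replacing the two StringIO objects with open(..., newline='') calls should be enough. Rows within each master come out in date order, not necessarily input order.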
Answer 1 (score: 1)
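Here is something that may do what you want. The basic idea is to aggregate the entries under the same key and then, at the end, remove the entry with the earliest date.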
#!/usr/bin/env python
import csv
import datetime

# this is used to parse the dates, you can change this if you change the format
DATE_FORMAT = "%m/%d/%Y"

# this is a date comparator, used to sort the dates when removing the earliest one
def compare_dates(date1, date2):
    d1 = datetime.datetime.strptime(date1, DATE_FORMAT)
    d2 = datetime.datetime.strptime(date2, DATE_FORMAT)
    return int((d1 - d2).total_seconds())

with open('res.csv', "r") as f:
    f.next()
    reader = csv.reader(f)
    result = {}
    for row in reader:
        key = row[0]
        telnumber = row[1]
        transdate = row[2]
        # if it's a new key, we need a new list for the aggregation
        if key not in result:
            result[key] = []
        # this is where we aggregate all the same-key entries
        result[key].append([telnumber, transdate])

# this function takes a key-value pair from the dictionary and returns it
# with the earliest entry removed from the value (the value being a list)
def clear_list(kv):
    k, v = kv
    if len(v) > 1:
        return {k: sorted(v, lambda x, y: compare_dates(x[1], y[1]))[1:]}
    return {k: v}

# drop the earliest entry from everything we've aggregated under each key
print map(clear_list, result.items())
# ... now write back to csv
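The write-back step is left open above. One possible way to finish it, sketched in Python 3 under a few assumptions: the result dict is hard-coded here as a stand-in for what the reading loop builds, parse_date is a hypothetical helper used as a sort key (Python 3's sorted no longer accepts a comparator as its second argument), clear_list is simplified to take and return a plain list, and io.StringIO stands in for the output file.

```python
import csv
import datetime
import io

DATE_FORMAT = "%m/%d/%Y"

# stand-in for the `result` dict aggregated from the sample CSV
result = {
    '2C7E4B': [['6854585658', '5/2/2007'],
               ['8283876134', '5/8/2007'],
               ['4258586585', '5/18/2007']],
    'C585ED': [['5554541212', '5/18/2004']],
    '585868': [['5555551214', '8/16/2012']],
}


def parse_date(entry):
    """Sort key: the parsed transaction date of a [telnumber, transdate] entry."""
    return datetime.datetime.strptime(entry[1], DATE_FORMAT)


def clear_list(entries):
    """Return the entries with the earliest one removed, if there are 2+."""
    if len(entries) > 1:
        return sorted(entries, key=parse_date)[1:]
    return entries


# stand-in for open('out.csv', 'w', newline='')
out = io.StringIO()
writer = csv.writer(out, lineterminator='\n')
writer.writerow(['Master #', 'Scrape', 'Date of Transaction'])
for key, entries in result.items():
    for telnumber, transdate in clear_list(entries):
        writer.writerow([key, telnumber, transdate])

print(out.getvalue())
```

Using a key function avoids the seconds-based comparator entirely, since datetime objects already compare correctly.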