Question

I have the following sample CSV, named results1111.csv:

Master #,Scrape,Date of Transaction
2C7E4B,6854585658,5/2/2007
2C7E4B,8283876134,5/8/2007
2C7E4B,4258586585,5/18/2007
C585ED,5554541212,5/18/2004
585868,5555551214,8/16/2012

I have the following code that opens the CSV and then puts the data into multiple dictionaries:

import csv

with open('c:\\results1111.csv', "r") as f:
    f.next()
    reader = csv.reader(f)
    result = {}
    for row in reader:
        key = row[0]
        result[key] = row[1:]
        values = row[1:]
        telnumber = row[1]
        transdate = row[2]
#print key
#print values
#print telnumber
#print transdate
#print result

        d = {}
        d.setdefault(key, []).append(values)
        print d

The output of the above code is:

{'2C7E4B': [['6854585658', '5/2/2007']]}
{'2C7E4B': [['8283876134', '5/8/2007']]}
{'2C7E4B': [['4258586585', '5/18/2007']]}
{'C585ED': [['5554541212', '5/18/2004']]}
{'585868': [['5555551214', '8/16/2012']]}

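I want to search the dictionary for any case where the same key has multiple phone numbers tied to it, such as the first three entries in the output above. When that happens, I want to delete the entry with the earliest date. Then I want to write all of the remaining entries back out to a CSV. The output should look like this: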

2C7E4B,8283876134,5/8/2007
2C7E4B,4258586585,5/18/2007
C585ED,5554541212,5/18/2004
585868,5555551214,8/16/2012

Since there are thousands of keys (in the actual input CSV), I am not sure how to write a statement to do this. Any help is appreciated.

Answer 1

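You need to sort all of the records for a single master by date, which is easier to do with a list than with a dict. Because month/day/year dates do not sort correctly without some kind of conversion, I create a datetime object as the first item of each record. The list then sorts by date (and by phone number when two records share a date), so it is just a matter of finding, sorting, and deleting items from the list.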
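
As a quick illustration of why the conversion matters, here is a minimal sketch (written for Python 3, using three dates from the sample data) that compares sorting the raw strings with sorting on parsed datetime values. The variable names are only for illustration.

```python
import datetime as dt

dates = ["5/18/2007", "5/2/2007", "5/8/2007"]

# Plain string comparison goes character by character,
# so "5/18/2007" sorts before "5/2/2007".
as_strings = sorted(dates)

# Parsing with strptime yields datetime objects that compare chronologically.
as_dates = sorted(dates, key=lambda s: dt.datetime.strptime(s, "%m/%d/%Y"))

print(as_strings)  # ['5/18/2007', '5/2/2007', '5/8/2007']
print(as_dates)    # ['5/2/2007', '5/8/2007', '5/18/2007']
```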

import csv
import collections
import datetime as dt

open('temp.csv', 'w').write("""Master #,Scrape,Date of Transaction
2C7E4B,6854585658,5/2/2007
2C7E4B,8283876134,5/8/2007
2C7E4B,4258586585,5/18/2007
C585ED,5554541212,5/18/2004
585868,5555551214,8/16/2012
""")

with open('temp.csv') as f:
    f.next()
    reader = csv.reader(f)
    # map master to list of transactions
    result = collections.defaultdict(list)
    for row in reader:
        key = row[0]
        # make date sortable
        sortable_date = dt.datetime.strptime(row[2], '%m/%d/%Y')
        result[key].append([sortable_date, row[1], row[2]])

for value in result.values():
    # discard old records
    if len(value) > 1:
        value.sort()
        del value[0]
        # or to delete all but the last one
        # del value[:-1]

keys = result.keys()
keys.sort()

for key in keys:
    transactions = result[key]
    for transaction in transactions:
        print key, transaction[1], transaction[2]
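
The code above is Python 2. For readers on Python 3, the same idea might look roughly like the sketch below. It assumes Python 3.7+ (so the dict keeps keys in input order rather than sorting them) and that every data row has exactly three fields. The `drop_earliest` name and the in-memory `SAMPLE` string are hypothetical stand-ins for the real file.

```python
import collections
import csv
import datetime as dt
import io

# The sample data from the question, held in memory instead of a file.
SAMPLE = """Master #,Scrape,Date of Transaction
2C7E4B,6854585658,5/2/2007
2C7E4B,8283876134,5/8/2007
2C7E4B,4258586585,5/18/2007
C585ED,5554541212,5/18/2004
585868,5555551214,8/16/2012
"""


def drop_earliest(lines):
    """Group rows by master key; for keys with several rows, drop the earliest."""
    reader = csv.reader(lines)
    next(reader)  # skip the header row
    grouped = collections.defaultdict(list)
    for key, number, date_text in reader:  # assumes exactly three fields per row
        sortable = dt.datetime.strptime(date_text, "%m/%d/%Y")
        grouped[key].append((sortable, number, date_text))

    kept = []
    for key, records in grouped.items():
        records.sort()  # by date, then by phone number on ties
        if len(records) > 1:
            del records[0]  # discard the oldest record
        kept.extend([key, number, date_text] for _, number, date_text in records)
    return kept


rows = drop_earliest(io.StringIO(SAMPLE))
for row in rows:
    print(",".join(row))
```

With the sample data this prints the four lines the question asks for, in the original key order.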

Answer 2

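Here is something that may be what you want. The basic idea is to aggregate the dates under the same key and, at the end, remove the entry with the earliest date.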

#!/usr/bin/env python
import csv
import datetime

# this is used to parse the dates, you can change this if you change the format
DATE_FORMAT = "%m/%d/%Y"

# this is a dates comparator, to sort the dates when removing earliest date
def compare_dates(date1, date2):
    d1 = datetime.datetime.strptime(date1, DATE_FORMAT)
    d2 = datetime.datetime.strptime(date2, DATE_FORMAT)
    return int((d1 - d2).total_seconds())

with open('res.csv', "r") as f:
    f.next()
    reader = csv.reader(f)
    result = {}

    for row in reader:
        key = row[0]
        telnumber = row[1]
        transdate = row[2]
        # if it's a new key, we need a new list for the aggregation
        if key not in result:
            result[key] = []
        # this is where we aggregate all entries that share a key
        result[key].append([telnumber, transdate])

# this function takes in a key-value pair from the dictionary (the value is a list)
# and returns a one-item dict whose value has its earliest entry removed
def clear_list(kv):
    k, v = kv
    if len(v) > 1:
        return {k: sorted(v, lambda x, y: compare_dates(x[1], y[1]))[1:]}
    return {k: v}

# removes the earliest entry from what we've aggregated under each key.
print map(clear_list, result.items())

# ... now write back to csv
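
One possible way to fill in the "write back to csv" step is sketched below for Python 3, where `sorted` no longer accepts a comparison function and a `key=` function is used instead. The helper names `parse_date`, `clear_earliest`, and `write_rows` are hypothetical, and `result` is filled in by hand with the sample data to keep the sketch self-contained.

```python
import csv
import datetime
import io

DATE_FORMAT = "%m/%d/%Y"


def parse_date(text):
    """Turn a month/day/year string into a sortable datetime."""
    return datetime.datetime.strptime(text, DATE_FORMAT)


def clear_earliest(entries):
    """Return the entries sorted by date, minus the earliest when there are several."""
    if len(entries) > 1:
        return sorted(entries, key=lambda entry: parse_date(entry[1]))[1:]
    return entries


def write_rows(result, out):
    """Write one CSV row per remaining [telnumber, transdate] entry."""
    writer = csv.writer(out, lineterminator="\n")
    for key, entries in result.items():
        for telnumber, transdate in clear_earliest(entries):
            writer.writerow([key, telnumber, transdate])


# Aggregated sample data, in the same shape the loop above builds.
result = {
    "2C7E4B": [["6854585658", "5/2/2007"],
               ["8283876134", "5/8/2007"],
               ["4258586585", "5/18/2007"]],
    "C585ED": [["5554541212", "5/18/2004"]],
    "585868": [["5555551214", "8/16/2012"]],
}

buffer = io.StringIO()
write_rows(result, buffer)
print(buffer.getvalue(), end="")
```

To write to a real file instead of the in-memory buffer, pass a handle opened with `open(path, "w", newline="")`, as the `csv` module documentation recommends.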

Python dictionary with multiple keys, search function

2 answers: