Question

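I have a .csv file with 25 columns. In this data, column 18 is the People_ID and column 19 is the donation date. I have already pre-sorted the data using Linux so that all rows for a given People_ID appear together, ordered by donation date in descending order.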

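This is where I am not sure how to proceed. I need to find all rows that share the same People_ID and donation date, sum various values across them, and then write a single row to the output. Basically, each row in the file can be a different customer, or a different donation date for the same customer. Would it be best to use a dictionary with People_ID as the key? What would that look like syntactically?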

I was thinking of something like this:

import csv

data_dict = {}
with open("file.csv") as csv_file:
    for row in csv.reader(csv_file, delimiter=','):
        if row[18] in data_dict:
            pass  # something something

Answer 1

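Since you have pre-sorted the data, you can organize this so that a function is called once per person, with each call receiving that person's rows.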

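Because the data is pre-sorted, we assume the rows for person 1 come together, followed by the rows for person 2 (or 39, or some other number), and so on. So we need to detect when the person in field 18 changes. To do that, we use the variable last_person to keep track of the person we are currently processing. The variable row_cache collects the rows for a single person.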

import csv

def process_person(rows):
    if len(rows) == 0:
        return
    # do something with the rows for this person
    # and print the result somewhere useful

last_person = None  # no person seen yet
row_cache = []
with open("file.csv", newline="") as csv_file:
    for row in csv.reader(csv_file, delimiter=','):
        if row[18] == last_person:
            row_cache.append(row)
        else:
            # the person changed: process the previous person's rows
            process_person(row_cache)
            row_cache = [row]
            last_person = row[18]
# process the rows collected for the final person
process_person(row_cache)
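
As one possible way to fill in the per-group work, here is a minimal sketch that relies on the same pre-sorted order but groups on both People_ID and donation date, using itertools.groupby from the standard library. AMOUNT_COL is an assumption (the question does not say which columns hold the values to sum), and summarize and summarize_file are hypothetical names chosen for illustration.

```python
import csv
from itertools import groupby

ID_COL, DATE_COL = 18, 19   # indices used in the question's code
AMOUNT_COL = 20             # assumption: hypothetical column holding the value to sum

def summarize(rows):
    """Yield (people_id, date, total) for each run of consecutive rows
    that share the same People_ID and donation date."""
    def key(row):
        return (row[ID_COL], row[DATE_COL])
    # groupby only merges adjacent rows, so the input must already be sorted
    for (people_id, date), group in groupby(rows, key=key):
        total = sum(float(row[AMOUNT_COL]) for row in group)
        yield people_id, date, total

def summarize_file(path):
    """Read a CSV file and return the list of per-group totals."""
    with open(path, newline="") as f:
        return list(summarize(csv.reader(f)))
```

Calling summarize_file("file.csv") would return one tuple per (People_ID, donation date) pair, in the order the groups appear in the file. If the rows for a pair were not adjacent, groupby would report that pair more than once, so this only suits pre-sorted input.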

Answer 2

I would suggest an object-oriented approach.

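import csv

AMOUNT_COL = 20  # placeholder: set this to the column holding the value to sum

class Transaction:
    def __init__(self, fields):
        # pull out whatever fields you have;
        # keep in mind these are all strings,
        # so you may need to process them before analysis
        self.ident = fields[18]
        self.date = fields[19]
        self.amount = float(fields[AMOUNT_COL])

    def calculation(self):
        # placeholder for whatever per-transaction value you need
        return self.amount

transactions = {}
with open('csv_file.csv', newline='') as f:
    for row in csv.reader(f):
        bucket = tuple(row[18:20])  # (People_ID, donation date)
        if bucket in transactions:
            transactions[bucket].append(Transaction(row))
        else:
            transactions[bucket] = [Transaction(row)]

for bucket, items in transactions.items():
    print(bucket, sum(item.calculation() for item in items))

This defines a Transaction class whose instances hold the individual fields from the CSV file. It then creates a transactions dictionary and reads through the CSV file, adding each new Transaction object either to a new bucket (if the given ID and date have not been seen before) or to an existing bucket (if they have).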

It then goes through this dictionary and performs the calculation for each bucket, printing the bucket and the result.
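
If a full class feels heavier than needed, the same bucketing idea can be sketched with a plain dictionary keyed by the (People_ID, donation date) pair, which is close to what the question had in mind. This is only an illustration: AMOUNT_COL is an assumed column index, and total_by_person_and_date is a hypothetical name.

```python
import csv
from collections import defaultdict

ID_COL, DATE_COL = 18, 19   # indices used in the question's code
AMOUNT_COL = 20             # assumption: hypothetical column holding the value to sum

def total_by_person_and_date(rows):
    """Return a dict mapping (people_id, date) to the summed amount."""
    totals = defaultdict(float)  # missing keys start at 0.0
    for row in rows:
        totals[(row[ID_COL], row[DATE_COL])] += float(row[AMOUNT_COL])
    return dict(totals)

def total_file(path):
    """Read a CSV file and return its per-bucket totals."""
    with open(path, newline="") as f:
        return total_by_person_and_date(csv.reader(f))
```

Unlike the approach that depends on sorted input, this works on rows in any order, at the cost of holding one running total per bucket in memory.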

How do I compare rows in a data (.csv) file based on two values and then summarize the data using Python?
