Question

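I have 2 lists ("IDS" and "Pay"). IDS has a length of 50000, and Pay has a length of 650000. IDS is a list of IDs, like [1,2,3,4,5,6 ...], and Pay is a list of all the payments for those IDs, like [[1,50], [1,100], [1,60], [2,50], [2,80], [2,50], ...]

To find out how much each ID paid in total, I'm running a for loop inside another for loop, like this: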

for x in IDS:
    total = 0
    for i in xrange(0, len(Pay)):
        if x == Pay[i][0]:
            total += Pay[i][1]
    print x, total

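But this takes a very long time to finish! I tried splitting Pay into 10 pieces, but it still takes too long. Does anyone know how to speed this up?

Thanks!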

Answer 1

You can use collections.Counter:

>>> from collections import Counter
>>> pay = [[1,50], [1,100], [1,60], [2,50], [2,80], [2,50]]
>>> c = Counter()
>>> for idx, amt in pay:
...     c[idx] += amt
... 
>>> c
Counter({1: 210, 2: 180})
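As a Python 3 sketch of the whole task, the same Counter can be filled in a single pass over Pay and then queried for every ID in IDS. Looking up a missing key in a Counter returns 0 (without adding the key), so IDs with no payments report a total of 0. The small sample lists are illustrative only.

```python
from collections import Counter

# Small illustrative data in the question's shape
IDS = [1, 2, 3]
Pay = [[1, 50], [1, 100], [1, 60], [2, 50], [2, 80], [2, 50]]

# One pass over Pay: O(len(Pay)) instead of O(len(IDS) * len(Pay))
totals = Counter()
for pid, amount in Pay:
    totals[pid] += amount

# Counter returns 0 for keys it has never seen, so ID 3 prints 0
for pid in IDS:
    print(pid, totals[pid])
```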

Answer 2

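OK, the thing is you have 2 very long lists. Rather than debating which library to use, how about a better algorithm?

IDS should naturally contain unique integers (my guess), and Pay is a list of (id, payment) tuples.

Now consider where your lists come from. There are two possibilities:

1. Read from a file
2. From a database, such as MySQL

If it's option 1, you should do something like this: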

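from collections import defaultdict
# someObj_factory is a placeholder for whatever accumulator type you use;
# paymentFile is your open payment file, one "id payment" pair per line
totals = defaultdict(someObj_factory)
[totals[int(line.split()[0])].accumulate(someObj_factory(line.split()[1]))
 for line in paymentFile]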
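A runnable Python 3 version of the option-1 idea, assuming a whitespace-separated file with one `id amount` pair per line (the file format and the `payments.txt` name are assumptions; `io.StringIO` stands in for the real file) and plain integer totals instead of the placeholder `someObj_factory`:

```python
import io
from collections import defaultdict

def totals_from_lines(lines):
    """Sum amounts per ID while the lines are read, in a single pass."""
    totals = defaultdict(int)  # IDs seen for the first time start at 0
    for line in lines:
        fields = line.split()
        if len(fields) < 2:  # skip blank or malformed lines
            continue
        totals[int(fields[0])] += int(fields[1])
    return totals

# io.StringIO stands in for open("payments.txt") (hypothetical file name)
payment_file = io.StringIO("1 50\n1 100\n1 60\n2 50\n2 80\n2 50\n")
totals = totals_from_lines(payment_file)
print(dict(totals))
```

Since IDS is never consulted, IDs that never appear in the file simply have no entry.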

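First, you don't need IDS as a separate list, since the IDs are already in Pay.

Second, it saves time, because the totals are accumulated while the file is being read, in a single pass.

Third, in a scripting language, a list comprehension cuts interpreter overhead compared with an explicit loop.

Fourth, it's flexible, because you can accumulate any object you want, such as dates or tuples.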
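To illustrate accumulating something richer than a number, here is a Python 3 sketch with a hypothetical `PaymentSummary` class (standing in for the answer's placeholder `someObj_factory`) that tracks the total, the number of payments, and the latest payment date per ID. The `(id, amount, date)` record layout is an assumption.

```python
from collections import defaultdict
from datetime import date

class PaymentSummary:
    """Hypothetical accumulator: total, count and latest date for one ID."""

    def __init__(self):
        self.total = 0
        self.count = 0
        self.latest = None

    def accumulate(self, amount, when):
        self.total += amount
        self.count += 1
        if self.latest is None or when > self.latest:
            self.latest = when

# Assumed record layout: (id, amount, date)
pay = [
    (1, 50, date(2013, 9, 1)),
    (1, 100, date(2012, 12, 31)),
    (2, 80, date(2013, 2, 15)),
]

# The class itself serves as the defaultdict factory
summaries = defaultdict(PaymentSummary)
for pid, amount, when in pay:
    summaries[pid].accumulate(amount, when)

for pid, s in sorted(summaries.items()):
    print(pid, s.total, s.count, s.latest)
```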

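If it's option 2, do the counting in the database -.-

Another option is to insert the data into a database and do the counting there. MySQL and similar systems are designed for exactly this kind of task; you'd be surprised how efficient it is. More info: http://mysql-python.sourceforge.net/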
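The answer suggests MySQL; as a self-contained stand-in, Python's built-in `sqlite3` module shows the same idea: load the rows into a table and let a `GROUP BY` query do the summing. The table name `pay` and its column names are illustrative; with MySQL the SQL would look much the same, but the driver and connection code differ.

```python
import sqlite3

pay = [(1, 50), (1, 100), (1, 60), (2, 50), (2, 80), (2, 50)]

conn = sqlite3.connect(":memory:")  # in-memory database for the sketch
conn.execute("CREATE TABLE pay (id INTEGER, amount INTEGER)")
conn.executemany("INSERT INTO pay (id, amount) VALUES (?, ?)", pay)

# The database does the per-ID summing
rows = conn.execute(
    "SELECT id, SUM(amount) FROM pay GROUP BY id ORDER BY id"
).fetchall()
print(rows)
conn.close()
```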

Answer 3

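If collections.Counter doesn't work for you, for example because you're on an older Python version (it was added in 2.7), converting your payment list into a dictionary achieves the same effect.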

totals = {}
for idx, amount in pay:
    totals[idx] = totals.setdefault(idx, 0) + amount

What if each payment also has a date, like [1, 50, '2013-09-01'], and I only need to sum the values dated later than '2013-01-01'?

Then do this:

import datetime

base_date = datetime.datetime.strptime('2013-01-01', '%Y-%m-%d').date()

totals = {}
for idx, amount, pay_date in pay:
    if datetime.datetime.strptime(pay_date, '%Y-%m-%d').date() > base_date:
        totals[idx] = totals.setdefault(idx, 0) + amount
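A leaner Python 3 variant, assuming the dates are always zero-padded `YYYY-MM-DD` strings: strings in that format sort in the same order as the dates they represent, so they can be compared directly instead of calling `strptime` for every payment. With any other date format, parse the dates first as in the code above.

```python
from collections import defaultdict

# Assumed layout: [id, amount, 'YYYY-MM-DD'] with zero-padded dates
pay = [
    [1, 50, '2013-09-01'],
    [1, 100, '2012-12-31'],
    [2, 80, '2013-01-02'],
    [2, 20, '2013-01-01'],
]

base_date = '2013-01-01'

totals = defaultdict(int)
for idx, amount, pay_date in pay:
    # Zero-padded ISO-style strings compare like dates; strictly later only
    if pay_date > base_date:
        totals[idx] += amount

print(dict(totals))
```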

Answer 4

You only need to iterate over Pay once (not 50000 times!). Using a hash table (a dict) speeds up the computation dramatically:

# Start every ID at 0; dict lookups are hash-based, O(1) on average
totals = dict.fromkeys(IDS, 0)

for pid, amount in Pay:
    if pid in totals:
        totals[pid] += amount

for (pid, total) in totals.iteritems():
    print "id: %s, total: %d" % (pid, total)
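To see why one pass matters: the nested loop performs about len(IDS) × len(Pay) = 50000 × 650000 ≈ 3.25 × 10^10 comparisons, while the dict version does about 650000 hash lookups. The Python 3 sketch below checks on small random data (sizes chosen arbitrarily) that both approaches produce the same totals; timing them, for example with `timeit`, shows the gap widening as the data grows.

```python
import random

def nested_totals(ids, pay):
    """The question's approach: rescan all of pay for every ID."""
    result = {}
    for x in ids:
        total = 0
        for pid, amount in pay:
            if pid == x:
                total += amount
        result[x] = total
    return result

def hashed_totals(ids, pay):
    """One pass over pay, using O(1) average-time dict lookups."""
    result = dict.fromkeys(ids, 0)
    for pid, amount in pay:
        if pid in result:
            result[pid] += amount
    return result

random.seed(0)
ids = list(range(1, 201))
pay = [[random.randint(1, 200), random.randint(1, 100)] for _ in range(2000)]

assert nested_totals(ids, pay) == hashed_totals(ids, pay)
print("both approaches agree")
```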

The inner loop is huge. How can I reduce the time?
