Question

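I have a list comprehension that operates on two lists of integers. It behaves like itertools.product with a filter that discards pairs whose two elements are equal, plus a comparison that orders each pair.

The code is as follows: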

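import collections as coll

# Build every (smaller, larger) pair of unequal atoms.
to_add = [(min(atom_1, atom_2), max(atom_1, atom_2))
          for atom_1 in atoms_1 for atom_2 in atoms_2
          if atom_2 != atom_1]
# Group the larger atom of each pair under the smaller one.
add_dict = coll.defaultdict(list)
for k, v in to_add:
    add_dict[k].append(v)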

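The most obvious thing to me as I wrote this is that there is no need to call min and then max. What I really want is min and "the other one", but I can't think of how to get rid of the redundant call to max.

I profiled it and got the following results, which represent 10 repeats (read_amber.py is the name of the overall function call):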

     62880808 function calls (62880792 primitive calls) in 14.746 seconds

     Ordered by: internal time

       ncalls  tottime  percall  cumtime  percall filename:lineno(function)
           19    6.786    0.357   10.688    0.563 read_amber.py:256(add_exclusions)
     16431524    1.625    0.000    1.625    0.000 {min}
     16431511    1.295    0.000    1.295    0.000 {max}
       842947    1.051    0.000    1.051    0.000 {method 'format' of 'str' objects}
       842865    1.031    0.000    1.557    0.000 {filter}
     16457861    0.838    0.000    0.838    0.000 {method 'append' of 'list' objects}
            1    0.793    0.793    3.757    3.757 read_amber.py:79(write_to)
      8414872    0.526    0.000    0.526    0.000 read_amber.py:130(<lambda>)
      1685897    0.266    0.000    0.266    0.000 {method 'write' of 'file' objects}
        97489    0.142    0.000    0.142    0.000 {sorted}
            1    0.130    0.130    0.300    0.300 read_amber.py:32(read_from)
       247198    0.127    0.000    0.155    0.000 read_amber.py:134(data_cast)
848267/848263    0.042    0.000    0.042    0.000 {len}
            1    0.038    0.038    0.038    0.038 read_amber.py:304(update_exclusion_list)
       500352    0.028    0.000    0.028    0.000 {method 'lower' of 'str' objects}

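Is there a way to get rid of one of the redundant min/max calls? And is there another obvious way to speed up this snippet?

I have already tried using itertools generators, but the list comprehension is faster. I also tried sorted with the necessary casts, but min/max is faster than that.

Finally, I'm new to cProfile. Is sorting by 'tottime' sensible?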

Answer 1

How about:

import collections as coll
import itertools

add_dict = coll.defaultdict(list)
for atom_1, atom_2 in itertools.product(atoms_1, atoms_2):
    if atom_1 == atom_2: continue
    (atom_min, atom_max) = (atom_1, atom_2) if atom_1 < atom_2 else (atom_2, atom_1)
    add_dict[atom_min].append(atom_max)

Or, if the extra assignment is a concern (I doubt it matters much):

add_dict = coll.defaultdict(list)
for atom_1, atom_2 in itertools.product(atoms_1, atoms_2):
    if atom_1 == atom_2: continue
    if atom_1 < atom_2:
        add_dict[atom_1].append(atom_2)
    else:
        add_dict[atom_2].append(atom_1)

Though this looks less readable.
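Before swapping one version for the other, it may be worth confirming that they build the same dictionary. The following is only a sketch: the function names pairs_min_max and pairs_one_comparison and the sample lists are made up for illustration, and the two loops are assumed to visit pairs in the same order (atoms_1 outermost), which is what makes the value lists compare equal.

```python
import collections as coll
import itertools


def pairs_min_max(atoms_1, atoms_2):
    # Original approach: two builtin calls per unequal pair.
    add_dict = coll.defaultdict(list)
    for a in atoms_1:
        for b in atoms_2:
            if a != b:
                add_dict[min(a, b)].append(max(a, b))
    return add_dict


def pairs_one_comparison(atoms_1, atoms_2):
    # Branching approach: a single comparison decides key and value.
    add_dict = coll.defaultdict(list)
    for a, b in itertools.product(atoms_1, atoms_2):
        if a == b:
            continue
        if a < b:
            add_dict[a].append(b)
        else:
            add_dict[b].append(a)
    return add_dict


# Hypothetical sample data, not taken from the real atom lists.
sample_1 = [1, 2, 3, 4, 5, 6]
sample_2 = [2, 4, 6, 1, 2, 3]
same = pairs_min_max(sample_1, sample_2) == pairs_one_comparison(sample_1, sample_2)
print(same)
```

A defaultdict compares equal to another mapping with the same keys and values, so a plain == is enough here.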

Edit: timeit results:

It looks like this approach reduces the running time.

import collections as coll
import itertools

atoms_1 = [1,2,3,4,5,6]
atoms_2 = [2,4,6,1,2,3]

def old():
    # Original approach: build the ordered pairs first, then group them.
    to_add = [(min(atom_1, atom_2), max(atom_1, atom_2))
              for atom_1 in atoms_1 for atom_2 in atoms_2
              if atom_2 != atom_1]
    add_dict = coll.defaultdict(list)
    for k, v in to_add:
        add_dict[k].append(v)
    return add_dict

def new():
    # Single pass: one comparison per pair, no min/max calls.
    add_dict = coll.defaultdict(list)
    for atom_1, atom_2 in itertools.product(atoms_1, atoms_2):
        if atom_1 == atom_2:
            continue
        (atom_min, atom_max) = (atom_1, atom_2) if atom_1 < atom_2 else (atom_2, atom_1)
        add_dict[atom_min].append(atom_max)
    return add_dict

import timeit
print(timeit.timeit("old()", setup="from __main__ import old"))  # 20.76972103
print(timeit.timeit("new()", setup="from __main__ import new"))  # 10.9827100827
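To see where the difference likely comes from, the per-pair expressions can be timed in isolation. This is a rough sketch rather than a benchmark of the real code: the values 3 and 7, the repeat count, and the iteration count are arbitrary choices, and absolute timings will vary by machine and Python version. Taking the minimum of several timeit.repeat runs is the usual way to reduce noise from other processes.

```python
import timeit

# Arbitrary sample values; the names a and b are only for this sketch.
values = {"a": 3, "b": 7}

# Two builtin calls per pair, as in the original comprehension.
builtin_calls = min(timeit.repeat(
    "(min(a, b), max(a, b))", globals=values, repeat=5, number=200_000))

# One comparison per pair, as in the conditional expression above.
one_comparison = min(timeit.repeat(
    "(a, b) if a < b else (b, a)", globals=values, repeat=5, number=200_000))

print(f"min/max calls:  {builtin_calls:.4f} s")
print(f"one comparison: {one_comparison:.4f} s")
```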

Edit 2: timeit results with longer lists and fewer timing iterations

atoms_1 = [1,2,3,4,5,6] * 5
atoms_2 = [2,4,6,1,2,3] * 5

print(timeit.timeit("old()", setup="from __main__ import old", number=100000)) # 46.2878425701
print(timeit.timeit("new()", setup="from __main__ import new", number=100000)) # 21.9272824532
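On the cProfile part of the question: tottime is the time spent inside a function itself, excluding the functions it calls, while cumtime includes those sub-calls. Sorting by tottime is therefore a reasonable way to find where the work actually happens, and sorting by cumtime shows which high-level calls are expensive overall. The sketch below prints both views; build_pairs and profile_report are hypothetical stand-ins, not functions from read_amber.py.

```python
import cProfile
import io
import pstats


def build_pairs(atoms_1, atoms_2):
    # Hypothetical stand-in for the hot loop in add_exclusions.
    return [(min(a, b), max(a, b))
            for a in atoms_1 for b in atoms_2 if a != b]


def profile_report(sort_key, limit=5):
    """Profile build_pairs and return the report text sorted by sort_key."""
    profiler = cProfile.Profile()
    profiler.enable()
    build_pairs(list(range(200)), list(range(200)))
    profiler.disable()
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    stats.sort_stats(sort_key).print_stats(limit)
    return stream.getvalue()


by_tottime = profile_report("tottime")   # time inside each function only
by_cumtime = profile_report("cumtime")   # time including sub-calls
print(by_tottime)
print(by_cumtime)
```

In the profile from the question, add_exclusions has a tottime of 6.786 s against a cumtime of 10.688 s, so most of the cost is in the loop body itself rather than in min and max alone.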

Python: optimizing a list comprehension that compares two integers
