Question

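I am trying to implement the following logic in Python, but I am running into memory and time problems. Since I am new to Python, guidance on how and where to optimize this would be much appreciated! (I do understand that the problem below is somewhat abstract.)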

import networkx as nx

dic_score = {}
# Generate 2 graphs with 10,000 nodes each using NetworkX
G = nx.watts_strogatz_graph(10000, 10, .01)
H = nx.watts_strogatz_graph(10000, 10, .01)
for Gnodes in G.nodes():
    for Hnodes in H.nodes():  # i.e. for every pair of nodes across the two graphs
        score = some_operation(Gnodes, Hnodes)  # calculate a metric (placeholder for the actual operation)
        # Store the metric as key: value, where the value becomes a list of lists
        dic_score.setdefault(Gnodes, []).append([Hnodes, score, -1])

Then the lists in the generated dictionary are sorted according to the criterion mentioned here: sorting_criterion

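My problems/questions are:

1) Is there a better way to approach this than iterating with for loops?

2) What is the most optimized (fastest) way to approach the problem above? Should I consider another data structure instead of a dictionary? Or perhaps file operations?

3) Since I need to sort the lists inside this dictionary, which has 10,000 keys each mapping to a list of 10,000 values, the memory requirement grows very quickly and I run out.

3) Is there a way to integrate the sorting into the computation of the dictionary itself, i.e. avoid a separate loop for sorting?

Any input would be appreciated! Thanks!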

Answer 1

1) You can use a function from the itertools module. I will just mention it here; you can read the manual or call:

from itertools import product
help(product)

Here is an example:

for item1, item2 in product(list1, list2):
    pass

2) If the results are too large to fit in memory, try saving them somewhere. You can write them out to a CSV file, for example:

import csv

with open('result.csv', 'w', newline='') as outfile:  # open for writing
    writer = csv.writer(outfile, dialect='excel')
    for ...
        writer.writerow(...)

This will free up your memory.
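A minimal sketch of this streaming approach is shown below. It uses small node ranges and a hypothetical `score` function (neither comes from the question) so that it runs quickly. Each row is written as soon as it is computed, so the full set of results is never held in memory at once:

```python
import csv
import os
import tempfile
from itertools import product


def score(g_node, h_node):
    # Hypothetical stand-in for the question's unspecified metric.
    return abs(g_node - h_node)


def write_scores(path, g_nodes, h_nodes):
    """Write one CSV row per (g_node, h_node) pair and return the row count."""
    count = 0
    # newline='' is what the csv module documentation recommends for writer files.
    with open(path, 'w', newline='') as outfile:
        writer = csv.writer(outfile, dialect='excel')
        for g_node, h_node in product(g_nodes, h_nodes):
            writer.writerow([g_node, h_node, score(g_node, h_node), -1])
            count += 1
    return count


# A temporary directory keeps the example from touching the working directory.
path = os.path.join(tempfile.mkdtemp(), 'result.csv')
rows_written = write_scores(path, range(3), range(4))
```

With the real graphs, `range(3)` and `range(4)` would be replaced by the node collections of `G` and `H`.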

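3) I think it is better to sort the resulting data afterwards (the sort function is very fast) rather than complicating matters by sorting the data on the fly.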
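As a small illustration of sorting afterwards, the sketch below builds a toy version of the dictionary and then sorts each list in place. The sample data and the choice of the score (index 1) as the sort key are assumptions, since the question's actual sorting criterion is not shown here:

```python
from operator import itemgetter

# Toy stand-in for dic_score: each value is a list of [Hnode, score, -1] entries.
dic_score = {
    0: [[0, 0.7, -1], [1, 0.2, -1], [2, 0.9, -1]],
    1: [[0, 0.4, -1], [1, 0.8, -1], [2, 0.1, -1]],
}

# list.sort() sorts in place, so no second copy of each list is created.
for entries in dic_score.values():
    entries.sort(key=itemgetter(1))
```

Using `list.sort()` rather than `sorted()` avoids allocating a new list per key, which matters when every list has 10,000 entries.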

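You can use NumPy array / matrix operations (sums, products, or even mapping a function over each matrix row). These are so fast that filtering the data sometimes costs more than calculating everything.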

If your app is still very slow, try profiling it to see exactly which operations are slow or are executed too many times:

from cProfile import Profile
p = Profile()

p.runctx('my_function(args)', {'my_function': my_function, 'args': my_data}, {})
p.print_stats()

You will see a table like this:

      2706 function calls (2004 primitive calls) in 4.504 CPU seconds

Ordered by: standard name

ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     2    0.006    0.003    0.953    0.477 pobject.py:75(save_objects)
  43/3    0.533    0.012    0.749    0.250 pobject.py:99(evaluate)
...
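The default report is ordered by name, which can make hot spots hard to find. One possible variation, sketched here with a hypothetical `slow_pairs` workload, uses `pstats` to sort by cumulative time and show only the top few entries:

```python
import cProfile
import io
import pstats


def slow_pairs(n):
    # Hypothetical workload standing in for the real function being profiled.
    return sum(abs(i - j) for i in range(n) for j in range(n))


profiler = cProfile.Profile()
result = profiler.runcall(slow_pairs, 200)

# Send the report to a string buffer and sort by cumulative time,
# so the most expensive call chains appear first.
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
stats.sort_stats('cumulative').print_stats(5)  # limit the report to 5 entries
report = buffer.getvalue()
print(report)
```

The exact timings will differ from run to run and from machine to machine.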

Answer 2

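When you use a function that returns a list, check whether there is a counterpart that returns an iterator.

This will improve memory usage.

In your case, nx.nodes returns a complete list. See: nodes

Use nodes_iter instead, since it returns an iterator. This should ensure that you do not hold the complete list of nodes in memory while iterating over the nodes in the for loop.

See: nodes_iter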
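The general difference between a list and an iterator can be seen with the standard library alone. The sketch below is only an illustration and does not use NetworkX. As far as I know, `nodes_iter` was removed in NetworkX 2.0, and in later versions iterating over the graph object directly (`for n in G`) yields nodes without building a list first; treat that as something to verify against your installed version.

```python
import sys

n = 10000

as_list = list(range(n))             # materializes all n items up front
as_iterator = (i for i in range(n))  # produces items one at a time on demand

# getsizeof reports the container itself, not the objects it refers to.
list_bytes = sys.getsizeof(as_list)
iterator_bytes = sys.getsizeof(as_iterator)

# Both give the same items when consumed.
total_from_list = sum(as_list)
total_from_iterator = sum(as_iterator)
```

Note that 10,000 node ids are small either way; in the question, the dominant memory cost is the 10,000 x 10,000 score entries rather than the node lists.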

Some improvements:

import networkx as nx

dic_score = {}
G = nx.watts_strogatz_graph(10000, 10, .01)
H = nx.watts_strogatz_graph(10000, 10, .01)
for Gnodes in G.nodes_iter():  # changed from G.nodes()
    for Hnodes in H.nodes_iter():  # changed from H.nodes()
        score = some_operation(Gnodes, Hnodes)  # placeholder for the actual metric
        dic_score.setdefault(Gnodes, []).append([Hnodes, score, -1])

You can also use another idiom, since you now have two iterators: itertools.product

product(A, B) returns the same as ((x,y) for x in A for y in B).
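The equivalence can be checked on a small example. The inputs `A` and `B` below are arbitrary sample values:

```python
from itertools import product

A = [0, 1, 2]
B = ['x', 'y']

# Both forms visit the pairs in the same order: the last input varies fastest.
pairs_from_product = list(product(A, B))
pairs_from_genexpr = list((x, y) for x in A for y in B)
```

One caveat from the itertools documentation: `product` fully consumes its input iterables before it yields the first pair, so the inputs themselves are held in memory. For two collections of 10,000 nodes that is negligible; the pairs are still produced lazily.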

Answer 3

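Others have mentioned itertools.product. That is fine, but in your case there is another possibility: a generator expression for the inner loop, combined with the sorted function. (The code is untested, of course.)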

import networkx as nx
from operator import itemgetter

dic_score = {}
# Generate 2 graphs with 10,000 nodes each using NetworkX
G = nx.watts_strogatz_graph(10000, 10, .01)
H = nx.watts_strogatz_graph(10000, 10, .01)
for Gnodes in G.nodes():
    # The generator expression needs its own parentheses because sorted() also takes key=
    dic_score[Gnodes] = sorted(([Hnodes, score(Gnodes, Hnodes), -1] for Hnodes in H.nodes()), key=itemgetter(1))  # sort on score

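The inner loop is replaced by a generator expression. The result is also sorted on the fly (assuming you want to sort each inner list on score). Instead of storing each inner list in the dictionary, you could easily write it to a file, which would help with memory.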
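A rough sketch of that last suggestion follows. The `score` function, the small node ranges, and the tab-separated output format are all assumptions made for illustration. Only one sorted inner list exists at any time, because each one is written out and then discarded:

```python
import io
from operator import itemgetter


def score(g_node, h_node):
    # Hypothetical stand-in for the question's unspecified metric.
    return abs(g_node - h_node)


def sorted_rows(g_nodes, h_nodes):
    """Yield (g_node, entries) one G node at a time, entries sorted by score."""
    for g_node in g_nodes:
        entries = sorted(
            ([h_node, score(g_node, h_node), -1] for h_node in h_nodes),
            key=itemgetter(1),
        )
        yield g_node, entries


# io.StringIO stands in for a real file opened with open(..., 'w').
out = io.StringIO()
for g_node, entries in sorted_rows(range(3), range(4)):
    # One line per G node; the list is discarded after it is written.
    out.write(f"{g_node}\t{entries}\n")

lines = out.getvalue().splitlines()
```

Because `sorted` is stable, entries with equal scores keep the order in which the H nodes were visited.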

How to optimize memory and time usage of the following algorithm in Python
