I have defined a Python class named Edge, as follows:
class Edge:
    def __init__(self):
        self.node1 = 0
        self.node2 = 0
        self.weight = 0
Now I have to create roughly 10^6 to 10^7 Edge instances, like this:
edges = []
for (i, j, w) in ijw:
    edge = Edge()
    edge.node1 = i
    edge.node2 = j
    edge.weight = w
    edges.append(edge)
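This takes about 2 seconds on my desktop. Is there a faster way?
Answer 0 (score: 8)
You can't make it much faster, but I would certainly use __slots__ to save on memory allocations. Also make it possible to pass in the attribute values when creating the instance: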
class Edge:
    __slots__ = ('node1', 'node2', 'weight')
    def __init__(self, node1=0, node2=0, weight=0):
        self.node1 = node1
        self.node2 = node2
        self.weight = weight
With the updated __init__ you can use a list comprehension:
edges = [Edge(*args) for args in ijw]
Together, these changes save a significant amount of time when creating the objects, roughly halving the time needed.
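To make the effect of __slots__ concrete, here is a minimal sketch. The names PlainEdge and SlotsEdge are only illustrative and are not part of the question. It shows that a slotted instance carries no per-instance __dict__ and rejects attributes that are not listed in __slots__, which is where the saved allocations come from.

```python
class PlainEdge:
    # Ordinary class: attributes live in a per-instance __dict__.
    def __init__(self, node1=0, node2=0, weight=0):
        self.node1 = node1
        self.node2 = node2
        self.weight = weight


class SlotsEdge:
    # Slotted class: fixed attribute storage, no per-instance __dict__.
    __slots__ = ('node1', 'node2', 'weight')

    def __init__(self, node1=0, node2=0, weight=0):
        self.node1 = node1
        self.node2 = node2
        self.weight = weight


plain = PlainEdge(1, 2, 3)
slotted = SlotsEdge(1, 2, 3)

plain_has_dict = hasattr(plain, '__dict__')      # True
slotted_has_dict = hasattr(slotted, '__dict__')  # False

# Assigning an attribute that is not in __slots__ raises AttributeError.
try:
    slotted.color = 'red'
    rejected = False
except AttributeError:
    rejected = True

# The list comprehension unpacks each (i, j, w) tuple into the constructor.
edges = [SlotsEdge(*args) for args in [(0, 1, 5), (1, 2, 7)]]
```

The trade-off is that slotted instances cannot grow new attributes later, so this only fits classes whose fields are known up front.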
A comparison creating 1 million objects. The setup:
>>> from random import randrange
>>> ijw = [(randrange(100), randrange(100), randrange(1000)) for _ in range(10 ** 6)]
>>> class OrigEdge:
...     def __init__(self):
...         self.node1 = 0
...         self.node2 = 0
...         self.weight = 0
...
>>> origloop = '''\
... edges= []
... for (i,j,w) in ijw:
...     edge = Edge()
...     edge.node1 = i
...     edge.node2 = j
...     edge.weight = w
...     edges.append(edge)
... '''
>>> class SlotsEdge:
...     __slots__ = ('node1', 'node2', 'weight')
...     def __init__(self, node1=0, node2=0, weight=0):
...         self.node1 = node1
...         self.node2 = node2
...         self.weight = weight
...
>>> listcomploop = '''[Edge(*args) for args in ijw]'''
And the timings:
>>> from timeit import Timer
>>> count, total = Timer(origloop, 'from __main__ import OrigEdge as Edge, ijw').autorange()
>>> (total / count) * 1000 # milliseconds
722.1121070033405
>>> count, total = Timer(listcomploop, 'from __main__ import SlotsEdge as Edge, ijw').autorange()
>>> (total / count) * 1000 # milliseconds
386.6706900007557
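That's nearly 2 times as fast.
Increasing the random input list to 10^7 items, the relative difference holds (these timings are in seconds):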
>>> ijw = [(randrange(100), randrange(100), randrange(1000)) for _ in range(10 ** 7)]
>>> count, total = Timer(origloop, 'from __main__ import OrigEdge as Edge, ijw').autorange()
>>> (total / count)
7.183759553998243
>>> count, total = Timer(listcomploop, 'from __main__ import SlotsEdge as Edge, ijw').autorange()
>>> (total / count)
3.8709938440006226
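Answer 1 (score: 1)
Another option is to skip the Edge class entirely and represent the edges with a table or an adjacency matrix.
For example:
A = create_adjacency_graph(ijw) # Implement to return an I x J (sparse?) matrix of weights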
edge_a_weight = A[3, 56]
edge_b_weight = A[670, 1023]
# etc...
Although this does take away some flexibility, it should be very fast both to create and to use.
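As a rough illustration of the table variant, here is a minimal sketch in which create_adjacency_graph is a hypothetical helper (it is not a library function) that returns a plain dict keyed by (node1, node2). A dict stands in for the sparse matrix here; the sample ijw data is made up, and the sketch assumes that when the same pair appears twice the later weight wins.

```python
def create_adjacency_graph(ijw):
    """Hypothetical helper: build a {(node1, node2): weight} lookup table."""
    # Later duplicates of the same (i, j) pair overwrite earlier ones.
    return {(i, j): w for i, j, w in ijw}


# Made-up sample data in the same (i, j, w) shape as the question.
ijw = [(3, 56, 10), (670, 1023, 4), (3, 56, 12)]

A = create_adjacency_graph(ijw)

# A[3, 56] is the same as A[(3, 56)], so the indexing syntax still works.
edge_a_weight = A[3, 56]
edge_b_weight = A[670, 1023]

# Pairs with no edge are simply absent; .get() avoids a KeyError.
missing_weight = A.get((1, 2))
```

No per-edge object is created at all, at the cost of storing only a single value (the weight) per node pair.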
Answer 2 (score: 0)
There is another memory-saving approach, using the recordclass library:
from recordclass import dataobject
from random import randrange
import sys
ijw = [(randrange(100), randrange(100), randrange(1000)) for _ in range(10 ** 7)]
class EdgeDO(dataobject):
    __fields__ = 'node1', 'node2', 'weight'
class EdgeSlots:
    __slots__ = 'node1', 'node2', 'weight'
    def __init__(self, node1, node2, weight):
        self.node1 = node1
        self.node2 = node2
        self.weight = weight
def list_size(lst):
    return sum(sys.getsizeof(o) for o in lst)
%time list_do = [EdgeDO(n1, n2, w) for n1, n2, w in ijw]
%time list_slots = [EdgeSlots(n1, n2, w) for n1, n2, w in ijw]
print('size (dataobject):', list_size(list_do))
print('size (__slots__): ', list_size(list_slots))
Output:
CPU times: user 2.23 s, sys: 20 ms, total: 2.25 s
Wall time: 2.25 s
CPU times: user 6.79 s, sys: 84.1 ms, total: 6.87 s
Wall time: 6.87 s
size (dataobject): 400000000
size (__slots__): 640000000
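Note that sys.getsizeof reports only each object's own size, not anything it refers to, such as the __dict__ of an ordinary instance. As a cross-check that needs only the standard library, the sketch below uses tracemalloc to compare the memory actually allocated while building a list of ordinary instances against a list of slotted ones. The class names, the traced_bytes helper and the object count of 50,000 are illustrative assumptions, and the absolute numbers will differ between Python versions.

```python
import tracemalloc
from random import randrange


class PlainEdge:
    def __init__(self, node1, node2, weight):
        self.node1 = node1
        self.node2 = node2
        self.weight = weight


class SlotsEdge:
    __slots__ = 'node1', 'node2', 'weight'

    def __init__(self, node1, node2, weight):
        self.node1 = node1
        self.node2 = node2
        self.weight = weight


def traced_bytes(cls, rows):
    """Return the bytes still allocated after building [cls(*row), ...]."""
    tracemalloc.start()
    objs = [cls(*row) for row in rows]  # kept alive until measured
    current, _peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    del objs
    return current


# The input rows are created before tracing starts, so they are not counted.
rows = [(randrange(100), randrange(100), randrange(1000)) for _ in range(50_000)]

plain_bytes = traced_bytes(PlainEdge, rows)
slots_bytes = traced_bytes(SlotsEdge, rows)

print('traced bytes (plain):    ', plain_bytes)
print('traced bytes (__slots__):', slots_bytes)
```

Tracing slows allocation down considerably, so this is for measuring memory only, not for timing.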