I am trying to generate large scale-free graphs with the networkx package in Python 3. These are my software versions:
python --version
Python 3.7.3
pip --version
pip 19.0.3 from /home/user/bin/python3/lib/python3.7/site-packages/pip (python 3.7)
pip show networkx
Name: networkx
Version: 2.3
Summary: Python package for creating and manipulating graphs and networks
More specifically, I need to generate scale-free graphs with 100K, 1M, and 10M vertices. My code is very short:
import datetime
import os

import networkx as nx

# `args` is the script's parsed command-line namespace (parsing not shown).
n = 1000000  # 100K, then 1M, then 10M...
G = nx.scale_free_graph(n)
file_stamp = str(datetime.datetime.now()).split('.')[0].replace(' ', '_').replace(':', '-')
target_file_name = str(args.n) + "V_" + str(G.number_of_edges()) + "E_" + file_stamp + ".tsv"
target_file_path = os.path.join(args.out_dir, target_file_name)
print("> Target edge file:\t\t{}".format(target_file_path))
with open(target_file_path, 'wb') as f:
    nx.write_edgelist(G, f, data=False)
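For n = 100000 (one hundred thousand), execution takes a few seconds.
For n = 1000000 (one million) or n = 10000000 (ten million), however, the script has been running for days.
I have noticed that memory usage grows only very slowly.
I would expect these graphs to need more memory than the process currently holds, which suggests that the generator logic is the culprit. Given how much time has passed, I am starting to think the generation itself is slow.
I went and checked the source of the networkx.scale_free_graph function: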
@py_random_state(7)
def scale_free_graph(n, alpha=0.41, beta=0.54, gamma=0.05, delta_in=0.2,
                     delta_out=0, create_using=None, seed=None):
    """Returns a scale-free directed graph.

    Parameters
    ----------
    n : integer
        Number of nodes in graph
    alpha : float
        Probability for adding a new node connected to an existing node
        chosen randomly according to the in-degree distribution.
    beta : float
        Probability for adding an edge between two existing nodes.
        One existing node is chosen randomly according the in-degree
        distribution and the other chosen randomly according to the out-degree
        distribution.
    gamma : float
        Probability for adding a new node connected to an existing node
        chosen randomly according to the out-degree distribution.
    delta_in : float
        Bias for choosing nodes from in-degree distribution.
    delta_out : float
        Bias for choosing nodes from out-degree distribution.
    create_using : NetworkX graph constructor, optional
        The default is a MultiDiGraph 3-cycle.
        If a graph instance, use it without clearing first.
        If a graph constructor, call it to construct an empty graph.
    seed : integer, random_state, or None (default)
        Indicator of random number generation state.
        See :ref:`Randomness<randomness>`.

    Examples
    --------
    Create a scale-free graph on one hundred nodes::

    >>> G = nx.scale_free_graph(100)

    Notes
    -----
    The sum of `alpha`, `beta`, and `gamma` must be 1.

    References
    ----------
    .. [1] B. Bollobás, C. Borgs, J. Chayes, and O. Riordan,
           Directed scale-free graphs,
           Proceedings of the fourteenth annual ACM-SIAM Symposium on
           Discrete Algorithms, 132--139, 2003.
    """

    def _choose_node(G, distribution, delta, psum):
        cumsum = 0.0
        # normalization
        r = seed.random()
        for n, d in distribution:
            cumsum += (d + delta) / psum
            if r < cumsum:
                break
        return n

    if create_using is None or not hasattr(create_using, '_adj'):
        # start with 3-cycle
        G = nx.empty_graph(3, create_using, default=nx.MultiDiGraph)
        G.add_edges_from([(0, 1), (1, 2), (2, 0)])
    else:
        G = create_using
    if not (G.is_directed() and G.is_multigraph()):
        raise nx.NetworkXError("MultiDiGraph required in create_using")

    if alpha <= 0:
        raise ValueError('alpha must be > 0.')
    if beta <= 0:
        raise ValueError('beta must be > 0.')
    if gamma <= 0:
        raise ValueError('gamma must be > 0.')

    if abs(alpha + beta + gamma - 1.0) >= 1e-9:
        raise ValueError('alpha+beta+gamma must equal 1.')

    number_of_edges = G.number_of_edges()
    while len(G) < n:
        psum_in = number_of_edges + delta_in * len(G)
        psum_out = number_of_edges + delta_out * len(G)
        r = seed.random()
        # random choice in alpha,beta,gamma ranges
        if r < alpha:
            # alpha
            # add new node v
            v = len(G)
            # choose w according to in-degree and delta_in
            w = _choose_node(G, G.in_degree(), delta_in, psum_in)
        elif r < alpha + beta:
            # beta
            # choose v according to out-degree and delta_out
            v = _choose_node(G, G.out_degree(), delta_out, psum_out)
            # choose w according to in-degree and delta_in
            w = _choose_node(G, G.in_degree(), delta_in, psum_in)
        else:
            # gamma
            # choose v according to out-degree and delta_out
            v = _choose_node(G, G.out_degree(), delta_out, psum_out)
            # add new node w
            w = len(G)
        G.add_edge(v, w)
        number_of_edges += 1
    return G
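The main loop of this code runs until the graph has n vertices, which takes a number of iterations proportional to n.
Without analyzing further: inside the main loop, _choose_node is called at least once and at most twice per iteration.
Inside that function there is another loop, which iterates over the in-degree or out-degree distribution.
I assume that as n grows, the computation time spent in _choose_node grows as well.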
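To make the cost of that nested loop concrete, here is a minimal sketch that replays the generator's control flow on plain Python lists and counts the iterations of the linear scan. It is not the NetworkX code: `count_scan_steps` and its helper `choose` are hypothetical names, and two degree lists stand in for the graph. If the scan length grows with the number of nodes, the total count should roughly quadruple each time n doubles.

```python
import random


def count_scan_steps(n, alpha=0.41, beta=0.54, delta_in=0.2, delta_out=0.0, seed=0):
    """Replay the generator on degree lists and count linear-scan iterations."""
    rng = random.Random(seed)
    in_deg = [1, 1, 1]   # start from the 3-cycle 0 -> 1 -> 2 -> 0
    out_deg = [1, 1, 1]
    edges = 3
    steps = 0

    def choose(degrees, delta, psum):
        # Same cumulative scan as _choose_node, with a step counter added.
        nonlocal steps
        r = rng.random()
        cumsum = 0.0
        node = 0
        for node, d in enumerate(degrees):
            steps += 1
            cumsum += (d + delta) / psum
            if r < cumsum:
                break
        return node

    while len(in_deg) < n:
        psum_in = edges + delta_in * len(in_deg)
        psum_out = edges + delta_out * len(in_deg)
        r = rng.random()
        if r < alpha:
            v = len(in_deg)                         # new node v
            w = choose(in_deg, delta_in, psum_in)
        elif r < alpha + beta:
            v = choose(out_deg, delta_out, psum_out)
            w = choose(in_deg, delta_in, psum_in)
        else:
            v = choose(out_deg, delta_out, psum_out)
            w = len(in_deg)                         # new node w
        if max(v, w) == len(in_deg):                # register the new node
            in_deg.append(0)
            out_deg.append(0)
        out_deg[v] += 1
        in_deg[w] += 1
        edges += 1
    return steps


for size in (500, 1000, 2000):
    print(size, count_scan_steps(size))
```

If the count does grow about fourfold per doubling, the total work is roughly quadratic in n, so going from 100K to 10M vertices would multiply it by about 10,000.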
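Is there a faster implementation of this scale-free generator in networkx? Or is there a function in another library (no language restriction) that generates scale-free graphs with the same semantics?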
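Answer 0 (score: 0)
There may be ways to do this more efficiently; however, you are dealing with combinatorial growth, which is super-exponential. https://medium.com/@TorBair/exponential-growth-isn-t-cool-combinatorial-growth-is-85a0b1fdb6a5
The challenge is that computing over (n) edges in this way grows faster than exponentially. You might find a more efficient algorithm, but it will not get you much further, because you are up against the raw math of the problem.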
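One candidate for a more efficient algorithm is to replace the linear scan with constant-time sampling. The sketch below is a different technique from the quoted NetworkX 2.3 source, written in plain Python. It keeps one list of edge tails and one list of edge heads, so a uniform draw from a list picks a node in proportion to its degree, and the delta bias is handled by mixing in a uniform node draw. The name `fast_scale_free_edges` and its return format (a list of (v, w) pairs) are assumptions for illustration. It aims for the same selection probabilities, not the same random stream.

```python
import random


def fast_scale_free_edges(n, alpha=0.41, beta=0.54, gamma=0.05,
                          delta_in=0.2, delta_out=0.0, seed=None):
    """Return the edge list of a directed scale-free multigraph on n nodes."""
    if abs(alpha + beta + gamma - 1.0) >= 1e-9:
        raise ValueError("alpha+beta+gamma must equal 1.")
    rng = random.Random(seed)
    sources = [0, 1, 2]   # tail of every edge: node u appears out_degree(u) times
    targets = [1, 2, 0]   # head of every edge: node u appears in_degree(u) times
    num_nodes = 3         # start from the 3-cycle, as the original does

    def choose(endpoints, delta):
        # P(u) = (degree(u) + delta) / (num_edges + delta * num_nodes),
        # drawn as a mixture: uniform node with weight delta * num_nodes,
        # uniform edge endpoint with weight num_edges.
        bias = delta * num_nodes
        if bias > 0 and rng.random() < bias / (bias + len(endpoints)):
            return rng.randrange(num_nodes)
        return rng.choice(endpoints)

    while num_nodes < n:
        r = rng.random()
        if r < alpha:
            w = choose(targets, delta_in)    # existing node by in-degree
            v = num_nodes                    # new node v
            num_nodes += 1
        elif r < alpha + beta:
            v = choose(sources, delta_out)   # existing node by out-degree
            w = choose(targets, delta_in)    # existing node by in-degree
        else:
            v = choose(sources, delta_out)   # existing node by out-degree
            w = num_nodes                    # new node w
            num_nodes += 1
        sources.append(v)
        targets.append(w)
    return list(zip(sources, targets))


edges = fast_scale_free_edges(100000, seed=42)
print(len(edges), "edges")
```

Each step does a constant amount of work, so the running time of this sketch grows linearly with the number of edges rather than with the square of the number of nodes. The resulting pairs could then be written out one per line as tab-separated values.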
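Answer 1 (score: 0)
The only way to reduce the time is to edit the generator code:
Multiply r by psum once (before the loop) instead of dividing cumsum by psum n times. That should save n unnecessary divisions.
Replace G.in_degree() (outside the function call) with G.in_degree(n) (inside the function's loop). That way, the loop
for n, d in distribution:
could become something like
for n in G:
    d = G.in_degree(n)
This avoids computing the full in-degree and out-degree distributions for all nodes in the graph up front, in the hope that the random node selection stops early enough to yield a performance gain.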
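The first suggestion can be sketched on its own, without networkx. The code below is an illustration under assumptions: `choose_node_original` mirrors the scan from the quoted source, `choose_node_scaled` is a hypothetical variant that scales the random threshold by psum once, and the `degrees` list is made-up data standing in for G.in_degree().

```python
import random


def choose_node_original(distribution, delta, psum, rng):
    """Linear scan as in the quoted source: one division per visited node."""
    cumsum = 0.0
    r = rng.random()
    for n, d in distribution:
        cumsum += (d + delta) / psum
        if r < cumsum:
            break
    return n


def choose_node_scaled(distribution, delta, psum, rng):
    """Same scan, but the threshold is multiplied by psum once up front."""
    threshold = rng.random() * psum
    cumsum = 0.0
    for n, d in distribution:
        cumsum += d + delta
        if threshold < cumsum:
            break
    return n


# Hypothetical (node, in_degree) pairs standing in for G.in_degree().
degrees = [(0, 5), (1, 3), (2, 1), (3, 1)]
delta = 0.2
psum = sum(d for _, d in degrees) + delta * len(degrees)

# Feed both versions the same random number and compare the chosen nodes.
same = sum(
    choose_node_original(degrees, delta, psum, random.Random(s))
    == choose_node_scaled(degrees, delta, psum, random.Random(s))
    for s in range(1000)
)
print(same, "of 1000 seeded draws agree")
```

Both versions still walk the nodes one by one, so this change affects only the constant factor of each scan, not how the scan length grows with the number of nodes.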