MemoryError when counting the edges of a graph with Networkx

Time: 2017-06-10 01:31:54

Tags: python graph networkx

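My initial goal was to do some structural property analysis (diameter, clustering coefficient, etc.) using Networkx. However, I have already stumbled on simply counting how many edges the given graph has. The graph can be downloaded from over here (beware: 126 MB zip file) and consists of 1,632,803 nodes and 30,622,564 edges. Please note that if you download this file, you must remove the comments (including the #) at the top of the file.

My machine has 8 GB of memory. Are my plans (diameter, clustering coefficient) too ambitious for a graph of this size? I hope not, because I like networkx for its simplicity and because it seems complete. If they are too ambitious, though, could you suggest another library I could use for this job?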

import networkx as nx

graph = nx.Graph()
graph.to_directed()

def create_undirected_graph_from_file(path, graph):
    for line in open(path):
        edges = line.rstrip().split()
        graph.add_edge(edges[0], edges[1])

print(create_undirected_graph_from_file("C:\\Users\\USER\\Desktop\\soc-pokec-relationships.txt", graph).g.number_of_edges())

Error:

Traceback (most recent call last):
  File "C:/Users/USER/PycharmProjects/untitled/main.py", line 12, in <module>
    print(create_undirected_graph_from_file("C:\\Users\\USER\\Desktop\\soc-pokec-relationships.txt", graph).g.number_of_edges())
  File "C:/Users/User/PycharmProjects/untitled/main.py", line 8, in create_undirected_graph_from_file
    edges = line.rstrip().split()
MemoryError

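1 answer:

Answer 0 (score: 2)

One potential problem is that strings have a large memory footprint. Since all of your node IDs are integers, you can benefit from converting them to int before creating the edges. You get faster lookups internally and a smaller memory footprint. Note also that the function has to return the graph, otherwise there is nothing to call number_of_edges() on. Specifically: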

def create_undirected_graph_from_file(path, graph):
    for line in open(path):
        a, b = line.rstrip().split()
        graph.add_edge(int(a), int(b))
    return graph
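
To get a rough feel for the difference, here is a minimal sketch that compares the size of one node ID stored as a string and as an integer. The sample line is made up for illustration, the exact byte counts depend on the Python implementation and version, and sys.getsizeof only measures the object itself rather than the surrounding dictionaries that networkx keeps.

```python
import sys

# Hypothetical sample line in the "source target" format of the edge list.
line = "1632803 1000000\n"
a, b = line.split()

str_size = sys.getsizeof(a)        # node ID kept as a string
int_size = sys.getsizeof(int(a))   # the same node ID as an integer

print("str:", str_size, "bytes; int:", int_size, "bytes")
```

With tens of millions of edges, even a small per-object saving like this adds up, although it may still not be enough on its own for an 8 GB machine.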

I would also recommend changing your open call to use a context manager, which ensures the file gets closed:

def create_undirected_graph_from_file(path, graph):
    with open(path) as f:
        for line in f:
            a, b = line.rstrip().split()
            graph.add_edge(int(a), int(b))
    return graph

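Or the magical one-liner (note that the list comprehension builds a throwaway list with one None per edge, so the plain loop above is kinder to memory on a graph this large):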

def create_undirected_graph_from_file(path, graph):
    with open(path) as f:
        [graph.add_edge(*(int(point) for point in line.rstrip().split())) for line in f]
    return graph
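
As a variation on the loaders above, the parsing can be split out into a generator so that no intermediate list is built and the comment lines at the top of the file do not have to be deleted by hand. This is only a sketch: iter_edges is a hypothetical helper name, and the sample lines stand in for the real file.

```python
def iter_edges(lines):
    """Yield (source, target) integer pairs from whitespace-separated lines."""
    for line in lines:
        line = line.strip()
        # Skip blank lines and comment lines such as a '#' header.
        if not line or line.startswith("#"):
            continue
        a, b = line.split()[:2]
        yield int(a), int(b)

# Hypothetical sample data standing in for the real edge list file.
sample = ["# comment header", "1 13", "1 11", "", "2 1"]
edges = list(iter_edges(sample))
print(edges)
```

With the real file, the generator can be passed straight to networkx, for example graph.add_edges_from(iter_edges(f)) inside a with open(path) as f: block, since add_edges_from accepts any iterable of edge tuples.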

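One more thing to keep in mind: Graph.to_directed returns a new graph rather than modifying the existing one in place. Make sure you assign the result, for example graph = graph.to_directed(), instead of discarding it.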