Question

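My original goal was to do some structural property analysis (diameter, clustering coefficient, etc.) using Networkx. However, I have already stumbled on simply counting how many edges there are in the given graph. This graph, which can be downloaded from over here (beware: 126 MB zip file), consists of 1,632,803 nodes and 30,622,564 edges. Please note: if you download this file, make sure to remove the comments (including the #) placed at the top of the file.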

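I have 8 GB of memory in my machine. Are my plans (diameter/clustering coefficient) too ambitious for a graph of this size? I hope not, because I like networkx for its simplicity and because it seems complete. If they are too ambitious, though, could you suggest another library that I can use for this job?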

import networkx as nx

graph = nx.Graph()
graph.to_directed()

def create_undirected_graph_from_file(path, graph):
    for line in open(path):
        edges = line.rstrip().split()
        graph.add_edge(edges[0], edges[1])

print(create_undirected_graph_from_file("C:\\Users\\USER\\Desktop\\soc-pokec-relationships.txt", graph).g.number_of_edges())

Error:

Traceback (most recent call last):
  File "C:/Users/USER/PycharmProjects/untitled/main.py", line 12, in <module>
    print(create_undirected_graph_from_file("C:\\Users\\USER\\Desktop\\soc-pokec-relationships.txt", graph).g.number_of_edges())
  File "C:/Users/User/PycharmProjects/untitled/main.py", line 8, in create_undirected_graph_from_file
    edges = line.rstrip().split()
MemoryError

Answer 1

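One potential problem is that strings have a large memory footprint. Since all of your node IDs are integers, you can benefit from converting them to ints before creating the edges. You get faster internal lookups and a smaller memory footprint. Specifically: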

def create_undirected_graph_from_file(path, graph):
    for line in open(path):
        a, b = line.rstrip().split()
        graph.add_edge(int(a), int(b))
    return graph
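
As a rough illustration of the size difference, the sketch below compares the size of one node ID stored as a string with the same ID stored as an int. The ID value is only an example of the magnitude found in this graph, and the exact byte counts depend on the Python version and platform, so treat the numbers as indicative.

```python
import sys

# Example node ID of the magnitude found in this graph (illustrative value).
node_as_str = "1632803"
node_as_int = int(node_as_str)

# getsizeof reports the size of the object itself, in bytes.
str_size = sys.getsizeof(node_as_str)
int_size = sys.getsizeof(node_as_int)

print(f"str: {str_size} bytes, int: {int_size} bytes")
```

With tens of millions of edges, a per-object difference like this is multiplied many times over, which is why the conversion is worth trying first.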

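I would also recommend changing the open call to use a context manager, which ensures the file is closed when you are done with it: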

def create_undirected_graph_from_file(path, graph):
    with open(path) as f:
        for line in f:
            a, b = line.rstrip().split()
            graph.add_edge(int(a), int(b))
    return graph

Or as a compact one-liner:

def create_undirected_graph_from_file(path, graph):
    with open(path) as f:
        # Feed a generator to add_edges_from so no throwaway list of None is built.
        graph.add_edges_from(tuple(int(point) for point in line.split()) for line in f)
    return graph
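
Since the question mentions comment lines at the top of the file, a variation on the loaders above could skip them instead of requiring the file to be edited by hand. This is a minimal sketch: the helper name `iter_edges` is hypothetical, and the sample lines are made up to stand in for the real file.

```python
def iter_edges(lines):
    """Yield (int, int) node pairs from text lines, skipping comments and blank lines."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        a, b = line.split()
        yield int(a), int(b)


# Made-up sample standing in for the top of the edge-list file.
sample = ["# Directed graph", "# FromNodeId\tToNodeId", "1\t13", "1\t11", "", "2\t1"]
parsed = list(iter_edges(sample))
print(parsed)
```

With a real file, the generator can be passed straight to the graph, for example `graph.add_edges_from(iter_edges(f))` inside the `with open(path) as f:` block. networkx also provides `nx.read_edgelist(path, comments="#", nodetype=int)`, which may be worth trying before writing a custom parser.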

One more thing to keep in mind: Graph.to_directed returns a new graph. So make sure you assign the result back to graph instead of throwing it away.
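
The difference can be seen on a tiny graph. The three edges below are made up for illustration, and the variable names are not from the question.

```python
import networkx as nx

# Tiny stand-in for the real edge list (illustrative values).
graph = nx.Graph()
graph.add_edges_from([(1, 2), (2, 3), (3, 1)])

# Calling to_directed() without keeping the result leaves graph unchanged.
graph.to_directed()
still_undirected = not graph.is_directed()

# Keeping the returned object gives a directed copy of the graph.
directed = graph.to_directed()
undirected_edge_count = graph.number_of_edges()
directed_edge_count = directed.number_of_edges()

print(still_undirected, undirected_edge_count, directed_edge_count)
```

Each undirected edge becomes two directed edges in the copy, so the directed version holds twice as many edges and takes additional memory. If a directed graph is what you want in the first place, starting from `nx.DiGraph()` and adding the edges to it avoids making that copy.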
