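My original goal was to do some structural property analysis (diameter, clustering coefficient, etc.) using NetworkX. However, I stumbled at the very first step: simply counting how many edges the given graph has. The graph can be downloaded from over here (beware: 126 MB zip file) and consists of 1,632,803 nodes and 30,622,564 edges. Please note that if you download this file, you need to remove the comment lines at the top of the file (including the #).
My machine has 8 GB of memory. Are my plans (diameter/clustering coefficient) too ambitious for a graph of this size? I hope not, because I like networkx for its simplicity and it seems complete. If they are too ambitious, though, could you suggest another library I could use for this job?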
import networkx as nx

graph = nx.Graph()
graph.to_directed()

def create_undirected_graph_from_file(path, graph):
    for line in open(path):
        edges = line.rstrip().split()
        graph.add_edge(edges[0], edges[1])


print(create_undirected_graph_from_file("C:\\Users\\USER\\Desktop\\soc-pokec-relationships.txt", graph).g.number_of_edges())
Error:
Traceback (most recent call last):
  File "C:/Users/USER/PycharmProjects/untitled/main.py", line 12, in <module>
    print(create_undirected_graph_from_file("C:\\Users\\USER\\Desktop\\soc-pokec-relationships.txt", graph).g.number_of_edges())
  File "C:/Users/User/PycharmProjects/untitled/main.py", line 8, in create_undirected_graph_from_file
    edges = line.rstrip().split()
MemoryError
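Answer 0 (score: 2)
One potential problem is that strings take up a lot of memory. Since all of your node IDs are integers, you may benefit from converting them to ints before creating the edges. You get faster lookups internally and a smaller memory footprint. Specifically: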
def create_undirected_graph_from_file(path, graph):
    for line in open(path):
        a, b = line.rstrip().split()
        graph.add_edge(int(a), int(b))
    return graph
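To get a rough feel for the difference, the sketch below compares the size of one node ID stored as a string and as an int. It relies on sys.getsizeof, whose numbers are specific to the interpreter and platform, and the helper name per_object_sizes is made up for this illustration. It measures only the key objects themselves, not the dictionaries networkx builds around them.

```python
import sys

def per_object_sizes(node_id):
    """Return (size as str, size as int) in bytes for a single node ID.

    Hypothetical helper for illustration; the sizes come from sys.getsizeof
    and vary between Python implementations and versions.
    """
    return sys.getsizeof(str(node_id)), sys.getsizeof(node_id)

# 1632803 is the node count from the question, used here as a typical large ID.
str_size, int_size = per_object_sizes(1632803)
print(f"as str: {str_size} bytes, as int: {int_size} bytes")
```

Multiplied over tens of millions of edge endpoints, even a small per-object saving can add up, although the graph's own dictionaries will still dominate total memory use.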
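I'd also suggest changing your open call to use a context manager (a with block), which ensures the file is closed when you are done: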
def create_undirected_graph_from_file(path, graph):
    with open(path) as f:
        for line in f:
            a, b = line.rstrip().split()
            graph.add_edge(int(a), int(b))
    return graph
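Or the magic one-liner (note that the list comprehension builds a throwaway list with one None per edge, so the explicit loop is kinder to memory):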
def create_undirected_graph_from_file(path, graph):
    with open(path) as f:
        [graph.add_edge(*(int(point) for point in line.rstrip().split())) for line in f]
    return graph
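The question mentions having to delete the comment lines at the top of the file by hand. As a possible alternative, here is a minimal sketch of a generator that skips those lines while parsing. The name iter_edges and the demo file contents are invented for this example, and it assumes that comment lines start with # and that each data line holds two whitespace-separated integer IDs.

```python
import os
import tempfile

def iter_edges(path, comment_prefix="#"):
    """Lazily yield (int, int) pairs from an edge-list file.

    Hypothetical helper: blank lines and lines starting with
    comment_prefix are skipped instead of being parsed.
    """
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith(comment_prefix):
                continue
            a, b = line.split()[:2]
            yield int(a), int(b)

# Tiny made-up file standing in for the real data set.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("# Directed graph\n# FromNodeId\tToNodeId\n1\t2\n2\t3\n\n3\t1\n")
    demo_path = tmp.name

edges = list(iter_edges(demo_path))
os.remove(demo_path)
print(edges)
```

Because the generator yields one pair at a time, it can be passed straight to graph.add_edges_from(iter_edges(path)) without first holding every parsed line in a list.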
One more thing to keep in mind: Graph.to_directed returns a new graph. So make sure you assign that result to your graph variable instead of discarding it.
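A small sketch of that last point, using a made-up three-node graph: the original object stays undirected, and only the returned object is directed.

```python
import networkx as nx

# Toy graph for illustration only.
graph = nx.Graph()
graph.add_edge(1, 2)
graph.add_edge(2, 3)

# Calling to_directed() without keeping the result leaves graph unchanged.
graph.to_directed()
still_undirected = not graph.is_directed()

# Keep the return value to actually work with a directed graph.
directed = graph.to_directed()
print(still_undirected, directed.is_directed(), directed.number_of_edges())
```

Each undirected edge becomes two directed edges in the copy, so the directed version reports twice as many edges as the original.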