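I'm doing graph analysis, and I'm new to it. I'm writing software that loads a graph from a list of links (an edge list). The way I load the graph takes 50 seconds and 500 MB for a graph with about 4,200 vertices and 88,234 edges/links. I'd like to know whether numbers like these are normal! My question: is there a better way to load the graph?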
Here is my attempt:
import graph_tool.all as gt  # assumed import; the post only shows the alias gt

def read_graph(file_path):
    """
    Read a text file that has the following format: source target
    """
    edges_file_path = file_path
    graph = gt.Graph(directed=False)
    # Maps each vertex name found in the file to its index in the graph
    vertices_names_to_indices = {}
    with open(edges_file_path, 'r') as edges_file:
        for line in edges_file:
            line = line.rstrip()
            row = line.split(' ')
            if len(row) != 2:
                raise ValueError("Expected exactly two nodes per row in the edges file!")
            source = row[0]
            target = row[1]
            sindex = None
            tindex = None
            # Reuse the index of a known vertex, otherwise create a new vertex
            if source in vertices_names_to_indices:
                sindex = vertices_names_to_indices[source]
            else:
                v1 = graph.add_vertex()
                sindex = int(v1)
                vertices_names_to_indices[source] = sindex
            if target in vertices_names_to_indices:
                tindex = vertices_names_to_indices[target]
            else:
                v2 = graph.add_vertex()
                tindex = int(v2)
                vertices_names_to_indices[target] = tindex
            graph.add_edge(sindex, tindex)
    return graph
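Answer 0 (score: 1)
Why are you converting the vertices to integers? And why not use collections.defaultdict? With it, and with the rest of the code simplified, I get something like this: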
from collections import defaultdict
import graph_tool.all as gt  # assumed import; the post only shows the alias gt

def read_graph(file_path):
    """
    Read a text file that has the following format: source target
    """
    graph = gt.Graph(directed=False)
    # A missing name triggers graph.add_vertex() and caches the new vertex
    vertices_names_to_indices = defaultdict(graph.add_vertex)
    with open(file_path, 'r') as edges_file:
        for line in edges_file:
            source, target = line.rstrip().split(' ')
            graph.add_edge(vertices_names_to_indices[source],
                           vertices_names_to_indices[target])
    return graph
Since I don't have your edges file, I could neither test nor profile this.
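Another thing that might be worth trying is to separate the parsing from the graph construction: first map the names to consecutive integers and collect the edges in a plain list, then add them to the graph afterwards. Below is a minimal sketch of the parsing half using only the standard library. The function name read_edge_list and the sample data are made up for illustration, and I have not measured whether this is faster on your file.

```python
import io

def read_edge_list(lines):
    """Map vertex names to consecutive indices and collect (source, target) pairs."""
    name_to_index = {}
    edges = []
    for line_number, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != 2:
            raise ValueError(
                f"line {line_number}: expected 2 fields, got {len(fields)}")
        source, target = fields
        # setdefault stores the next free index the first time a name is seen
        # and returns the existing index on every later occurrence
        sindex = name_to_index.setdefault(source, len(name_to_index))
        tindex = name_to_index.setdefault(target, len(name_to_index))
        edges.append((sindex, tindex))
    return name_to_index, edges

# Hypothetical sample input standing in for the real edges file
sample = io.StringIO("a b\nb c\na c\n")
name_to_index, edges = read_edge_list(sample)
print(name_to_index)
print(edges)
```

With graph-tool, the resulting list could then presumably be handed to graph.add_edge_list(edges) in a single call instead of calling add_edge once per line; please check the graph-tool documentation for the exact behaviour before relying on that.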