Question

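I'm doing graph analysis, and I'm new to it. I'm writing software that loads a graph from a list of links (an edge list). My current approach takes 50 seconds and 500 MB for a graph with about 4,200 vertices and 88,234 edges/links. Are numbers like these normal? My question: is there a better way to load the graph?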

Here is my attempt:

import graph_tool.all as gt  # assumed: `gt` refers to graph-tool

def read_graph(file_path):
    """
        Read a text file that has the following format: source target
    """
    edges_file_path = file_path
    graph = gt.Graph(directed=False)
    vertices_names_to_indices = {}
    with open(edges_file_path, 'r') as edges_file:
        for line in edges_file:
            line = line.rstrip()
            row = line.split(' ')
            if len(row) != 2:
                raise Exception("Expected exactly two nodes per row in the edges file!")
            source = row[0]
            target = row[1]
            sindex = None
            tindex = None

            if source in vertices_names_to_indices:
                sindex = vertices_names_to_indices[source]
            else:
                v1 = graph.add_vertex()
                sindex = int(v1)
                vertices_names_to_indices[source] = sindex

            if target in vertices_names_to_indices:
                tindex = vertices_names_to_indices[target]
            else:
                v2 = graph.add_vertex()
                tindex = int(v2)
                vertices_names_to_indices[target] = tindex

            graph.add_edge(sindex, tindex)

    return graph

Answer 1

Why do you convert the vertices to integers? And why not use collections.defaultdict? With that, and with the rest of the code simplified, I get something like this:

from collections import defaultdict

import graph_tool.all as gt  # assumed: `gt` refers to graph-tool

def read_graph(file_path):
    """
        Read a text file that has the following format: source target
    """
    graph = gt.Graph(directed=False)
    vertices_names_to_indices = defaultdict(graph.add_vertex)
    with open(file_path, 'r') as edges_file:
        for line in edges_file:
            source, target = line.rstrip().split(' ')
            # The first lookup of an unseen name calls graph.add_vertex()
            graph.add_edge(vertices_names_to_indices[source],
                           vertices_names_to_indices[target])
    return graph
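
The defaultdict idiom used here can be tried out without graph-tool installed. The sketch below is a stand-in, not the code above: it assumes `itertools.count` in place of `graph.add_vertex` to hand out a fresh index the first time a name is looked up, and the helper name `index_edges` is hypothetical.

```python
from collections import defaultdict
from itertools import count


def index_edges(lines):
    """Map vertex names to consecutive integers and return (mapping, edges).

    Hypothetical helper for illustration only; count().__next__ stands in
    for graph.add_vertex, which plays the same role in the answer's code.
    """
    # The default factory runs only on the first lookup of a missing key,
    # so every new name receives the next unused index.
    name_to_index = defaultdict(count().__next__)
    edges = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        source, target = line.split()
        edges.append((name_to_index[source], name_to_index[target]))
    return dict(name_to_index), edges


mapping, edges = index_edges(["a b", "b c", "a c"])
print(mapping)  # {'a': 0, 'b': 1, 'c': 2}
print(edges)    # [(0, 1), (1, 2), (0, 2)]
```

Each name is hashed once per occurrence, and no explicit "is it already known?" branch is needed, which is what lets the loop body shrink to two lines.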

Since I don't have your edge file, I could neither test nor profile this.
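
Without the original edge file, a synthetic one of similar size could serve for measurement. This is a minimal sketch, assuming random edges are an acceptable stand-in for the real data. The names `write_random_edges`, `measure`, and `count_edges` are hypothetical, and `count_edges` is only a plain-Python placeholder for where `read_graph` would be passed in.

```python
import os
import random
import tempfile
import time
import tracemalloc


def write_random_edges(path, n_vertices, n_edges, seed=0):
    """Write n_edges lines of the form 'source target' with random endpoints."""
    rng = random.Random(seed)
    with open(path, "w") as f:
        for _ in range(n_edges):
            f.write(f"{rng.randrange(n_vertices)} {rng.randrange(n_vertices)}\n")


def measure(loader, path):
    """Return (result, elapsed_seconds, peak_bytes) for loader(path)."""
    tracemalloc.start()
    start = time.perf_counter()
    result = loader(path)
    elapsed = time.perf_counter() - start
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return result, elapsed, peak


def count_edges(path):
    """Placeholder loader: counts non-empty lines instead of building a graph."""
    with open(path) as f:
        return sum(1 for line in f if line.strip())


fd, path = tempfile.mkstemp(suffix=".txt")
os.close(fd)
try:
    # Sizes taken from the question: about 4,200 vertices and 88,234 edges.
    write_random_edges(path, 4200, 88234)
    n, seconds, peak = measure(count_edges, path)
    print(f"{n} edges, {seconds:.3f} s, peak {peak / 1e6:.2f} MB (Python allocations)")
finally:
    os.remove(path)
```

Note that `tracemalloc` only sees memory allocated through Python's allocator, so memory used inside a compiled library would not show up in `peak`; the process's resident memory would have to be checked separately for that.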
