Loading a graph from a list of links takes a lot of time and memory

Time: 2014-03-22 09:36:01

Tags: python algorithm graph

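I am working on graph analysis and I am new to it. I am writing software that loads a graph from a list of links (one edge per line). The way I load the graph takes 50 seconds and 500 MB for a graph with about 4,200 vertices and 88,234 edges/links. I would like to know whether numbers like these are normal. My question: is there a better way to load the graph?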

Here is my attempt:

import graph_tool.all as gt  # assumption: gt refers to graph-tool, as the calls below suggest

def read_graph(file_path):
    """
        Read a text file that has the following format: source target
    """
    edges_file_path = file_path
    graph = gt.Graph(directed=False)
    vertices_names_to_indices = {}
    with open(edges_file_path, 'r') as edges_file:
        for line in edges_file:
            line = line.rstrip()
            row = line.split(' ')
            if len(row) != 2:
                raise Exception("Each row in the edges file must contain exactly two nodes!")
            source = row[0]
            target = row[1]
            sindex = None
            tindex = None

            if source in vertices_names_to_indices:
                sindex = vertices_names_to_indices[source]
            else:
                v1 = graph.add_vertex()
                sindex = int(v1)
                vertices_names_to_indices[source] = sindex

            if target in vertices_names_to_indices:
                tindex = vertices_names_to_indices[target]
            else:
                v2 = graph.add_vertex()
                tindex = int(v2)
                vertices_names_to_indices[target] = tindex

            graph.add_edge(sindex, tindex)

1 Answer:

Answer 0 (score: 1)

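Why do you convert the vertices to integers? And why don't you use collections.defaultdict? With it, and after simplifying other parts of the code, I get something like this: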

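from collections import defaultdict

import graph_tool.all as gt  # assumption: gt refers to graph-tool, as in the question

def read_graph(file_path):
    """
        Read a text file that has the following format: source target
    """
    graph = gt.Graph(directed=False)
    # add_vertex is called once for every name that is looked up for the first time
    vertices_names_to_indices = defaultdict(graph.add_vertex)
    with open(file_path, 'r') as edges_file:
        for line in edges_file:
            # unpacking raises ValueError if a row does not hold exactly two names
            source, target = line.rstrip().split(' ')
            graph.add_edge(vertices_names_to_indices[source],
                           vertices_names_to_indices[target])
    return graph  # added so that the caller actually receives the graph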
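
To make the defaultdict trick explicit: the factory passed to defaultdict is called with no arguments the first time a missing key is looked up, and whatever it returns is stored under that key. Below is a minimal sketch that does not need graph-tool. The helper name make_index_map is hypothetical, and a plain counter stands in for graph.add_vertex.

```python
from collections import defaultdict
from itertools import count


def make_index_map():
    """Return a dict that assigns 0, 1, 2, ... to names on their first lookup."""
    counter = count()  # stand-in for graph.add_vertex, which also takes no arguments
    # counter.__next__ runs only when a key is missing; its result is stored for that key.
    return defaultdict(counter.__next__)


index_of = make_index_map()
edges = [(index_of[s], index_of[t]) for s, t in [("a", "b"), ("b", "c"), ("a", "c")]]
print(edges)           # [(0, 1), (1, 2), (0, 2)]
print(dict(index_of))  # {'a': 0, 'b': 1, 'c': 2}
```

Each name triggers the factory exactly once, so repeated names reuse the stored value instead of creating a new vertex.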

Since I don't have your edge file, I could neither test nor profile the read_graph above.
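
One way around the missing edge file is to generate a random file of the same size and profile the parsing on its own. The following is only a sketch under assumptions: write_random_edges and read_edges are hypothetical helpers, the data is random rather than your real graph, and read_edges is a pure-Python stand-in that builds a list of index pairs instead of calling any graph library.

```python
import cProfile
import os
import pstats
import random
import tempfile
from collections import defaultdict


def write_random_edges(path, n_vertices=4200, n_edges=88234, seed=0):
    """Write n_edges random 'source target' lines (synthetic test data)."""
    rng = random.Random(seed)
    with open(path, "w") as f:
        for _ in range(n_edges):
            f.write(f"{rng.randrange(n_vertices)} {rng.randrange(n_vertices)}\n")


def read_edges(path):
    """Pure-Python stand-in for read_graph: no graph library involved."""
    # A new name gets the current size of the dict as its index: 0, 1, 2, ...
    index_of = defaultdict(lambda: len(index_of))
    edges = []
    with open(path) as f:
        for line in f:
            source, target = line.split()
            edges.append((index_of[source], index_of[target]))
    return index_of, edges


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "edges.txt")
    write_random_edges(path)
    profiler = cProfile.Profile()
    # runcall profiles a single call and passes its return value through
    index_of, edges = profiler.runcall(read_edges, path)

print(len(index_of), "vertices,", len(edges), "edges")
# Show the five entries with the highest cumulative time
pstats.Stats(profiler).sort_stats("cumulative").print_stats(5)
```

If this stand-in finishes far faster than 50 seconds, that would suggest the time goes into the graph library calls rather than into reading and splitting the lines. Profiling the real read_graph the same way would show which call dominates.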