"Killed: 9" when creating a graph from a list of nodes and a list of links

Asked: 2016-04-27 19:41:45

Tags: python igraph

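I am building a network with iGraph from a list of nodes and a list of links. There are 200,000 nodes and 450,000 links. Each node has some associated metadata, and the same is true of each link.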

The nodes list looks like this:

[{u'toid': u'osgb4000000031043205', u'index': 1, u'point': [508180.748, 195333.973]}, {u'toid': u'osgb4000000031043206', u'index': 2, u'point': [508163.122, 195316.627]}, {u'toid': u'osgb4000000031043207', u'index': 3, u'point': [508172.075, 195325.719]}, {u'toid': u'osgb4000000031043208', u'index': 4, u'point': [508513, 196023]}]

The links list looks like this:

[{u'index': 1, u'term': u'Private Road - Restricted Access', u'nature': u'Single Carriageway', u'negativeNode': u'osgb4000000023183407', u'toid': u'osgb4000000023296573', u'polyline': [492019.481, 156567.076, 492028, 156567, 492041.667, 156570.536, 492063.65, 156578.067, 492126.5, 156602], u'positiveNode': u'osgb4000000023183409'}, {u'index': 2, u'term': u'Private Road - Restricted Access', u'nature': u'Single Carriageway', u'negativeNode': u'osgb4000000023763485', u'toid': u'osgb4000000023296574', u'polyline': [492144.493, 156762.059, 492149.35, 156750, 492195.75, 156630], u'positiveNode': u'osgb4000000023183408'}, {u'index': 3, u'term': u'Private Road - Restricted Access', u'nature': u'Single Carriageway', u'negativeNode': u'osgb4000000023183650', u'toid': u'osgb4000000023296638', u'polyline': [492835.25, 156873.5, 493000, 156923, 493018.061, 156927.938], u'positiveNode': u'osgb4000000023183652'}, {u'index': 4, u'term': u'Local Street', u'nature': u'Single Carriageway', u'negativeNode': u'osgb4000000023181163', u'toid': u'osgb4000000023388466', u'polyline': [498136.506, 149148.313, 498123.784, 149143.969, 498119.223, 149143.411, 498116.43, 149143.318, 498113.638, 149145.179], u'positiveNode': u'osgb4000000023806248'}]

I tried to build the graph like this:

g = Graph()

# Add nodes (and associated data)
for node in nodes:
    g.add_vertices(node['toid'])
# Add links (and associated data)
for link in links:
    g.add_edges([(link['negativeNode'],link['positiveNode'])])

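The links file contains a small number of cases where the negativeNode or positiveNode cannot be found in the nodes list. As a result, iGraph throws the following error: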

ValueError: no such vertex: u'osgb4000000019779815'

I tried to add the nodes that appear in the links file but are missing from the nodes list:

for node in nodes:
    for link in links:
        if link['negativeNode'] not in node['toid']:
            missing_dict = {
            "toid": link['negativeNode']
            }
            nodes.append(missing_dict)
        if link['positiveNode'] not in node['toid']:
            missing_dict = {
            "toid": link['negativeNode']
            }
            nodes.append(missing_dict)

However, this results in the following error:

Killed: 9

I assume this process uses too much memory. How can I fix this?

1 Answer:

Answer 0: (score: 1)

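First, your second attempt tries to add the same node multiple times; in the worst case, the endpoints of the last link in the links list may be added as many times as there are nodes in the nodes list. So this approach will not work.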
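To give a sense of the scale, here is a minimal sketch that counts how many entries the nested loop would append for a fixed snapshot of the nodes list. The helper name count_appends and the tiny sample data are hypothetical, and the sketch uses an equality test where the question used `not in`.

```python
def count_appends(nodes, links):
    """Count the entries the nested loop would append for a fixed snapshot of nodes."""
    appended = 0
    for node in nodes:
        for link in links:
            # Every (node, link) pair whose endpoint differs from this node
            # triggers an append, whether or not the endpoint is really missing.
            if link["negativeNode"] != node["toid"]:
                appended += 1
            if link["positiveNode"] != node["toid"]:
                appended += 1
    return appended


# Hypothetical sample: only "z" is actually missing from the nodes list.
nodes = [{"toid": "a"}, {"toid": "b"}, {"toid": "c"}]
links = [
    {"negativeNode": "a", "positiveNode": "b"},
    {"negativeNode": "b", "positiveNode": "z"},
]
total = count_appends(nodes, links)
```

With three nodes and two links, nine entries would be appended even though only one node is missing. The loop in the question is worse still: it appends to nodes while iterating over it, so the outer loop keeps finding new entries to visit and the list grows until memory runs out.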

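Second, igraph is not efficient when nodes or edges are added one by one (because of the index operations it performs after every addition or removal). It is better to add them in batches: prepare several nodes or edges, then add them all with a single call to add_vertices() or add_edges().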
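A minimal sketch of the batch approach might look like the following. The helper names prepare_batches and build_graph and the sample data are hypothetical, and the sketch assumes that every link endpoint is already present in the nodes list (a missing endpoint raises KeyError).

```python
def prepare_batches(nodes, links):
    """Collect vertex names and edge index pairs so each can be added in one call."""
    names = [node["toid"] for node in nodes]
    # Map each vertex name to its position; igraph vertex ids follow insertion order.
    index_of = {name: i for i, name in enumerate(names)}
    edge_pairs = [
        (index_of[link["negativeNode"]], index_of[link["positiveNode"]])
        for link in links
    ]
    return names, edge_pairs


def build_graph(nodes, links):
    """Build the graph with one add_vertices() call and one add_edges() call."""
    from igraph import Graph  # imported here so prepare_batches works without igraph

    names, edge_pairs = prepare_batches(nodes, links)
    g = Graph()
    g.add_vertices(len(names))   # all vertices in a single call
    g.vs["name"] = names         # assign the names as one attribute vector
    g.add_edges(edge_pairs)      # all edges in a single call
    return g


# Hypothetical sample data
sample_nodes = [{"toid": "a"}, {"toid": "b"}, {"toid": "c"}]
sample_links = [
    {"negativeNode": "a", "positiveNode": "b"},
    {"negativeNode": "b", "positiveNode": "c"},
]
names, edge_pairs = prepare_batches(sample_nodes, sample_links)
```

Calling build_graph(sample_nodes, sample_links) would then create the whole graph with two igraph calls instead of one call per node and per link.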

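Third, the Graph.DictList() method was designed explicitly for your use case: it takes two lists of dictionaries, one for the nodes and one for the edges, and constructs a graph, as long as you tell it which dictionary members store the vertex names and the endpoints of the edges: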

g = Graph.DictList(vertices=nodes, edges=links, vertex_name_attr="toid", edge_foreign_keys=("positiveNode", "negativeNode"))

The only thing you need to ensure before calling Graph.DictList() is that every node that appears in the edge list can also be found in the node list:

all_node_ids = set(edge["positiveNode"] for edge in links) | set(edge["negativeNode"] for edge in links)
known_node_ids = set(node["toid"] for node in nodes)
for node in all_node_ids - known_node_ids:
    nodes.append({u'toid': node})
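The same check can be wrapped in a small helper and re-run after the fix to confirm that no dangling endpoints remain. This is only a sketch: the name dangling_endpoints and the sample data are hypothetical.

```python
def dangling_endpoints(nodes, links):
    """Return the endpoint ids referenced by links but absent from nodes."""
    known = {node["toid"] for node in nodes}
    referenced = set()
    for link in links:
        referenced.add(link["positiveNode"])
        referenced.add(link["negativeNode"])
    return referenced - known


# Hypothetical sample: "z" is referenced by a link but has no node entry.
nodes = [{"toid": "a"}, {"toid": "b"}]
links = [
    {"negativeNode": "a", "positiveNode": "b"},
    {"negativeNode": "b", "positiveNode": "z"},
]

missing_before = dangling_endpoints(nodes, links)
# Add one placeholder node per missing id; placeholders carry only a toid.
nodes.extend({"toid": toid} for toid in sorted(missing_before))
missing_after = dangling_endpoints(nodes, links)
```

Because the missing ids are collected into a set first, each placeholder is added exactly once, and the work is a single pass over the nodes and a single pass over the links.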