Question

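I have been building a graph for a project, and after populating it with more information I have been trying to identify the newly added edges.

For instance, below you can see its first and second iterations:

---------------------- General information: graph H -----------------------------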

Total number of Nodes in Graph:  2364
Total number of Edges:  3151

---------------------- General information: graph G -----------------------------

Total number of Nodes in Graph:  6035
Total number of Edges:  11245

The problem appears when I try to identify the newly added edges with this code:

counter = 0
edges_all = list(G.edges_iter(data=True)) 
edges_before = list(H.edges_iter(data=True)) 
print "How many edges in old graph: ", len(edges_before)
print "How many edges in new graph: ", len(edges_all)
edge_not_found = []
for edge in edges_all:
    if edge in edges_before:
        counter += 1
    else:
        edge_not_found.append(edge)
print "Edges found: ", counter
print "Not found: ", len(edge_not_found)

I get these results:

How many edges in old graph:  3151
How many edges in new graph:  11245
Edges found:  1601
Not found:  9644

I can't understand why I find 1601 rather than 11245 - 3151 = 8094.

Any ideas?

Thanks!

Answer 1

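TL;DR: there is a simple explanation for what you are seeing, and if you read to the end there is a much shorter way to write your code (with a fair amount of explanation along the way).

First, note that Edges found counts the edges that are in both H and G. If every edge of H is still present in G, it should therefore be 3151, not 8094; 8094 is what Not found should be. Note also that the number of edges actually found, 1601, is about half of the 3151 you should expect. This makes sense because:

I believe the problem is that networkx does not list the endpoints of an undirected edge in a fixed order. An edge might appear as (a,b) in edges_before but as (b,a) in edges_all.

In that case (b,a) is not in edges_before, and it fails your test even though the edge exists in H. Assuming the endpoint order is uncorrelated between the listings of H and G, you would expect about half of the shared edges to pass. You could add a second test to check whether (b,a) is an edge of H, but networkx already provides an order-independent check: H.has_edge(b,a).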
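The order mismatch can be reproduced without networkx. The following is a minimal plain-Python sketch, assuming edges are stored as bare tuples; the names old_edges, new_edges and normalize are illustrative only and are not part of networkx. It shows a tuple membership test missing a reversed edge, while a comparison on order-independent frozenset keys finds it.

```python
# Edges of a hypothetical "old" undirected graph, stored as (u, v) tuples.
old_edges = [("a", "b"), ("c", "d")]
# The "new" graph has the same two edges (one listed in reverse) plus one more.
new_edges = [("b", "a"), ("c", "d"), ("d", "e")]

# Naive test: tuples compare element by element, so ("b", "a") != ("a", "b").
naive_found = [e for e in new_edges if e in old_edges]


def normalize(edge):
    """Return an order-independent key built from the two endpoints."""
    return frozenset(edge[:2])


old_keys = {normalize(e) for e in old_edges}
found = [e for e in new_edges if normalize(e) in old_keys]
not_found = [e for e in new_edges if normalize(e) not in old_keys]

print(len(naive_found))  # the reversed edge is missed here
print(len(found))
print(not_found)
```

Building the set of keys once also avoids rescanning a list for every edge, which matters with thousands of edges.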

A straightforward improvement:

for edge in edges_all:
    if H.has_edge(edge[0],edge[1]):
        counter += 1
    else:
        edge_not_found.append(edge)

This lets you avoid defining edges_before at all.

You can also avoid defining edges_all by iterating over the edges of G directly:

for edge in G.edges_iter(data=True):
    if H.has_edge(edge[0],edge[1]):
        counter += 1
    else:
        edge_not_found.append(edge)

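Note: I have written this as H.has_edge(edge[0],edge[1]) to make clear what is happening. A more compact way to write it unpacks the tuple with the * notation. Because data=True makes each edge a 3-tuple (u, v, data), only the first two elements should be unpacked: H.has_edge(*edge[:2]). Writing H.has_edge(*edge) would pass the data dictionary as a third argument, which has_edge does not expect.

Finally, edge_not_found is better built with a list comprehension:

edge_not_found = [edge for edge in G.edges_iter(data=True) if not H.has_edge(*edge[:2])]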

This creates a list of the edges that are in G but not in H.
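To make the unpacking concrete, here is a small plain-Python sketch. The has_edge function below is a hypothetical two-argument stand-in written for this illustration, not the networkx implementation. It shows that star-unpacking a pair works, that a 3-tuple of the shape produced with data=True needs to be sliced first, and that unpacking all three elements into a two-parameter function raises TypeError.

```python
# Hypothetical stand-in for H.has_edge(u, v); not the networkx implementation.
known = {frozenset(("a", "b"))}


def has_edge(u, v):
    """Order-independent membership test on a set of frozenset keys."""
    return frozenset((u, v)) in known


pair = ("b", "a")
triple = ("b", "a", {"weight": 1})  # shape of an edge listed with data=True

print(has_edge(*pair))        # same call as has_edge("b", "a")
print(has_edge(*triple[:2]))  # slice off the data dict before unpacking

try:
    has_edge(*triple)         # three arguments for a two-parameter function
except TypeError as err:
    unpack_error = err
    print("TypeError:", err)
```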

Putting all of this together (and using the .size() method to count the edges of a graph), we arrive at a cleaner version:

print "How many edges in old graph: ", H.size()
print "How many edges in new graph: ", G.size()
edge_not_found = [edge for edge in G.edges_iter(data=True) if not H.has_edge(*edge[:2])]
print "Not found: ", len(edge_not_found)
print "Edges found: ", G.size()-len(edge_not_found)
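The code in this answer targets Python 2 and networkx 1.x. edges_iter was removed in networkx 2.0, where G.edges(data=True) returns an iterable view instead. Below is a sketch of the same comparison for Python 3, assuming undirected nx.Graph objects and using two small made-up graphs in place of the real H and G.

```python
import networkx as nx

# Toy stand-ins for the old graph H and the enlarged graph G.
H = nx.Graph()
H.add_edges_from([(1, 2), (2, 3)])
G = nx.Graph()
G.add_edges_from([(2, 1), (3, 2), (3, 4), (4, 5)])

# Keep the edges of G (with their data) whose endpoints are not joined in H.
# has_edge ignores endpoint order on an undirected graph.
edge_not_found = [(u, v, d) for u, v, d in G.edges(data=True)
                  if not H.has_edge(u, v)]

print("How many edges in old graph:", H.size())
print("How many edges in new graph:", G.size())
print("Not found:", len(edge_not_found))
print("Edges found:", G.size() - len(edge_not_found))
```

Unpacking each edge as u, v, d in the comprehension sidesteps the question of how many elements to pass to has_edge.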

Networkx Python Edges Comparison
