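Topological overlap in gene networks with Python

Time: 2014-08-23 08:15:08

Tags: python networkx

In a complex gene network, how can we find topological overlap?

The input data looks like this: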

code  code weight

3423 3455   3453
2344 2353   45
3432 3453   456
3235 4566   34532
2345 8687   356
2466 6467   3567
3423 2344   564
3455 2353   4564
3432 3423   456

The node columns are col[0] and col[1], and the time taken to connect them is col[2].

Code:

import networkx as nx
import numpy as np

data = np.loadtxt("USC_Test.txt")
col = []
edge_list = zip[col[0],col[1]]

G = nx.Graph()
G.add_edges_from(edge_list)
components = nx.connected_components(G)

print components

Error:

edge_list = zip[col[0],col[1]]
IndexError: list index out of range

1 answer:

Answer 0 (score: 3)

I must admit I am not familiar with the term topological overlap, so I had to look it up:

  

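A pair of nodes in a network is said to have high topological overlap if they are both strongly connected to the same group of nodes. (Source)

NetworkX does not appear to have a built-in method for finding pairs of nodes with topological overlap, but it can easily find strongly connected components. For example: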

In [1]: import networkx as nx
In [2]: edge_list = [(1, 2), (2, 1), (3, 1), (1, 3), (2, 4), (1, 4), (5, 6)]
In [3]: G = nx.DiGraph()
In [4]: G.add_edges_from(edge_list)
In [5]: components = nx.strongly_connected_components(G)
In [6]: components
Out[6]: [[1, 3, 2], [4], [6], [5]]

If you have an undirected graph, you can use nx.connected_components instead.
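Applied to the edge list from the question, a minimal sketch might look like the following. It assumes the file has a header row and whitespace-separated columns as shown in the question, and it inlines the rows as a string so the example runs without USC_Test.txt. The names `RAW` and `load_edges` are made up for illustration. Reading the columns from the parsed rows also avoids the IndexError in the question, which comes from indexing the empty list `col` rather than the loaded data.

```python
import networkx as nx

# Rows copied from the question; in practice they would be read from the file.
RAW = """code  code weight
3423 3455   3453
2344 2353   45
3432 3453   456
3235 4566   34532
2345 8687   356
2466 6467   3567
3423 2344   564
3455 2353   4564
3432 3423   456
"""

def load_edges(text):
    """Parse 'node node weight' rows into (u, v, weight) tuples, skipping the header."""
    edges = []
    for line in text.strip().splitlines()[1:]:
        u, v, w = line.split()
        edges.append((int(u), int(v), int(w)))
    return edges

G = nx.Graph()
G.add_weighted_edges_from(load_edges(RAW))

# In recent NetworkX releases connected_components yields one set of nodes
# per component, so each set is converted to a sorted list for printing.
components = sorted((sorted(c) for c in nx.connected_components(G)),
                    key=len, reverse=True)
for c in components:
    print(c)
```

With the nine rows above this should give one component of six nodes and three isolated pairs. The weight column is stored on each edge but plays no part in finding the components.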

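Now that you have the components, it is easy to build a list of all pairs with topological overlap. For example, to generate all pairs of nodes from the lists in components: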
In [7]: from itertools import combinations
In [8]: top_overlap = [list(combinations(c, 2)) for c in components if len(c) > 1]
In [9]: top_overlap = [item for sublist in top_overlap for item in sublist]
In [10]: top_overlap
Out[10]: [(1, 3), (1, 2), (3, 2)]
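The quoted definition can also be read as a score per pair of nodes rather than as membership in a component. As a rough sketch under that reading, the following computes an unweighted overlap value from shared neighbours, using a formula often given for unweighted networks: (shared neighbours + a_ij) / (min(k_i, k_j) + 1 - a_ij), where a_ij is 1 if the two nodes are directly linked and k is the node degree. The choice of formula, the helper names `build_adjacency` and `topological_overlap`, and the decision to ignore edge weights and direction are all assumptions here; none of this is a NetworkX function.

```python
from itertools import combinations

def build_adjacency(edges):
    """Build an undirected adjacency map {node: set(neighbours)} from (u, v) pairs."""
    adj = {}
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def topological_overlap(adj, i, j):
    """Unweighted overlap score for nodes i and j (assumed formula, see text above)."""
    a_ij = 1 if j in adj[i] else 0
    shared = len(adj[i] & adj[j])  # neighbours common to both nodes
    return (shared + a_ij) / (min(len(adj[i]), len(adj[j])) + 1 - a_ij)

# Same edge list as the example above, treated here as undirected
edge_list = [(1, 2), (2, 1), (3, 1), (1, 3), (2, 4), (1, 4), (5, 6)]
adj = build_adjacency(edge_list)

scores = {(i, j): topological_overlap(adj, i, j)
          for i, j in combinations(sorted(adj), 2)}

# Print the pairs with a non-zero score, highest first
for pair, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    if score > 0:
        print(pair, round(score, 2))
```

Unlike the component-based list, this gives a graded value per pair, so a threshold on the score decides which pairs count as overlapping.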