How can I merge arrays only when they share a common value?

Time: 2018-03-01 10:50:47

Tags: python

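I have a file containing several molecules, given as many pairs of values (pairs of bonded atoms). If two pairs share a member, they belong to the same molecule. I am trying to find an efficient way in Python to group the atoms by the molecule they belong to.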

For example, ethane and methane would be:

1, 5 and 9 would be carbons, the rest hydrogens

[[1,2],[1,3],[1,4],[1,5],[5,6],[5,7],[5,8],[9,10],[9,11],[9,12],[9,13]]

I would like to obtain a list/array:

[[1,2,3,4,5,6,7,8],[9,10,11,12,13]]

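I have tried a few approaches, but they are really inefficient for files with a large number of atoms. There should be a smart way to do this, but I cannot find it. Any ideas?

Thanks, Joan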

3 Answers:

Answer 0: (score: 1)

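If I understand correctly, what you are trying to do is identify the connected components of a graph in which each node is an atom and each edge is a bond (so a connected component is a molecule). There is an efficient implementation in scipy.sparse.csgraph.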
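To make the graph view concrete, here is a minimal pure-Python sketch that finds the connected components with a breadth-first search over an adjacency list. The function name molecules_bfs is made up for illustration, and this is not the scipy implementation, just the same idea written out by hand.

```python
from collections import defaultdict, deque

def molecules_bfs(pairs):
    """Group atoms into molecules by breadth-first search over the bond graph."""
    # Build an undirected adjacency list: atom -> set of bonded atoms
    neighbours = defaultdict(set)
    for a, b in pairs:
        neighbours[a].add(b)
        neighbours[b].add(a)

    seen = set()
    groups = []
    for start in sorted(neighbours):
        if start in seen:
            continue
        # Everything reachable from `start` belongs to the same molecule
        seen.add(start)
        queue = deque([start])
        group = []
        while queue:
            atom = queue.popleft()
            group.append(atom)
            for other in neighbours[atom]:
                if other not in seen:
                    seen.add(other)
                    queue.append(other)
        groups.append(sorted(group))
    return groups

pairs = [[1,2],[1,3],[1,4],[1,5],[5,6],[5,7],[5,8],[9,10],[9,11],[9,12],[9,13]]
print(molecules_bfs(pairs))
# [[1, 2, 3, 4, 5, 6, 7, 8], [9, 10, 11, 12, 13]]
```

Each atom and each bond is visited a constant number of times, so the work grows roughly linearly with the number of pairs.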

To use the scipy implementation, first set up the graph as a sparse matrix:

import numpy as np
import scipy.sparse as sps

# Input as provided
edges = [[1,2],[1,3],[1,4],[1,5],[5,6],[5,7],[5,8],[9,10],[9,11],[9,12],[9,13]]
# Modify the input by adding, for each [x,y], also [y,x].
# Also transform it to a set and then again to a list
# to assure that we don't duplicate anything.
edges = list({(x[0],x[1]) for x in edges}.union({(x[1],x[0]) for x in edges}))
# Create it as a matrix. The weights of all edges are set to 1,
# as they don't matter anyway.
graph = sps.csr_matrix(([1]*len(edges), np.array(edges).T))

At this point it is enough to call scipy.sparse.csgraph.connected_components, but by default the output comes in a slightly different format:

  

(3, array([0, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2]))

So let's reshape it a little:

from scipy.sparse import csgraph
connected_components = csgraph.connected_components(graph)
result = []

for u in range(1, connected_components[0]):
    result.append(np.where(connected_components[1]==u)[0])

result
  

[array([1, 2, 3, 4, 5, 6, 7, 8], dtype=int64),
 array([ 9, 10, 11, 12, 13], dtype=int64)]

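Also note that in range I start from 1: Python counts from 0 while your numbering starts at 1, so node 0 would show up as an isolated node. If the atom numbering is non-consecutive, the isolated nodes need to be skipped, e.g.: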

result = [r for r in result if len(r) > 1]
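As an alternative to filtering afterwards, the atom labels can be mapped to a compact 0..n-1 range before building the matrix, so that no isolated placeholder nodes appear in the first place. The sketch below assumes the labels are integers; the function name molecules_scipy is hypothetical, while np.unique(..., return_inverse=True) and connected_components(..., directed=False) are existing NumPy and SciPy calls.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components

def molecules_scipy(pairs):
    """Group atoms into molecules, allowing arbitrary (non-consecutive) integer labels."""
    pairs = np.asarray(pairs)
    # labels: sorted unique atom labels; inverse: compact index of every entry in pairs
    labels, inverse = np.unique(pairs, return_inverse=True)
    inverse = inverse.reshape(pairs.shape)  # shape differs between NumPy versions
    n = len(labels)
    # One stored entry per bond; directed=False below makes the graph symmetric
    graph = csr_matrix(
        (np.ones(len(pairs), dtype=int), (inverse[:, 0], inverse[:, 1])),
        shape=(n, n),
    )
    n_components, component = connected_components(graph, directed=False)
    # Translate compact indices back to the original atom labels
    return [labels[component == c].tolist() for c in range(n_components)]

pairs = [[10, 20], [10, 30], [500, 600], [30, 40]]
print(molecules_scipy(pairs))
```

With directed=False there is no need to add the reversed copy of every edge by hand.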

Answer 1: (score: 0)

bigArr = [[1,2],[1,3],[1,4],[1,5],[5,6],[5,7],[5,8],[9,10],[9,11],[9,12],[9,13]] ## Your list of pairs of values
molArr = []
for pair in bigArr:
    merged = set(pair) ## Start a group from the pair itself
    rest = []
    for molecule in molArr:
        if pair[0] in molecule or pair[1] in molecule: ## The molecule shares an atom with the pair
            merged |= molecule ## Absorb it, so a pair bridging two groups joins them
        else:
            rest.append(molecule)
    molArr = rest + [merged] ## Untouched groups plus the merged one

for i in range(len(molArr)): ## Convert each set to a sorted list
    molArr[i] = sorted(molArr[i])
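Scanning every existing group for every pair gets slow for files with many atoms. One common way around that is a disjoint-set (union-find) structure, sketched below under the assumption that atom labels are hashable values such as integers. The names molecules_union_find, parent and find are made up for this example.

```python
def molecules_union_find(pairs):
    """Group atoms into molecules with a disjoint-set (union-find) structure."""
    parent = {}  # atom -> parent atom; a root is its own parent

    def find(x):
        parent.setdefault(x, x)
        # Walk up to the root of x's tree
        root = x
        while parent[root] != root:
            root = parent[root]
        # Path compression: point every node on the path straight at the root
        while parent[x] != root:
            parent[x], x = root, parent[x]
        return root

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # union: merge the two trees

    # Collect atoms by their root
    groups = {}
    for atom in parent:
        groups.setdefault(find(atom), []).append(atom)
    return sorted(sorted(group) for group in groups.values())

pairs = [[1,2],[1,3],[1,4],[1,5],[5,6],[5,7],[5,8],[9,10],[9,11],[9,12],[9,13]]
print(molecules_union_find(pairs))
# [[1, 2, 3, 4, 5, 6, 7, 8], [9, 10, 11, 12, 13]]
```

A pair that bridges two groups found earlier, such as [2, 3] arriving after [1, 2] and [3, 4], merges them into a single molecule.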

Answer 2: (score: 0)

Here is another approach:

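a = [[1,2],[1,3],[1,4],[1,5],[5,6],[5,7],[5,8],[9,10],[9,11],[9,12],[9,13]]

result = []

for sub in a:
    # Indices of every existing group that shares an atom with this pair
    hits = [i for i, r in enumerate(result) if any(x in r for x in sub)]
    if hits:
        # Merge into the first matching group and fold in any other matching groups
        target = result[hits[0]]
        for i in reversed(hits[1:]):
            target += [y for y in result.pop(i) if y not in target]
        target += [y for y in sub if y not in target]
    else:
        result.append(list(sub))  # copy, so the input pairs are not modified

result
#[[1,2,3,4,5,6,7,8],[9,10,11,12,13]]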