Question

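I have a file containing different molecules, and for it I have many pairs of values (pairs of bonded atoms). If two pairs share a member, they are part of the same molecule. I am trying to find an efficient way in Python to group the atoms by the molecule they belong to.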

For example, ethane and methane would be:

(1, 5 and 9 are carbons, the rest are hydrogens)

[[1,2],[1,3],[1,4],[1,5],[5,6],[5,7],[5,8],[9,10],[9,11],[9,12],[9,13]]

I would like to obtain a list/array:

[[1,2,3,4,5,6,7,8],[9,10,11,12,13]]

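I have already tried several approaches, but they are really inefficient for files with a large number of atoms. There should be a clever way to do this, but I can't find it. Any ideas?

Thanks, Joan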

Answer 1

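If I understand correctly, what you are trying to do is identify the connected components of a graph in which each node is an atom and each edge is a bond (so one connected component is one molecule). There is an efficient implementation in scipy.sparse.csgraph.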
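To make the connected-components idea concrete without any dependencies, here is a minimal pure-Python sketch using a breadth-first search. The function name `group_atoms_bfs` is hypothetical and this is not what SciPy does internally; it only illustrates the concept on the question's input.

```python
from collections import defaultdict, deque

def group_atoms_bfs(edges):
    """Group atom IDs into connected components (hypothetical helper)."""
    # Build an undirected adjacency list: each bond links both atoms.
    neighbours = defaultdict(set)
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)

    seen = set()
    components = []
    for start in sorted(neighbours):
        if start in seen:
            continue
        # Everything reachable from `start` belongs to the same molecule.
        seen.add(start)
        queue = deque([start])
        component = []
        while queue:
            node = queue.popleft()
            component.append(node)
            for nxt in neighbours[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        components.append(sorted(component))
    return components

edges = [[1,2],[1,3],[1,4],[1,5],[5,6],[5,7],[5,8],[9,10],[9,11],[9,12],[9,13]]
print(group_atoms_bfs(edges))
# [[1, 2, 3, 4, 5, 6, 7, 8], [9, 10, 11, 12, 13]]
```

Each atom and each bond is visited a bounded number of times, so the work grows roughly linearly with the size of the input.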

First, let's set up the graph as a sparse matrix:

import numpy as np
import scipy.sparse as sps

# Input as provided
edges = [[1,2],[1,3],[1,4],[1,5],[5,6],[5,7],[5,8],[9,10],[9,11],[9,12],[9,13]]
# Modify the input by adding, for each [x,y], also [y,x].
# Also transform it to a set and then again to a list
# to assure that we don't duplicate anything.
edges = list({(x[0],x[1]) for x in edges}.union({(x[1],x[0]) for x in edges}))
# Create it as a matrix. The weights of all edges are set to 1,
# as they don't matter anyway.
graph = sps.csr_matrix(([1]*len(edges), np.array(edges).T))

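At this point, it is just a matter of calling scipy.sparse.csgraph.connected_components, although by default the output comes in a slightly different format:

(3, array([0, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2]))

So let's modify it a bit: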

from scipy.sparse import csgraph
connected_components = csgraph.connected_components(graph)
result = []

for u in range(1, connected_components[0]):
    result.append(np.where(connected_components[1]==u)[0])

result

[array([1, 2, 3, 4, 5, 6, 7, 8], dtype=int64),
 array([9, 10, 11, 12, 13], dtype=int64)]

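Note also that in range I start from 1: Python counts from 0 while the atoms are numbered from 1, so node 0 is treated as an isolated node. If the atom numbering is non-contiguous, the other isolated nodes need to be skipped too, for example: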

result = [r for r in result if len(r) > 1]
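An alternative to filtering out isolated nodes afterwards is to relabel the atom IDs as 0..n-1 before building the matrix, so that no filler nodes exist in the first place. The following is a small sketch under that assumption; `compact_edges` is a hypothetical helper, not part of SciPy.

```python
def compact_edges(edges):
    """Relabel arbitrary atom IDs as 0..n-1 (hypothetical helper).

    Returns the relabelled edges and the sorted list of original IDs,
    so that atom_ids[i] is the original ID of compact index i.
    """
    atom_ids = sorted({atom for pair in edges for atom in pair})
    index_of = {atom: i for i, atom in enumerate(atom_ids)}
    new_edges = [(index_of[a], index_of[b]) for a, b in edges]
    return new_edges, atom_ids

# Non-contiguous numbering, made up for illustration
edges = [[10, 20], [20, 35], [100, 250]]
new_edges, atom_ids = compact_edges(edges)
print(new_edges)  # [(0, 1), (1, 2), (3, 4)]
print(atom_ids)   # [10, 20, 35, 100, 250]
```

The relabelled edges can then be used to build the sparse matrix, and each index in a resulting component maps back to its original atom through `atom_ids`.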

Answer 2

bigArr = [[1,2],[1,3],[1,4],[1,5],[5,6],[5,7],[5,8],[9,10],[9,11],[9,12],[9,13]] ## Your list of pairs of values
molArr = []
for pair in bigArr:
    merged = set(pair) ## Start from a copy of the pair so the input is not modified
    untouched = []
    for molecule in molArr:
        if merged & molecule: ## Shares an atom with the pair, so absorb the whole molecule
            merged |= molecule
        else:
            untouched.append(molecule)
    molArr = untouched + [merged] ## A pair that bridges two molecules fuses them into one

molArr = [sorted(molecule) for molecule in molArr] ## Sets hold no duplicates; convert back to sorted lists

Answer 3

Here is another approach:

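a = [[1,2],[1,3],[1,4],[1,5],[5,6],[5,7],[5,8],[9,10],[9,11],[9,12],[9,13]]

result = []

for sub in a:
    # Indices of every existing group that shares an atom with this pair
    hits = [i for i, r in enumerate(result) if any(x in r for x in sub)]
    if hits:
        target = result[hits[0]]
        # Fold any further matching groups into the first one
        for i in reversed(hits[1:]):
            target += [y for y in result.pop(i) if y not in target]
        target += [y for y in sub if y not in target]
    else:
        result.append(list(sub))  # copy so the input list is not mutated

result
#[[1,2,3,4,5,6,7,8],[9,10,11,12,13]]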
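Loop-based approaches like the one above rescan the existing groups for every pair, which can become slow for files with many atoms. A common alternative is a union-find (disjoint-set) structure. The following is a minimal sketch with path compression only; `group_by_union_find` is a hypothetical name and none of this comes from the answers above.

```python
def group_by_union_find(pairs):
    """Group atoms with a union-find structure (hypothetical helper)."""
    parent = {}

    def find(x):
        # Register unseen atoms as their own root.
        parent.setdefault(x, x)
        root = x
        while parent[root] != root:
            root = parent[root]
        # Path compression: point every node on the path at the root.
        while parent[x] != root:
            parent[x], x = root, parent[x]
        return root

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two molecules

    groups = {}
    for atom in list(parent):
        groups.setdefault(find(atom), []).append(atom)
    return sorted(sorted(group) for group in groups.values())

pairs = [[1,2],[1,3],[1,4],[1,5],[5,6],[5,7],[5,8],[9,10],[9,11],[9,12],[9,13]]
print(group_by_union_find(pairs))
# [[1, 2, 3, 4, 5, 6, 7, 8], [9, 10, 11, 12, 13]]
```

Because groups are merged through their roots, a pair that links two previously separate groups joins them correctly, whatever order the pairs arrive in.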
