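Grouping rule: rows that have a 1 in the same column belong to the same group, and the grouping is transitive.
For example: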
   c0  c1  c2  c3
A   1   0   0   1
B   0   0   1   0
C   0   0   0   1
D   0   1   1   0
E   0   1   0   0
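Expected output:
[[A, C], [B, D, E]]
As you can see, B and E do not share a "1" in any column, but each of them shares one with D, so all three should be grouped together.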
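Because the rule is transitive, the groups are the connected components of the relation "shares a 1 in some column". The sketch below illustrates that with a small union-find (disjoint-set) structure in plain Python, without pandas or networkx. It is one possible approach, not taken from the answers below; the helper name `group_rows` and the dict-of-lists input format are hypothetical choices for illustration.

```python
# Hypothetical helper: group row labels that share a 1 in any column (transitively),
# using a small union-find (disjoint-set) structure instead of a graph library.
def group_rows(rows):
    """rows: dict mapping a row label to its list of 0/1 column values."""
    parent = {label: label for label in rows}

    def find(x):
        # Walk up to the root, halving the path as we go
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        parent[find(a)] = find(b)

    # Number of columns, taken from the first row (0 if there are no rows)
    n_cols = len(next(iter(rows.values()), []))
    for col in range(n_cols):
        # All rows with a 1 in this column end up in the same set
        members = [label for label, values in rows.items() if values[col] == 1]
        for other in members[1:]:
            union(members[0], other)

    # Collect labels by their root, keeping the original row order
    groups = {}
    for label in rows:
        groups.setdefault(find(label), []).append(label)
    return list(groups.values())


rows = {
    'A': [1, 0, 0, 1],
    'B': [0, 0, 1, 0],
    'C': [0, 0, 0, 1],
    'D': [0, 1, 1, 0],
    'E': [0, 1, 0, 0],
}
print(group_rows(rows))  # [['A', 'C'], ['B', 'D', 'E']]
```

A row that shares no column with any other row comes out as a group of its own.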
Answer 0 (score: 5)
Here is a solution using networkx:
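import numpy as np
import networkx as nx

# df is the 0/1 DataFrame from the question (index A-E)
# For each column, concatenate the labels of the rows with a 1 there, e.g. c3 -> 'AC'
a = np.where(df.T, df.index, '').sum(axis=1)
# Keep columns with a 1 in more than one row; each string becomes an edge such as ['A', 'C']
g = [list(x) for x in a if len(x) > 1]
G = nx.Graph(g)
list(nx.connected_components(G))
# output: [{'B', 'D', 'E'}, {'A', 'C'}]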
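The string-concatenation trick above assumes single-character row labels and at most two 1s per column. With longer labels such as 'row1', `list(x)` splits the characters apart, and a column with three 1s produces a three-item list, which `nx.Graph` does not accept as an edge (an edge item must be a pair, or a triple whose third item is an attribute dict). It also drops any row that shares no column with another row. A minimal sketch that avoids these limits, assuming the question's DataFrame and networkx 2.0 or later (for `nx.add_path`):

```python
import pandas as pd
import networkx as nx

# The DataFrame from the question
df = pd.DataFrame(
    [[1, 0, 0, 1], [0, 0, 1, 0], [0, 0, 0, 1], [0, 1, 1, 0], [0, 1, 0, 0]],
    index=['A', 'B', 'C', 'D', 'E'],
    columns=['c0', 'c1', 'c2', 'c3'],
)

G = nx.Graph()
# Every row is a node, so a row that shares no column with others still forms its own group
G.add_nodes_from(df.index)
for col in df.columns:
    # Labels of the rows holding a 1 in this column (any number of rows, any label length)
    members = df[df[col] == 1].index.tolist()
    # Chaining the members with edges is enough to put them in one connected component
    nx.add_path(G, members)

groups = [sorted(c) for c in nx.connected_components(G)]
print(groups)  # [['A', 'C'], ['B', 'D', 'E']]
```

Linking the rows of a column as a path rather than a clique keeps the edge count linear in the number of 1s, and the connected components are the same either way.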
Answer 1 (score: 2)
This does what you want:
import numpy as np
import pandas as pd
from itertools import combinations
import networkx as nx

# df is the 0/1 DataFrame from the question (rows A-E, four columns)
df
"""output:
1 2 3 4
0
A 1 0 0 1
B 0 0 1 0
C 0 0 0 1
D 0 1 1 0
E 0 1 0 0
"""
df.index.tolist()
"""output:
['A', 'B', 'C', 'D', 'E']
"""
list(combinations(df.index.tolist(),2))
"""output :
[('A', 'B'),
('A', 'C'),
('A', 'D'),
('A', 'E'),
('B', 'C'),
('B', 'D'),
('B', 'E'),
('C', 'D'),
('C', 'E'),
('D', 'E')]
"""
# Keep the pairs of rows that have a 1 in at least one common column
results = [x for x in combinations(df.index.tolist(), 2)
           if np.sum(df.loc[x[0], :].multiply(df.loc[x[1], :])) > 0]
results
"""output:
[('A', 'C'), ('B', 'D'), ('D', 'E')]
"""
# Each connected component of the graph built from these pairs is one group
list(nx.connected_components(nx.Graph(results)))
"""output:
[{'A', 'C'}, {'B', 'D', 'E'}]
"""