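I'm new to Python and I'm trying to run a community detection algorithm on a dataset stored in a pandas DataFrame. To do that, I need to build an edge list from the dataset to feed into a graph. The edge list should connect rows that have matching column values. The dataset has 19 columns and more than 2,000 rows, and I need an edge between every pair of rows that share a value in a given column. For example, if the dataset is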
id col1 col2 col3
1 12 10 20
2 14 10 19
3 12 10 9
then there would be the following edges:
row1 col1, row3 col1
row1 col2, row2 col2
row1 col2, row3 col2
row2 col2, row3 col2
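To make the desired output concrete, here is a minimal sketch that builds such an edge list for the example table while keeping the name of the column that produced each match. The helper name `labelled_edges` is hypothetical, and the sketch assumes `id` is used as the DataFrame index rather than as a data column. For this table it yields one `col1` edge (rows 1 and 3 both hold 12) and three `col2` edges.

```python
import itertools

import pandas as pd

# The example table from above, with `id` used as the row index (an assumption).
dataset = pd.DataFrame(
    {"col1": [12, 14, 12], "col2": [10, 10, 10], "col3": [20, 19, 9]},
    index=pd.Index([1, 2, 3], name="id"),
)


def labelled_edges(df):
    """Return (row_a, row_b, column) for every pair of rows sharing a value in `column`."""
    edges = []
    for col in df.columns:
        # Each group holds the rows that share one value in this column.
        for _, group in df.groupby(col):
            for a, b in itertools.combinations(group.index, 2):
                edges.append((a, b, col))
    return edges


edges = labelled_edges(dataset)
for a, b, col in edges:
    print(f"row{a} {col}, row{b} {col}")
```

Note that `groupby` skips rows whose value in the grouping column is missing, so missing values never produce edges in this sketch.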
I have tried several approaches, but none of them seem to work. The closest I have come to what I want is the following code:
import itertools

import community  # python-louvain
import matplotlib.pyplot as plt
import networkx as nx

# define edges as pairs of rows that have matching data in a column
edges = set()
for col in dataset:
    for _, data in dataset.groupby(col):
        edges.update(itertools.combinations(data.index, 2))
# create empty graph
G = nx.Graph()
# add index numbers as nodes to the graph
G.add_nodes_from(dataset.index)
# add the edges created above
G.add_edges_from(edges)
# use the community library to find the partition that maximises modularity (Louvain algorithm)
partition = community.best_partition(G)
# draw the graph, colouring nodes by the community they were assigned to
size = float(len(set(partition.values())))
pos = nx.spring_layout(G)
for com in set(partition.values()):
    list_nodes = [nodes for nodes in partition.keys()
                  if partition[nodes] == com]
    # one colour value per node drawn; vmin/vmax keep the colour scale shared across communities
    nx.draw_networkx_nodes(G, pos, nodelist=list_nodes, node_size=20, cmap=plt.cm.RdYlBu,
                           node_color=[com] * len(list_nodes), vmin=0, vmax=size - 1)
plt.show()
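One thing the set-based approach above loses is how many columns two rows match on, because a pair that matches in several columns is stored only once. A possible variation, sketched here under the assumption that the match count is a useful edge weight, counts the matching columns per pair and stores the count as a `weight` attribute. The function name `weighted_graph` is hypothetical. As far as I know, `community.best_partition` reads an edge attribute called `weight` by default, but that is worth checking against the installed python-louvain version.

```python
import itertools
from collections import Counter

import networkx as nx
import pandas as pd


def weighted_graph(df):
    """Build a graph whose edge weight counts the columns in which two rows match."""
    weights = Counter()
    for col in df.columns:
        for _, group in df.groupby(col):
            # Sorting the index makes every pair appear in one fixed order.
            for pair in itertools.combinations(sorted(group.index), 2):
                weights[pair] += 1
    G = nx.Graph()
    G.add_nodes_from(df.index)
    G.add_weighted_edges_from((a, b, w) for (a, b), w in weights.items())
    return G


# The small example table from the question, with `id` as the index.
example = pd.DataFrame(
    {"col1": [12, 14, 12], "col2": [10, 10, 10], "col3": [20, 19, 9]},
    index=[1, 2, 3],
)
G = weighted_graph(example)
print(sorted(G.edges(data="weight")))
```

In the example, rows 1 and 3 match in both `col1` and `col2`, so their edge gets weight 2, while the other two edges get weight 1.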