I have a Pandas DataFrame with two columns of different types, for example:
User | Computer
1 A
2 B
3 A
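I have created a networkx graph from it with the from_pandas_dataframe method and can plot it successfully. The plot shows every node, and draws each relationship between a user node and a computer node as an edge.
What I'm really interested in is the relationships between users. In this example, user 1 is linked to user 3 because they both share computer A. I'd like a way to restructure the graph so that only users appear as nodes, with a computer acting as the edge that connects two users (note that I don't necessarily need to keep the data about which computer forms the edge).
I tried some self-joins, but the output doesn't really work the way I want: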
import pandas as pd
df = pd.DataFrame({'user':['a','b','c','d', 'd', 'e'], 'computer':[1,1,2,3,1,1]})
df
id user computer
0 a 1
1 b 1
2 c 2
3 d 3
4 d 1
5 e 1
joined = df.join(df, on='computer', rsuffix='y')
joined
id user computer usery computery
0 a 1 b 1
1 b 1 b 1
2 c 2 c 2
3 d 3 d 3
4 d 1 b 1
5 e 1 b 1
In the example above, I don't get the a-d pair, even though both users are associated with computer 1.
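A likely reason for the missing pairs, sketched below: `DataFrame.join` with `on='computer'` looks up each `computer` value in the other frame's index (here the default row labels 0 to 5), not in its `computer` column. Computer 1 is therefore matched against row label 1, which is user b, exactly as in the output above. The snippet reproduces this and contrasts it with a column-to-column `merge`; the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'user': ['a', 'b', 'c', 'd', 'd', 'e'],
                   'computer': [1, 1, 2, 3, 1, 1]})

# join(on=...) aligns the left 'computer' values with the RIGHT frame's index.
joined = df.join(df, on='computer', rsuffix='y')

# merge(on=...) aligns the 'computer' columns of both frames instead.
merged = df.merge(df, on='computer', suffixes=('_x', '_y'))
pairs = set(zip(merged['user_x'], merged['user_y']))
```

With the merge, `pairs` contains `('a', 'd')` along with the self-pairs and mirrored pairs, which still need filtering.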
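What is the best way to achieve this? Should I manipulate the data in Pandas so that it only contains the pairings between users? If so, how? Or should this be done in networkx?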
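Answer 0 (score: 1)
If I understand your question correctly, you want to convert the graph into a set of fully connected subgraphs with the users as nodes.
Repeating your code: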
import numpy as np, pandas as pd, matplotlib.pyplot as plt, networkx as nx
df = pd.DataFrame(
{'user':['a','b','c','d', 'd', 'e'], 'computer':[1,1,2,3,1,1]})
G = nx.from_pandas_edgelist(df, source='user', target='computer')
pos = nx.spring_layout(G)
colours = [('red' if str(node).isdigit() else 'blue') for node in G.nodes]
nx.draw_networkx(G, pos, with_labels=True, node_color=colours)
So, if you want to make a new graph with only the blue nodes, you can build it from the connected components. Note that nx.connected_component_subgraphs was removed in networkx 2.4; iterating over nx.connected_components(G) is the replacement:
import itertools
subgraphs = []
for component in nx.connected_components(G):
    # collect all nodes in the connected component that aren't labeled with digits
    nodes = [a for a in component if not str(a).isdigit()]
    Subgraph = nx.Graph()
    Subgraph.add_nodes_from(nodes)
    # generate all pairwise combinations for these nodes and add as edges:
    Subgraph.add_edges_from(itertools.combinations(nodes, 2))
    subgraphs.append(Subgraph)
# optional:
# combine all subgraphs into a new graph
# (nx.compose takes exactly two graphs, so use compose_all for a list)
G_new = nx.compose_all(subgraphs)
pos = nx.spring_layout(G_new)
nx.draw_networkx(G_new, pos)
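One caveat with the component-based approach, sketched here under the assumption of networkx 2.4 or later (where `nx.connected_components` is the available call): every user in a component is linked to every other user, including users who share no computer directly. The chain-shaped toy data below is hypothetical.

```python
import itertools
import networkx as nx

# Hypothetical chain: x and y share computer 1, y and z share computer 2.
G = nx.Graph([('x', 1), ('y', 1), ('y', 2), ('z', 2)])

user_graph = nx.Graph()
for component in nx.connected_components(G):
    # Users are the non-digit labels, following the answer's convention.
    users = sorted(n for n in component if not str(n).isdigit())
    user_graph.add_nodes_from(users)
    user_graph.add_edges_from(itertools.combinations(users, 2))

# x and z end up linked although no single computer connects them.
shares_computer = bool(set(G['x']) & set(G['z']))
```

If only direct sharing should count, the per-computer approach that follows is probably the better fit.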
Drawing weighted edges requires some modifications:
G = nx.from_pandas_edgelist(df, source='user', target='computer')
# add some more connections
G.add_nodes_from([4,5,6])
G.add_edges_from([(4, 'b'), (4, 'a'), (5, 'b'), (5, 'c'), (6, 'b'), (6, 'a')])
pos = nx.spring_layout(G)
colours = [('red' if str(node).isdigit() else 'blue') for node in G.nodes]
nx.draw_networkx(G, pos, with_labels=True, node_color=colours)
import itertools
# collect all connection nodes
connecting_nodes = [n for n in G.nodes if str(n).isdigit()]
edgelist = []
for cn in connecting_nodes:
    # create all combinations of adjacent nodes and store in list of tuples
    edgelist += itertools.combinations(G.neighbors(cn), 2)
#remove positional information
edgelist = [tuple(sorted(list(set(a)))) for a in edgelist]
from collections import Counter
# now count occurrences of each tuple (= number of "independent connections"
# between two non-digit nodes).
# Counter(edgelist) returns a dict, i.e. {('a', 'b'): 2, ...},
# which can be unpacked like so:
weighted_edges = [(*u, v) for u,v in Counter(edgelist).items()]
# now make new graph with non-digit nodes and add weighted edges:
H = nx.Graph()
H.add_nodes_from([n for n in G.nodes if not str(n).isdigit()])
H.add_weighted_edges_from(weighted_edges)
# and draw, with width proportional to weight
pos = nx.spring_layout(H)
weights = [e[2]['weight'] for e in H.edges(data=True)]
nx.draw_networkx(H, pos, width=weights)
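The counting step itself does not depend on networkx. A minimal stdlib sketch, assuming the data is available as (user, computer) pairs; the helper name `shared_computer_counts` is made up for illustration:

```python
from collections import Counter, defaultdict
from itertools import combinations

def shared_computer_counts(records):
    """Count, for each unordered user pair, how many computers they share."""
    users_by_computer = defaultdict(set)
    for user, computer in records:
        users_by_computer[computer].add(user)
    counts = Counter()
    for users in users_by_computer.values():
        # sorted() gives every pair a canonical (u, v) order
        counts.update(combinations(sorted(users), 2))
    return counts

# Same edges as the extended graph above.
records = [('a', 1), ('b', 1), ('c', 2), ('d', 3), ('d', 1), ('e', 1),
           ('b', 4), ('a', 4), ('b', 5), ('c', 5), ('b', 6), ('a', 6)]
counts = shared_computer_counts(records)
```

The items of `counts` can then be turned into `(u, v, weight)` triples for `add_weighted_edges_from`.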
Answer 1 (score: 1)
Personally, I think an inner join on the data is cleaner than graph operations. In pandas, the join can be performed like this:
import pandas as pd
def get_edges(df, var, on):
    """Get all combinations of variable var that share a value for variable on (using an inner join)."""
    inner_self_join = df.merge(df, how='inner', on=on)
    excluding_self_pairs = inner_self_join[inner_self_join[var + '_x'] != inner_self_join[var + '_y']]
    edges = excluding_self_pairs[[var + '_x', var + '_y']].values
    return edges
df = pd.DataFrame({'user':['a','b','c','d', 'd', 'e'], 'computer':[1,1,2,3,1,1]})
edges = get_edges(df, 'user', 'computer')
# array([['a', 'b'],
# ['a', 'd'],
# ['a', 'e'],
# ['b', 'a'],
# ['b', 'd'],
# ['b', 'e'],
# ['d', 'a'],
# ['d', 'b'],
# ['d', 'e'],
# ['e', 'a'],
# ['e', 'b'],
# ['e', 'd']], dtype=object)
The edge list can then be used to create a networkx Graph instance.
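A minimal sketch of that last step, assuming `edges` has the shape shown above (a few rows are hardcoded here so the snippet runs on its own). An undirected `nx.Graph` collapses mirrored pairs such as ('a', 'b') and ('b', 'a') into a single edge, so no extra deduplication is needed.

```python
import networkx as nx

# A few of the (user_x, user_y) rows, copied from the output above.
edges = [['a', 'b'], ['a', 'd'], ['b', 'a'], ['d', 'a'], ['d', 'e'], ['e', 'd']]

G_users = nx.Graph()
# Each row becomes an edge; repeated and reversed pairs are merged.
G_users.add_edges_from(map(tuple, edges))
```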
Answer 2 (score: 1)
What you describe is built into networkx as bipartite projection. Here is the weighted version:
import pandas as pd
import networkx as nx
df = pd.DataFrame({'user':['a','b','c','d', 'd', 'e'], 'computer':[1,1,2,3,1,1]})
G = nx.from_pandas_edgelist(df, source='user', target='computer')
cnodes = [1,2,3] #the computers
unodes = ['a', 'b', 'c', 'd', 'e'] #the users
#create the network based only on users. An edge means they share a computer
Uprojection = nx.algorithms.bipartite.overlap_weighted_projected_graph(G, unodes)
#the edges are weighted based on how much they share (see documentation for details)
Uprojection.edges(data=True)
# EdgeDataView([('a', 'd', {'weight': 0.5}), ('a', 'b', {'weight': 1.0}), ('a', 'e', {'weight': 1.0}), ('b', 'd', {'weight': 0.5}), ('b', 'e', {'weight': 1.0}), ('d', 'e', {'weight': 0.5})])
nx.draw(Uprojection, with_labels=True)
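With its default `jaccard=True`, `overlap_weighted_projected_graph` weights an edge by the Jaccard index of the two users' computer sets, |N(u) ∩ N(v)| / |N(u) ∪ N(v)|. A stdlib-only sketch that reproduces the weights above by hand (the `jaccard` helper is illustrative, not a networkx function):

```python
from itertools import combinations

# Computers used by each user, taken from the DataFrame above.
computers = {'a': {1}, 'b': {1}, 'c': {2}, 'd': {1, 3}, 'e': {1}}

def jaccard(u, v):
    """Shared computers divided by all computers used by either user."""
    return len(computers[u] & computers[v]) / len(computers[u] | computers[v])

# Keep only pairs that share at least one computer, as the projection does.
weights = {(u, v): jaccard(u, v)
           for u, v in combinations(sorted(computers), 2)
           if computers[u] & computers[v]}
```

User d also uses computer 3, which is why every pair involving d gets 0.5 rather than 1.0.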