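Given a dataset representing the edges of a graph, I want to find the difference between the number of incoming and outgoing edges at each node, to see how much drops out at each node: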
source target
1 2
1 3
4 1
4 2
6 1
What I have so far doesn't really give me what I want, and I'm sure I'm missing something.
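import pandas as pd

def find_dropout(edge_df):
    # Incoming edges per node: group by target (the count ends up in the 'source' column)
    indata = edge_df.groupby('target').count()
    # Outgoing edges per node: group by source (the count ends up in the 'target' column)
    utdata = edge_df.groupby('source').count()
    # Put both counts side by side, aligned on node id (inner join)
    merged = pd.concat([indata, utdata], axis=1, join='inner')
    # Percentage computed from the 'source' and 'target' count columns
    merged['dropout'] = (1 - (merged['source'] / merged['target'])) * 100
    return merged['dropout']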
What am I doing wrong, and what is the most idiomatic way to do this in pandas?
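For reference, here is a minimal sketch of one possible direction, not a definitive answer. It rests on two assumptions that the question does not spell out: that "difference" means incoming count minus outgoing count, and that a node appearing on only one side should get a count of 0 on the other side instead of being dropped. The names `degree_table`, `incoming`, `outgoing` and `difference` are made up for illustration.

```python
import pandas as pd

# The sample edge list from the question.
edge_df = pd.DataFrame({
    "source": [1, 1, 4, 4, 6],
    "target": [2, 3, 1, 2, 1],
})

def degree_table(edge_df):
    """Hypothetical helper: per-node incoming/outgoing counts and their difference."""
    # Incoming edges per node: how often the node appears as a target.
    incoming = edge_df["target"].value_counts().rename("incoming")
    # Outgoing edges per node: how often the node appears as a source.
    outgoing = edge_df["source"].value_counts().rename("outgoing")
    # pd.concat defaults to an outer join, so nodes seen on only one side are kept;
    # their missing count is filled with 0 (an assumption, see above).
    counts = pd.concat([incoming, outgoing], axis=1).fillna(0).astype(int)
    # Assumed definition of the difference: incoming minus outgoing.
    counts["difference"] = counts["incoming"] - counts["outgoing"]
    return counts.sort_index()

result = degree_table(edge_df)
print(result)
```

On the sample data this keeps all five nodes (1, 2, 3, 4, 6), with a difference of 0 for node 1, 2 for node 2 and -2 for node 4. If a percentage is needed instead, which count goes in the denominator, and what to do when it is 0, is a separate decision that depends on how dropout is defined.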