Finding the dropout rate with pandas

Time: 2018-01-16 14:46:24

Tags: python pandas

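Given a dataset representing the edges of a graph, I want to find the difference between the number of incoming and outgoing edges at each node, to see how many are lost (drop out) at that node: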

source    target
1         2
1         3
4         1
4         2
6         1
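The quantity described above is each node's in-degree minus its out-degree. A minimal sketch, assuming the table is loaded into a DataFrame with columns source and target (the name edges is only for illustration), counts both sides with value_counts and aligns them on node id:

```python
import pandas as pd

# the sample edges from the table above
edges = pd.DataFrame({'source': [1, 1, 4, 4, 6],
                      'target': [2, 3, 1, 2, 1]})

# in-degree: how often each node appears as a target
incoming = edges['target'].value_counts()
# out-degree: how often each node appears as a source
outgoing = edges['source'].value_counts()

# align both counts on the union of node ids; a missing count means 0 edges
degrees = pd.concat({'incoming': incoming, 'outgoing': outgoing}, axis=1).fillna(0).astype(int)
# edges that enter a node but do not leave it again
degrees['lost'] = degrees['incoming'] - degrees['outgoing']
print(degrees.sort_index())
```

For the sample edges this gives lost = 0 for node 1, 2 for node 2 and 1 for node 3, while the pure sources 4 and 6 come out negative (-2 and -1) because they only emit edges.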

What I have so far doesn't really give me what I want; I'm sure I'm missing something.

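import pandas as pd

def find_dropout(edge_df):
    # incoming edges per node: rows grouped by target
    # (the count ends up in a column named 'source')
    indata = edge_df.groupby('target').count()
    # outgoing edges per node: rows grouped by source
    # (the count ends up in a column named 'target')
    utdata = edge_df.groupby('source').count()
    # join='inner' keeps only nodes that have both incoming and outgoing edges
    merged = pd.concat([indata, utdata], axis=1, join='inner')
    merged['dropout'] = (1 - (merged['source'] / merged['target'])) * 100
    return merged['dropout']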
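Running the intermediate steps of find_dropout on the sample edges makes its behaviour visible. This short sketch rebuilds the table as a DataFrame (the name edges is only for illustration) and prints where the counts end up and which nodes survive the inner join:

```python
import pandas as pd

# the sample edges from the table above
edges = pd.DataFrame({'source': [1, 1, 4, 4, 6],
                      'target': [2, 3, 1, 2, 1]})

indata = edges.groupby('target').count()
utdata = edges.groupby('source').count()
# the incoming count per node lands in a column called 'source' ...
print(indata.columns.tolist())   # ['source']
# ... and the outgoing count per node in a column called 'target'
print(utdata.columns.tolist())   # ['target']

merged = pd.concat([indata, utdata], axis=1, join='inner')
# only node 1 both receives and sends edges, so it is the only row left
print(merged.index.tolist())     # [1]
```

So merged['source'] / merged['target'] is incoming / outgoing, and every node that only receives or only sends edges is silently dropped.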

What am I doing wrong, and what is the most idiomatic way to do this in pandas?
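A corrected sketch, assuming "dropout" means the percentage of a node's incoming edges that do not leave it again, i.e. (1 - out/in) * 100. Three things change relative to the function above: the counts get explicit names instead of reusing the leftover 'source'/'target' columns (which is what flips the ratio to in/out), the inner join becomes an outer alignment so nodes with only incoming or only outgoing edges are kept, and nodes with no incoming edges get NaN instead of a division by zero:

```python
import numpy as np
import pandas as pd


def find_dropout(edge_df):
    """Percent of each node's incoming edges that do not leave it again."""
    # in-degree: how often each node appears in the 'target' column
    incoming = edge_df['target'].value_counts()
    # out-degree: how often each node appears in the 'source' column
    outgoing = edge_df['source'].value_counts()
    # outer alignment keeps nodes that only receive or only send edges
    counts = pd.concat({'incoming': incoming, 'outgoing': outgoing}, axis=1).fillna(0)
    # dropout = 1 - out/in; left as NaN where a node has no incoming edges
    ratio = counts['outgoing'] / counts['incoming'].replace(0, np.nan)
    return ((1 - ratio) * 100).rename('dropout').sort_index()


edges = pd.DataFrame({'source': [1, 1, 4, 4, 6],
                      'target': [2, 3, 1, 2, 1]})
print(find_dropout(edges))
```

On the sample edges this gives 0 for node 1, 100 for nodes 2 and 3 (nothing leaves them), and NaN for the pure sources 4 and 6. A node with more outgoing than incoming edges would come out negative, which may or may not be what you want to call dropout.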

0 answers:

No answers yet