I have a pandas DataFrame df that looks like this:
userid  trip_id  segmentid  actual  prediction
1       13       40         3       3
1       6        2          1       1
1       44       3          2       3
2       70       19         1       1
2       12       5          0       0
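I need to create a summary DataFrame dfsummary, grouped by the userid column, with 3 columns: userid, correct_classified, incorrect_classified. A row counts as correctly classified if its actual and prediction values are the same, and as incorrectly classified otherwise.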
I can select the correctly and incorrectly classified rows for the whole DataFrame with
correct_classified = df[df['actual'] == df['prediction']]
incorrect_classified = df[df['actual'] != df['prediction']]
but I don't know how to create the summary table grouped by userid, which should look like this:
userid  correct_classified  incorrect_classified
1       2                   1
2       2                   0
Answer 0: (score: 4)
You can use pd.crosstab after building an array of condition labels:
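import numpy as np
import pandas as pd

# label each row 'correct' or 'incorrect', then count labels per userid
flags = np.where(df['actual'].eq(df['prediction']), 'correct', 'incorrect')
res = pd.crosstab(df['userid'], flags)
print(res)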
col_0   correct  incorrect
userid
1             2          1
2             2          0
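If the result should carry the exact column names from the question, with userid as an ordinary column, the crosstab approach can be adapted roughly as follows. This is only a sketch: the DataFrame is rebuilt from the sample rows in the question, and the reindex step is an added assumption that you want both columns present even when one label never occurs.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "userid": [1, 1, 1, 2, 2],
    "trip_id": [13, 6, 44, 70, 12],
    "segmentid": [40, 2, 3, 19, 5],
    "actual": [3, 1, 2, 1, 0],
    "prediction": [3, 1, 3, 1, 0],
})

wanted = ["correct_classified", "incorrect_classified"]

# Label each row using the column names requested in the question
flags = np.where(df["actual"].eq(df["prediction"]), wanted[0], wanted[1])

dfsummary = (
    pd.crosstab(df["userid"], flags)
    .reindex(columns=wanted, fill_value=0)  # keep both columns even if one label is absent
    .rename_axis(columns=None)              # drop the 'col_0' columns label
    .reset_index()                          # turn userid back into a regular column
)
print(dfsummary)
```

Without the reindex, a data set in which every prediction is correct would produce no incorrect_classified column at all.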
Answer 1: (score: 2)
You can also use pivot_table, i.e.:
m = df['actual']==df['prediction']
# assign the conditions to new columns and aggregate.
df.assign(correct_classified=m, incorrect_classified=~m).pivot_table(
    index='userid',
    aggfunc='sum',
    values=['correct_classified', 'incorrect_classified'],
)
Output:
        correct_classified  incorrect_classified
userid
1                      2.0                   1.0
2                      2.0                   0.0
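The counts above come back as floats. If integer counts are preferred, the same boolean mask can be summed with a plain groupby instead. This is a sketch of an alternative, not part of the answer above; the DataFrame is rebuilt from the question's sample rows, and the astype(int) cast is there because the dtype returned by summing booleans may differ between pandas versions.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "userid": [1, 1, 1, 2, 2],
    "trip_id": [13, 6, 44, 70, 12],
    "segmentid": [40, 2, 3, 19, 5],
    "actual": [3, 1, 2, 1, 0],
    "prediction": [3, 1, 3, 1, 0],
})

# True where the prediction matches the actual value
m = df["actual"].eq(df["prediction"])

dfsummary = (
    df.assign(correct_classified=m, incorrect_classified=~m)
    .groupby("userid")[["correct_classified", "incorrect_classified"]]
    .sum()          # summing booleans counts the True values per user
    .astype(int)    # force integer counts regardless of pandas version
    .reset_index()  # userid as a regular column
)
print(dfsummary)
```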