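I have a DataFrame of users, whether or not they signed up, and the model's prediction of whether they signed up. For each user I want to find: TP (they signed up and the model predicted they did), FP (they did not sign up but the model predicted they did), FN (they signed up but the model predicted they did not), and TN (they did not sign up and the model predicted they did not). Here 1 means they signed up and 0 means they did not. I want to group by user and then compare the other two columns. For example, I might have the following: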
Users | Signed_up | Prediction
User1 | 1         | 0
User2 | 0         | 0
User1 | 1         | 1
User3 | 1         | 1
User2 | 0         | 1
User2 | 0         | 0
...
For TP, the resulting table might look something like:
Users | TP
User1 | 1
User2 | 0
User3 | 1
For TN, the resulting table might look something like:
Users | TN
User1 | 0
User2 | 1
User3 | 0
and so on for FP and FN.
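I assume I would group on the Users column and use a lambda function to compare the Signed_up and Prediction columns, but I'm not sure how to actually do this. Any help would be appreciated!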
Answer 0 (score: 4)
Do the comparisons first, then groupby + sum:
# Each new column is a boolean mask; summing booleans per user counts the True rows
(df.assign(TP=(df.Signed_up == 1) & (df.Prediction == 1),
           TN=(df.Signed_up == 0) & (df.Prediction == 0),
           FN=(df.Signed_up == 1) & (df.Prediction == 0),
           FP=(df.Signed_up == 0) & (df.Prediction == 1))
   .groupby('Users')[['TP', 'TN', 'FN', 'FP']].sum())
       TP  TN  FN  FP
Users
User1   1   0   1   0
User2   0   2   0   1
User3   1   0   0   0
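A different route to the same counts, swapping in np.select and pd.crosstab (not the technique used in the answer above), sketched on the six sample rows from the question: label every row with its outcome, then cross-tabulate users against those labels. The Outcome column and the 'other' default are hypothetical additions; the default only matters if a value other than 0 or 1 appears.

```python
import numpy as np
import pandas as pd

# Hypothetical reconstruction of the six sample rows from the question
df = pd.DataFrame({
    'Users': ['User1', 'User2', 'User1', 'User3', 'User2', 'User2'],
    'Signed_up': [1, 0, 1, 1, 0, 0],
    'Prediction': [0, 0, 1, 1, 1, 0],
})

# Label each row; np.select picks the choice of the first condition that is True
conditions = [
    (df['Signed_up'] == 1) & (df['Prediction'] == 1),
    (df['Signed_up'] == 0) & (df['Prediction'] == 0),
    (df['Signed_up'] == 1) & (df['Prediction'] == 0),
    (df['Signed_up'] == 0) & (df['Prediction'] == 1),
]
df['Outcome'] = np.select(conditions, ['TP', 'TN', 'FN', 'FP'], default='other')

# Count rows per (user, outcome); reindex fixes the column order and adds
# any outcome that never occurs as an all-zero column
counts = (pd.crosstab(df['Users'], df['Outcome'])
          .reindex(columns=['TP', 'TN', 'FN', 'FP'], fill_value=0))
print(counts)
```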
Inspired by @BrianJoseph, you can type less: groupby all 3 columns, take the size, and unstack everything except Users:
df.groupby([*df]).size().unstack([1,2]).fillna(0)
Signed_up 1 0
Prediction 0 1 0 1
Users
User1 1.0 1.0 0.0 0.0
User2 0.0 0.0 2.0 1.0
User3 0.0 1.0 0.0 0.0
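To get named TP/TN/FN/FP columns back from the size/unstack result, one option (a sketch on the six sample rows, not part of the original answer) is to reindex the column MultiIndex into a fixed order and then relabel it. Passing fill_value=0 to unstack, instead of calling fillna(0) afterwards, keeps the counts as integers rather than floats.

```python
import pandas as pd

# Hypothetical reconstruction of the six sample rows from the question
df = pd.DataFrame({
    'Users': ['User1', 'User2', 'User1', 'User3', 'User2', 'User2'],
    'Signed_up': [1, 0, 1, 1, 0, 0],
    'Prediction': [0, 0, 1, 1, 1, 0],
})

# Count rows per (Users, Signed_up, Prediction), then move the last two levels into columns
counts = (df.groupby(['Users', 'Signed_up', 'Prediction']).size()
            .unstack(['Signed_up', 'Prediction'], fill_value=0))

# Fix the order of the four (Signed_up, Prediction) pairs, adding any missing pair as zeros
order = pd.MultiIndex.from_tuples([(1, 1), (0, 0), (1, 0), (0, 1)],
                                  names=['Signed_up', 'Prediction'])
counts = counts.reindex(columns=order, fill_value=0)

# Relabel the pairs with their confusion-matrix names
counts.columns = ['TP', 'TN', 'FN', 'FP']
print(counts)
```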
Answer 1 (score: 3)
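Remember that pandas can group by the result of a function. To tell the 4 outcome classes apart, you only need the combination of Signed_up and Prediction, so you can classify the rows like this: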
grps = df.groupby(lambda index: (df.loc[index, 'Signed_up'], df.loc[index, 'Prediction']))
This only gives you the GroupBy object; you can then pull out whichever group you want by its key, for example:
tp_df = grps.get_group((1,1))
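A sketch extending this to all four outcomes at once. It swaps the lambda for grouping directly on the two columns, which yields the same (Signed_up, Prediction) tuple keys; the labels mapping is a hypothetical helper, and the data are the six sample rows from the question. Each group's Users column is counted with value_counts, and the four counts are combined into one per-user table.

```python
import pandas as pd

# Hypothetical reconstruction of the six sample rows from the question
df = pd.DataFrame({
    'Users': ['User1', 'User2', 'User1', 'User3', 'User2', 'User2'],
    'Signed_up': [1, 0, 1, 1, 0, 0],
    'Prediction': [0, 0, 1, 1, 1, 0],
})

# Hypothetical mapping from each (Signed_up, Prediction) key to its outcome name
labels = {(1, 1): 'TP', (0, 0): 'TN', (1, 0): 'FN', (0, 1): 'FP'}

# Group keys are (Signed_up, Prediction) tuples, e.g. grps.get_group((1, 1)) gives the TP rows
grps = df.groupby(['Signed_up', 'Prediction'])

# Count how many rows each user contributes to every group
per_label = {labels[key]: group['Users'].value_counts() for key, group in grps}

# Align on user, fill users absent from a group with 0, and fix the column order
result = (pd.DataFrame(per_label)
          .reindex(columns=['TP', 'TN', 'FN', 'FP'])
          .fillna(0)
          .astype(int))
print(result)
```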
Answer 2 (score: 2)
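If you want a separate DataFrame for each of the outcomes in your post, you can use boolean masks with the & bitwise operator. & means both conditions must hold for a row to be returned, so: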
import pandas as pd

df = pd.read_csv('./Desktop/models.csv')
TP = df.loc[(df['Signed_up'] == 1) & (df['Prediction'] == 1)]
TN = df.loc[(df['Signed_up'] == 0) & (df['Prediction'] == 0)]
FN = df.loc[(df['Signed_up'] == 1) & (df['Prediction'] == 0)]
FP = df.loc[(df['Signed_up'] == 0) & (df['Prediction'] == 1)]
Output:
>>> TP
Users Signed_up Prediction
2 User1 1 1
3 User3 1 1
>>> TN = df.loc[(df['Signed_up'] == 0) & (df['Prediction'] == 0)]
>>> TN
Users Signed_up Prediction
1 User2 0 0
5 User2 0 0
>>> FN = df.loc[(df['Signed_up'] == 1) & (df['Prediction'] == 0)]
>>> FN
Users Signed_up Prediction
0 User1 1 0
>>> FP = df.loc[(df['Signed_up'] == 0) & (df['Prediction'] == 1)]
>>> FP
Users Signed_up Prediction
4 User2 0 1
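The masks above return the matching rows rather than per-user counts. A minimal sketch for turning each one into the per-user table the question asks for, assuming the six sample rows in place of models.csv: count the Users column of each filtered frame and reindex on every user so that users with no matching rows show 0. The names masks and tables are illustrative.

```python
import pandas as pd

# Hypothetical stand-in for models.csv: the six sample rows from the question
df = pd.DataFrame({
    'Users': ['User1', 'User2', 'User1', 'User3', 'User2', 'User2'],
    'Signed_up': [1, 0, 1, 1, 0, 0],
    'Prediction': [0, 0, 1, 1, 1, 0],
})

# The same boolean masks as in the answer, one per outcome
masks = {
    'TP': (df['Signed_up'] == 1) & (df['Prediction'] == 1),
    'TN': (df['Signed_up'] == 0) & (df['Prediction'] == 0),
    'FN': (df['Signed_up'] == 1) & (df['Prediction'] == 0),
    'FP': (df['Signed_up'] == 0) & (df['Prediction'] == 1),
}

# value_counts() counts rows per user in each filtered frame; reindexing on all
# users puts back anyone with no matching rows as 0
all_users = df['Users'].unique()
tables = {name: df.loc[mask, 'Users'].value_counts().reindex(all_users, fill_value=0)
          for name, mask in masks.items()}
print(tables['TN'])
```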