I have a DataFrame like the one below, and I want to create 4 columns that show the accuracy distribution:
Company Error_Rate
A 9
B 10
c 20
GK 17
GK 18
GK 30
GK 35
GK 25
GK 32
GK 40
GK 50
MB 60
MB 70
MB 70
I would like to end up with a table like this:
Company Error_Rate Above 90% 80% - 90% 65% - 80% Below 65%
A 9 1 0 0 0
B 10 1 0 0 0
c 20 0 1 0 0
GK 17 0 1 0 0
GK 18 0 1 0 0
GK 30 0 0 1 0
GK 35 0 0 1 0
GK 40 0 0 0 1
I tried:
df['Above 90%'] = np.where(df['Error_Rate']<=10,1,0)
df['80% - 90%'] = np.where(df['Error_Rate'] <= 20,(np.where(df['Error_Rate'] > 10, 1, 0)),0)
df['65% - 80%'] = np.where(df['Error_Rate'] <= 35,(np.where(df['Error_Rate'] > 20, 1, 0)),0)
df['Below 65%'] = np.where(df['Error_Rate']>35,1,0)
It does not give me the result I want. Am I going wrong somewhere?
Answer 0: (score: 2)
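If you have to write 4 np.where conditions to compute the columns, you are doing it wrong. I think it is wise to consider a different approach.
One concise option involves pd.cut + pd.get_dummies.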
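bins = [0, 65, 80, 90, 100]
labels = ['Below 65%', '65% - 80%', '80% - 90%', 'Above 90%']
# Bin the accuracy (100 - Error_Rate). right=False makes each bin left-closed,
# so an accuracy of exactly 90 counts as 'Above 90%', matching the output below.
pd.concat([
    df, pd.get_dummies(pd.cut(100 - df.Error_Rate, bins=bins, labels=labels, right=False))
], axis=1)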
Company Error_Rate Below 65% 65% - 80% 80% - 90% Above 90%
0 A 9 0 0 0 1
1 B 10 0 0 0 1
2 c 20 0 0 1 0
3 GK 17 0 0 1 0
4 GK 18 0 0 1 0
5 GK 30 0 1 0 0
6 GK 35 0 1 0 0
7 GK 25 0 1 0 0
8 GK 32 0 1 0 0
9 GK 40 1 0 0 0
10 GK 50 1 0 0 0
11 MB 60 1 0 0 0
12 MB 70 1 0 0 0
13 MB 70 1 0 0 0
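For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same pd.cut + pd.get_dummies idea. It assumes the sample data from the question and treats accuracy as 100 - Error_Rate, which is implied by the column names rather than stated. The names accuracy, binned, dummies and result are only illustrative, and dtype=int is my addition so the columns print as 0/1 on newer pandas versions.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Company": ["A", "B", "c", "GK", "GK", "GK", "GK",
                "GK", "GK", "GK", "GK", "MB", "MB", "MB"],
    "Error_Rate": [9, 10, 20, 17, 18, 30, 35, 25, 32, 40, 50, 60, 70, 70],
})

bins = [0, 65, 80, 90, 100]
labels = ["Below 65%", "65% - 80%", "80% - 90%", "Above 90%"]

# Assumption: accuracy is simply the complement of the error rate.
accuracy = 100 - df["Error_Rate"]

# right=False gives left-closed bins: [0, 65), [65, 80), [80, 90), [90, 100).
binned = pd.cut(accuracy, bins=bins, labels=labels, right=False)

# One indicator column per label; dtype=int forces 0/1 instead of booleans.
dummies = pd.get_dummies(binned, dtype=int)

result = pd.concat([df, dummies], axis=1)
print(result)
```

One thing to watch with left-closed bins: an Error_Rate of 0 (accuracy 100) falls outside [90, 100) and would get 0 in all four columns. If that can occur in your data, widening the last bin edge (for example to 101) is one way to handle it.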
Answer 1: (score: 1)
Use:
df['Above 90%'] = np.where(df['Error_Rate']<=10,1,0)
df['80% - 90%'] = np.where((df['Error_Rate'] <= 20) & (df['Error_Rate'] > 10),1,0)
df['65% - 80%'] = np.where((df['Error_Rate'] <= 35) & (df['Error_Rate'] > 20),1,0)
df['Below 65%'] = np.where(df['Error_Rate']>35,1,0)
print(df)
Company Error_Rate Above 90% 80% - 90% 65% - 80% Below 65%
0 A 9 1 0 0 0
1 B 10 1 0 0 0
2 c 20 0 1 0 0
3 GK 17 0 1 0 0
4 GK 18 0 1 0 0
5 GK 30 0 0 1 0
6 GK 35 0 0 1 0
7 GK 25 0 0 1 0
8 GK 32 0 0 1 0
9 GK 40 0 0 0 1
10 GK 50 0 0 0 1
11 MB 60 0 0 0 1
12 MB 70 0 0 0 1
13 MB 70 0 0 0 1
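A note on the combined conditions: each comparison needs its own parentheses because & binds more tightly than <= and > in Python. The sketch below rebuilds the four columns on the question's sample data and then checks that every row lands in exactly one band. The names err, flag_cols and rows_ok are only illustrative.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Company": ["A", "B", "c", "GK", "GK", "GK", "GK",
                "GK", "GK", "GK", "GK", "MB", "MB", "MB"],
    "Error_Rate": [9, 10, 20, 17, 18, 30, 35, 25, 32, 40, 50, 60, 70, 70],
})
err = df["Error_Rate"]

# Parenthesise each comparison before combining the masks with &.
df["Above 90%"] = np.where(err <= 10, 1, 0)
df["80% - 90%"] = np.where((err > 10) & (err <= 20), 1, 0)
df["65% - 80%"] = np.where((err > 20) & (err <= 35), 1, 0)
df["Below 65%"] = np.where(err > 35, 1, 0)

flag_cols = ["Above 90%", "80% - 90%", "65% - 80%", "Below 65%"]

# Sanity check: the bands should be mutually exclusive and cover every row.
rows_ok = (df[flag_cols].sum(axis=1) == 1).all()
print(rows_ok)

# Number of rows per band.
print(df[flag_cols].sum())
```

If rows_ok ever comes back False, two bands overlap or a boundary value is not covered by any of them.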