I have data like this:
import pandas as pd
data_dict = {'a':[5,2,4,5,3,3,1,2,3],
'name':['Jack','jon',"tom",'lazzy','mack','zack','makilo','drag','maiko']}
data_01 = pd.DataFrame(data_dict)
input:
a name
0 5 Jack
1 2 jon
2 4 tom
3 5 lazzy
4 3 mack
5 3 zack
6 1 makilo
7 2 drag
8 3 maiko
I want the output to have three extra columns, 'good', 'mid', and 'poor', flagged where a > 3, a == 3, and a < 3 respectively.
The expected output is:
a name good mid poor
5 Jack 1 0 0
2 jon 0 0 1
4 tom 1 0 0
5 lazzy 1 0 0
3 mack 0 1 0
....
Thanks for your help!
Answer 0 (score: 4)
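I use np.sign on data_01.a - 3, which returns -1 when a < 3, 0 when a == 3, and 1 when a > 3.
I then use these values as indices into the label array labels = np.array(['mid', 'good', 'poor']). When indexing, 0 maps to 'mid', 1 maps to 'good', and -1 maps to 'poor' (a negative index counts from the end of the array).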
Finally, I use pd.get_dummies to create the dummy columns.
import numpy as np
labels = np.array(['mid', 'good', 'poor'])
data_01.join(pd.get_dummies(labels[np.sign(data_01.a - 3)]))
a name good mid poor
0 5 Jack 1 0 0
1 2 jon 0 0 1
2 4 tom 1 0 0
3 5 lazzy 1 0 0
4 3 mack 0 1 0
5 3 zack 0 1 0
6 1 makilo 0 0 1
7 2 drag 0 0 1
8 3 maiko 0 1 0
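To make the sign-to-label step above easier to follow, here is a minimal sketch that exposes the intermediate values. The names signs, picked, dummies, and result are introduced only for illustration, and dtype=int is an assumption added here to request 0/1 columns, since some pandas versions return booleans from pd.get_dummies by default.

```python
import numpy as np
import pandas as pd

data_01 = pd.DataFrame({
    'a': [5, 2, 4, 5, 3, 3, 1, 2, 3],
    'name': ['Jack', 'jon', 'tom', 'lazzy', 'mack', 'zack', 'makilo', 'drag', 'maiko'],
})

labels = np.array(['mid', 'good', 'poor'])

# Sign of (a - 3): 1 where a > 3, 0 where a == 3, -1 where a < 3
signs = np.sign(data_01['a'].to_numpy() - 3)

# Index -1 wraps around to the last element, so -1 selects 'poor'
picked = labels[signs]

# One column per distinct label; dtype=int is an illustrative choice for 0/1 output
dummies = pd.get_dummies(picked, dtype=int)

# Both frames share the default 0..8 index, so join aligns row by row
result = data_01.join(dummies)
print(result)
```

The label order in labels matters: position 0 must hold the label for a == 3, position 1 the label for a > 3, and the last position the label for a < 3.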
Alternative 1
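This uses np.eye to generate the dummy columns. I build a dictionary that binds each dummy array to its label and pass that dictionary to pd.DataFrame.assign.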
dum = dict(zip(
['mid', 'good', 'poor'],
np.eye(3, dtype=int)[:, np.sign(data_01.a - 3)]
))
data_01.assign(**dum)
a name good mid poor
0 5 Jack 1 0 0
1 2 jon 0 0 1
2 4 tom 1 0 0
3 5 lazzy 1 0 0
4 3 mack 0 1 0
5 3 zack 0 1 0
6 1 makilo 0 0 1
7 2 drag 0 0 1
8 3 maiko 0 1 0
Alternative 2
I use np.eye again, but this time I build the dummy DataFrame from scratch and use pd.DataFrame.join to attach it to data_01.
dum = pd.DataFrame(
np.eye(3, dtype=int)[np.sign(data_01.a - 3)],
data_01.index, ['mid', 'good', 'poor']
)
data_01.join(dum)
a name mid good poor
0 5 Jack 0 1 0
1 2 jon 0 0 1
2 4 tom 0 1 0
3 5 lazzy 0 1 0
4 3 mack 1 0 0
5 3 zack 1 0 0
6 1 makilo 0 0 1
7 2 drag 0 0 1
8 3 maiko 1 0 0
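The np.eye trick in both alternatives relies on indexing an identity matrix with the sign codes. A small sketch, using a hypothetical three-element codes array in place of np.sign(data_01.a - 3), shows what that indexing produces:

```python
import numpy as np

eye = np.eye(3, dtype=int)

# Codes as np.sign(a - 3) would produce them: 1 for > 3, -1 for < 3, 0 for == 3
codes = np.array([1, -1, 0])

# Each code selects one row of the identity matrix; -1 wraps to the last row.
# Column order is therefore (mid, good, poor).
one_hot = eye[codes]
print(one_hot)

# Indexing columns instead of rows gives the transpose: one row per label,
# which is the layout that zip() pairs with the label names in Alternative 1.
per_label = eye[:, codes]
print(per_label)
```

Row indexing yields one row per record, which suits building a DataFrame directly; column indexing yields one row per label, which suits zipping with the label list.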
Answer 1 (score: 2)
If you need count values:
s = np.select([data_01['a'] < 3, data_01['a'] > 3], ['poor','good'], default='mid')
df = data_01.join(data_01.groupby(['name', s]).size().unstack(fill_value=0), on='name')
print (df)
a name good mid poor
0 5 Jack 1 0 0
1 2 jon 0 0 1
2 4 tom 1 0 0
3 5 lazzy 1 0 0
4 3 mack 0 1 0
5 3 zack 0 1 0
6 1 makilo 0 0 1
7 2 drag 0 0 1
8 3 maiko 0 1 0
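The labelling step in this answer is done by np.select, which picks, for each element, the choice paired with the first condition that holds and falls back to default otherwise. A minimal sketch of that step alone, using a standalone Series named a for illustration:

```python
import numpy as np
import pandas as pd

a = pd.Series([5, 2, 4, 5, 3, 3, 1, 2, 3])

# Conditions are checked in order; values matching neither get the default 'mid'
s = np.select([a < 3, a > 3], ['poor', 'good'], default='mid')
print(s)
```

Note that the groupby in the answer counts rows per (name, label) pair, so the 0/1 pattern shown above assumes each name appears only once; a repeated name would produce counts greater than 1.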
Answer 2 (score: 1)
Just do a few comparisons with numpy:
arr_a = np.array(data_01['a'])
good = arr_a > 3
mid = arr_a == 3
poor = arr_a < 3
Then attach these arrays to the DataFrame as columns.
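A minimal sketch of that last step, assuming the 0/1 integers from the question are wanted rather than booleans (the .astype(int) conversion is an addition here, not part of the answer):

```python
import numpy as np
import pandas as pd

data_01 = pd.DataFrame({
    'a': [5, 2, 4, 5, 3, 3, 1, 2, 3],
    'name': ['Jack', 'jon', 'tom', 'lazzy', 'mack', 'zack', 'makilo', 'drag', 'maiko'],
})

arr_a = data_01['a'].to_numpy()

# Each comparison yields a boolean array; astype(int) turns True/False into 1/0
data_01['good'] = (arr_a > 3).astype(int)
data_01['mid'] = (arr_a == 3).astype(int)
data_01['poor'] = (arr_a < 3).astype(int)
print(data_01)
```

Assigning a NumPy array to a new column works positionally, so the arrays must have the same length as the DataFrame.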
Answer 3 (score: 1)
Alternatively, you can use numpy.where to fill the columns conditionally:
import pandas as pd
import numpy as np
data_dict = {'a':[5,2,4,5,3,3,1,2,3],
'name':['Jack','jon',"tom",'lazzy','mack','zack','makilo','drag','maiko']}
data_01 = pd.DataFrame(data_dict)
#This will create columns 'good','mid','poor' all with default values '0'
data_01['good']=0
data_01['mid']=0
data_01['poor']=0
#Here you are setting each column with value '1' , based on the condition
data_01['good']=np.where(data_01['a'] > 3,1, data_01['good'])
data_01['mid']=np.where(data_01['a'] == 3,1, data_01['mid'])
data_01['poor']=np.where(data_01['a'] < 3,1, data_01['poor'])
print(data_01)
Output:
a name good mid poor
0 5 Jack 1 0 0
1 2 jon 0 0 1
2 4 tom 1 0 0
3 5 lazzy 1 0 0
4 3 mack 0 1 0
5 3 zack 0 1 0
6 1 makilo 0 0 1
7 2 drag 0 0 1
8 3 maiko 0 1 0