How to create the corresponding columns

Time: 2017-10-26 05:46:24

Tags: python pandas

I have data like this:

import pandas as pd

data_dict  = {'a':[5,2,4,5,3,3,1,2,3],
             'name':['Jack','jon',"tom",'lazzy','mack','zack','makilo','drag','maiko']}
data_01 = pd.DataFrame(data_dict)
input:
    a   name
0   5   Jack
1   2   jon
2   4   tom
3   5   lazzy
4   3   mack
5   3   zack
6   1   makilo
7   2   drag
8   3   maiko

I want the output to have 3 extra columns, 'good', 'mid' and 'poor', set where a > 3, a == 3 and a < 3 respectively. The expected output is:

a  name   good  mid  poor
5  Jack   1     0    0
2  jon    0     0    1
4  tom    1     0    0
5  lazzy  1     0    0
3  mack   0     1    0
....

Thanks for your help!

4 answers:

Answer 0: (score: 4)

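  • I would use np.sign on data_01.a - 3
    • This returns -1 when a < 3, 0 when a == 3, and 1 when a > 3
  • I then use those values as indices into an array of labels

    labels = np.array(['mid', 'good', 'poor'])

  • When indexing, 0 maps to 'mid', 1 maps to 'good', and -1 maps to 'poor' (the last element)

  • Finally, I use pd.get_dummies to create the dummy columns.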
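import numpy as np

labels = np.array(['mid', 'good', 'poor'])

# dtype=int keeps the dummies as 0/1 (newer pandas versions default to booleans)
data_01.join(pd.get_dummies(labels[np.sign(data_01.a - 3)], dtype=int))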

   a    name  good  mid  poor
0  5    Jack     1    0     0
1  2     jon     0    0     1
2  4     tom     1    0     0
3  5   lazzy     1    0     0
4  3    mack     0    1     0
5  3    zack     0    1     0
6  1  makilo     0    0     1
7  2    drag     0    0     1
8  3   maiko     0    1     0
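
To make the steps of this answer easier to follow, here is a minimal sketch that prints the intermediate values. It rebuilds column a as a standalone Series (the names signs and picked are only illustrative and do not appear in the answer).

```python
import numpy as np
import pandas as pd

# Same values as column 'a' in the question.
a = pd.Series([5, 2, 4, 5, 3, 3, 1, 2, 3])

# Step 1: the sign of (a - 3) is 1 above the threshold, 0 at it, -1 below it.
signs = np.sign(a - 3).to_numpy()

# Step 2: use the signs as positions in the label array.
# A position of -1 counts from the end, so it selects 'poor'.
labels = np.array(['mid', 'good', 'poor'])
picked = labels[signs]

print(signs.tolist())
print(picked.tolist())
```

The order of labels matters here: 'mid' has to sit at position 0, 'good' at position 1, and 'poor' at the last position so that -1 reaches it.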

Alternative 1
This uses np.eye to generate the dummy columns. I build a dictionary that ties the dummies to their labels and pass it to pd.DataFrame.assign

dum = dict(zip(
    ['mid', 'good', 'poor'],
    np.eye(3, dtype=int)[:, np.sign(data_01.a - 3)]
))

data_01.assign(**dum)

   a    name  good  mid  poor
0  5    Jack     1    0     0
1  2     jon     0    0     1
2  4     tom     1    0     0
3  5   lazzy     1    0     0
4  3    mack     0    1     0
5  3    zack     0    1     0
6  1  makilo     0    0     1
7  2    drag     0    0     1
8  3   maiko     0    1     0

Alternative 2
Again I use np.eye, but this time I build the dataframe from scratch and attach it to data_01 with pd.DataFrame.join

dum = pd.DataFrame(
    np.eye(3, dtype=int)[np.sign(data_01.a - 3)],
    data_01.index, ['mid', 'good', 'poor']
)

data_01.join(dum)

   a    name  mid  good  poor
0  5    Jack    0     1     0
1  2     jon    0     0     1
2  4     tom    0     1     0
3  5   lazzy    0     1     0
4  3    mack    1     0     0
5  3    zack    1     0     0
6  1  makilo    0     0     1
7  2    drag    0     0     1
8  3   maiko    1     0     0
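
Both alternatives rely on indexing a 3x3 identity matrix with the sign values. The sketch below shows that lookup on its own, using a hypothetical three-element signs array rather than the full column.

```python
import numpy as np

eye = np.eye(3, dtype=int)

# Hypothetical sample covering the three cases: a > 3, a < 3, a == 3.
signs = np.array([1, -1, 0])

# Row lookup (Alternative 2): one one-hot row per input value,
# with columns in the order mid, good, poor.
rows = eye[signs]

# Column lookup (Alternative 1): the same data transposed,
# one row per label instead of one row per input value.
cols = eye[:, signs]

print(rows)
print(cols)
```

Since the identity matrix is symmetric, the two lookups are transposes of each other, which is why Alternative 1 can zip the result with the labels and Alternative 2 can pass it directly as the frame's data.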

Answer 1: (score: 2)

If you need the counts:

s = np.select([data_01['a'] < 3, data_01['a'] > 3], ['poor','good'], default='mid')

df = data_01.join(data_01.groupby(['name', s]).size().unstack(fill_value=0), on='name')
print (df)
   a    name  good  mid  poor
0  5    Jack     1    0     0
1  2     jon     0    0     1
2  4     tom     1    0     0
3  5   lazzy     1    0     0
4  3    mack     0    1     0
5  3    zack     0    1     0
6  1  makilo     0    0     1
7  2    drag     0    0     1
8  3   maiko     0    1     0
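
As a small sketch of the first step in this answer, the snippet below shows the label array that np.select produces before any grouping happens. Column a is rebuilt as a standalone Series for the illustration.

```python
import numpy as np
import pandas as pd

# Same values as column 'a' in the question.
a = pd.Series([5, 2, 4, 5, 3, 3, 1, 2, 3])

# The first matching condition wins; rows matching neither get the default.
s = np.select([a < 3, a > 3], ['poor', 'good'], default='mid')

print(s.tolist())
```

The groupby step then counts rows per (name, label) pair, so the result is 0/1 only as long as each name appears once; with repeated names the counts can be larger than 1.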

Answer 2: (score: 1)

Just do a few comparisons with numpy:

arr_a = np.array(data_01['a'])  # column 'a' of the question's dataframe

good = arr_a > 3
mid  = arr_a == 3
poor = arr_a < 3

Then attach these arrays to the dataframe as columns.
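
The answer does not show the attaching step, so the following is one possible sketch of it, not necessarily what the author had in mind. It assumes the question's data_01 and uses DataFrame.assign, casting the boolean arrays to int to get 0/1 values; result is an illustrative name.

```python
import numpy as np
import pandas as pd

data_01 = pd.DataFrame({
    'a': [5, 2, 4, 5, 3, 3, 1, 2, 3],
    'name': ['Jack', 'jon', 'tom', 'lazzy', 'mack', 'zack', 'makilo', 'drag', 'maiko'],
})

arr_a = np.array(data_01['a'])

# Each comparison yields a boolean array; astype(int) turns True/False into 1/0.
result = data_01.assign(
    good=(arr_a > 3).astype(int),
    mid=(arr_a == 3).astype(int),
    poor=(arr_a < 3).astype(int),
)

print(result)
```

assign returns a new dataframe and leaves data_01 unchanged; assigning with data_01['good'] = ... would modify it in place instead.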

Answer 3: (score: 1)

Alternatively, you can use numpy.where to fill the columns conditionally

import pandas as pd 
import numpy as np
data_dict  = {'a':[5,2,4,5,3,3,1,2,3],
             'name':['Jack','jon',"tom",'lazzy','mack','zack','makilo','drag','maiko']}

data_01 = pd.DataFrame(data_dict)

#This will create columns 'good','mid','poor' all with default values '0'

data_01['good']=0
data_01['mid']=0
data_01['poor']=0

#Here you are setting each column with value '1' , based on the condition 

data_01['good']=np.where(data_01['a'] > 3,1, data_01['good'])
data_01['mid']=np.where(data_01['a'] == 3,1, data_01['mid'])
data_01['poor']=np.where(data_01['a'] < 3,1, data_01['poor'])
print(data_01)

Output

   a    name  good  mid  poor
0  5    Jack     1    0     0
1  2     jon     0    0     1
2  4     tom     1    0     0
3  5   lazzy     1    0     0
4  3    mack     0    1     0
5  3    zack     0    1     0
6  1  makilo     0    0     1
7  2    drag     0    0     1
8  3   maiko     0    1     0