Creating a variable with multiple conditions using numpy where

Time: 2017-11-24 17:20:41

Tags: python python-3.x pandas numpy

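Hello, I am a Stata user and I am trying to port my code from Stata to Python/pandas. In this case I want to create a new variable, size, that takes the value 1 if the number of jobs is between 1 and 9, 2 if it is between 10 and 49, 3 if it is between 50 and 199, and 4 for 200 or more jobs.

Then, if possible, I would like to label the values (1: 'Micro', 2: 'Small', 3: 'Median', 4: 'Big').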

id  year  entry  cohort  jobs  
1  2009    0     NaN      3
1  2012    1     2012     3
1  2013    0     2012     4
1  2014    0     2012     11
2  2010    1     2010     11
2  2011    0     2010     12
2  2012    0     2010     13       
3  2007    0     NaN      38
3  2008    0     NaN      58       
3  2012    1     2012     58       
3  2013    0     2012     70
4  2007    0     NaN      231
4  2008    0     NaN      241

I tried this code, but it did not work:

df['size'] = np.where((1 <= df['jobs'] <= 9),'Micro',np.where((10 <= df['jobs'] <= 49),'Small'),np.where((50 <= df['jobs'] <= 200),'Median'),np.where((200 <= df['empleo']),'Big','NaN'))

1 answer:

Answer 0 (score: 2)

What you are trying to do is called binning. Use pd.cut, i.e.

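# right=False gives left-closed bins: [1, 10), [10, 50), [50, 200), [200, inf)
df['new'] = pd.cut(df['jobs'], bins=[1, 10, 50, 200, np.inf], right=False, labels=['micro', 'small', 'medium', 'big'])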

Output:

   id  year  entry  cohort  jobs     new
0    1  2009      0     NaN     3   micro
1    1  2012      1  2012.0     3   micro
2    1  2013      0  2012.0     4   micro
3    1  2014      0  2012.0    11   small
4    2  2010      1  2010.0    11   small
5    2  2011      0  2010.0    12   small
6    2  2012      0  2010.0    13   small
7    3  2007      0     NaN    38   small
8    3  2008      0     NaN    58  medium
9    3  2012      1  2012.0    58  medium
10   3  2013      0  2012.0    70  medium
11   4  2007      0     NaN   231     big
12   4  2008      0     NaN   241     big
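
pd.cut uses right-closed intervals by default, so the exact bin edges matter at boundary values such as 10, 50, and 200. The following is a minimal sketch, assuming integer job counts and the thresholds stated in the question, that builds both the text label and the numeric code 1 to 4 the question asks for. The column name size_label and the extra boundary rows are illustrative assumptions, not part of the original data.

```python
import numpy as np
import pandas as pd

# 'jobs' values from the question, followed by boundary values added for illustration
df = pd.DataFrame({'jobs': [3, 4, 11, 12, 13, 38, 58, 70, 231, 241,
                            9, 10, 49, 50, 199, 200]})

bins = [1, 10, 50, 200, np.inf]
labels = ['micro', 'small', 'medium', 'big']

# right=False makes each bin closed on the left: [1, 10), [10, 50), [50, 200), [200, inf)
df['size_label'] = pd.cut(df['jobs'], bins=bins, labels=labels, right=False)

# labels=False returns the 0-based bin index; adding 1 gives the codes 1..4
df['size'] = pd.cut(df['jobs'], bins=bins, labels=False, right=False) + 1

print(df)
```

Values that fall outside every bin (for example 0 jobs) come back as missing, in which case the numeric column would become floating point rather than integer.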

For multiple conditions, you should use np.select rather than np.where. Hope that helps.

numpy.select(condlist, choicelist, default=0)
     

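condlist is your list of conditions, and choicelist is the list of values to use where the corresponding condition is met. The default is 0; here you can put np.nan instead.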

You can do the same thing with np.select combined with .between, i.e.

# .between is inclusive on both ends, so the ranges must not share an edge
df['size'] = np.select([df['jobs'].between(1, 9),
                        df['jobs'].between(10, 49),
                        df['jobs'].between(50, 199),
                        df['jobs'] >= 200],
                       ['Micro', 'Small', 'Median', 'Big'],
                       default='NaN')
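
For comparison, the nested np.where from the question fails for two reasons. A chained comparison such as 1 <= df['jobs'] <= 9 asks Python for the truth value of a whole Series, which raises a ValueError, and each inner np.where is closed before it receives its "else" argument. The question also refers to df['empleo'], which is presumably the same column as jobs. Below is a minimal sketch of a working nested version, assuming integer job counts; the small sample frame is made up for illustration.

```python
import numpy as np
import pandas as pd

# Illustrative sample; 0 is included to show the fallback value
df = pd.DataFrame({'jobs': [3, 11, 38, 58, 70, 231, 0]})
jobs = df['jobs']

# Combine element-wise comparisons with & (each side in parentheses);
# every np.where uses the next np.where as its "else" branch.
df['size'] = np.where((jobs >= 1) & (jobs <= 9), 'Micro',
             np.where((jobs >= 10) & (jobs <= 49), 'Small',
             np.where((jobs >= 50) & (jobs <= 199), 'Median',
             np.where(jobs >= 200, 'Big', 'NaN'))))

print(df)
```

This works, but the nesting grows with every extra category, which is why np.select or pd.cut tends to be easier to read and maintain.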