I am trying to create a pandas DataFrame column based on multiple given conditions

Time: 2019-06-16 01:11:43

Tags: python pandas

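I tried two different pieces of code to solve this: 1. I put the conditions directly inside the DataFrame column assignment. 2. I tried using a function.

Both give me SyntaxError: invalid syntax.

I am still a beginner with Python.

First approach: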

df['hours_week'] = ['less_than_40' if x < 40 'between_40_and_45' elif x > 40 and x <= 45 'between_40_and_60' elif x >45 and x <= 60 'between_60_and_80' elif x >60 and x <=80 else 'more_than_80' for x in df['hours_per_week']]

Second approach:

def set_value(x):
     for x in df['hours_per_week']:
         if x < 40:
             t == print " less_than_40"
         elif (x > 40 and x <= 45):
             t == print "between_40_and_45"
         elif(x>45 and x <=60):
             t == print "between_40_and_45"
         elif(x>60 and x <= 80):
             t == print "between_60_and_80"
         else:
             t == print "more_than_80"
         return t
df['hours_week'] = df['hours_per_week'].apply(set_value,args=())

This is the error from the first approach:

 File "<ipython-input-36-e90a4b2f98cc>", line 1
    df['hours_week'] = ['less_than_40' if x < 40 'between_40_and_45' elif x > 40 and x <= 45 'between_40_and_60' elif x >45 and x <= 60 'between_60_and_80' elif x >60 and x <=80 else 'more_than_80' for x in df['hours_per_week']]
                                                                   ^
SyntaxError: invalid syntax

And from the second approach:

 File "<ipython-input-44-0a5dc69b4a15>", line 4
    t == print " less_than_40"
                             ^
SyntaxError: invalid syntax

2 answers:

Answer 0: (score: 0)

In pandas, we usually use pd.cut for this (with numpy imported as np):

df['hours_week']=pd.cut(df['hours_per_week'],bins=[-np.inf,40,45,60,80,np.inf])

You can also pass labels here, e.g. labels=['less_than_40','between_40_and_45'....]
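
To make the pd.cut suggestion concrete, here is a minimal self-contained sketch. The sample data is made up for illustration, and the label list is borrowed from the other answer on this page rather than from this one. Note that pd.cut uses right-closed intervals by default, so a value of exactly 40 lands in the first bin.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; only the column name comes from the question.
df = pd.DataFrame({'hours_per_week': [20, 40, 43, 45, 50, 60, 70, 80, 99]})

bins = [-np.inf, 40, 45, 60, 80, np.inf]
# One label per interval, so len(labels) == len(bins) - 1.
labels = ['less_than_40', 'between_40_and_45', 'between_45_and_60',
          'between_60_and_80', 'more_than_80']

# right=True (the default) gives intervals (-inf, 40], (40, 45], (45, 60], ...
df['hours_week'] = pd.cut(df['hours_per_week'], bins=bins, labels=labels)
print(df)
```

The resulting column has a categorical dtype; if plain strings are preferred, it can be converted with .astype(str).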

Answer 1: (score: 0)

You can also use searchsorted:

bins = pd.Series([40, 45, 60, 80])

labels = ['less_than_40', 'between_40_and_45', 'between_45_and_60', 
          'between_60_and_80', 'more_than_80']

df['hours_week'] = df['hours_per_week'].map(lambda x: labels[bins.searchsorted(x)])

The first label should actually be 'less_than_or_equal_to_40', because a value of exactly 40 maps to index 0.
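
As for why the two original attempts fail: a Python conditional expression has the form `value if condition else other` and has no `elif`, so the conditions must be chained with `else ... if`. In the function, `print "..."` is not valid as an expression and `==` compares rather than assigns. A possible repair of both attempts is sketched below. The sample data is invented, and treating exactly 40 as part of the first bucket is an assumption (the original conditions left 40 uncovered).

```python
import pandas as pd

# Hypothetical sample data; only the column name comes from the question.
df = pd.DataFrame({'hours_per_week': [20, 40, 43, 50, 70, 99]})

def set_value(x):
    """Return the label for a single hours_per_week value."""
    if x <= 40:  # assumption: exactly 40 belongs to the first bucket
        return 'less_than_or_equal_to_40'
    elif x <= 45:
        return 'between_40_and_45'
    elif x <= 60:
        return 'between_45_and_60'
    elif x <= 80:
        return 'between_60_and_80'
    else:
        return 'more_than_80'

# apply calls set_value once per element, so the function needs no loop.
df['hours_week'] = df['hours_per_week'].apply(set_value)

# The same logic as a list comprehension with chained conditional expressions.
df['hours_week_alt'] = [
    'less_than_or_equal_to_40' if x <= 40
    else 'between_40_and_45' if x <= 45
    else 'between_45_and_60' if x <= 60
    else 'between_60_and_80' if x <= 80
    else 'more_than_80'
    for x in df['hours_per_week']
]
print(df)
```

Both columns end up identical; for large frames the pd.cut or searchsorted approaches above avoid the per-element Python calls.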