In a pandas DataFrame, I want to assign values to a column based on filtering other columns to certain values

Time: 2019-02-24 13:26:14

Tags: python pandas

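For example, I want to change all values in the "ModelPrediction" column to 1 where the "AgeGrp" column equals [0,5], the "Sex" column equals male, and the "Pclass" column equals 1 as well as 2.

I have already changed the data types of the AgeGrp and Pclass columns to object.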

My attempt is as follows:

train.loc[train['Sex'] == 'male' & ['Pclass'] == 1 & ['Pclass'] == 2 & ['AgeGrp'] == (0, 5], 'ModelPrediction'] = 1  

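I'm new to all things python / pandas, so any help is appreciated! Thanks!

2 answers:

Answer 0 (score: 1)

I think you need to wrap each condition in parentheses () and compare against an Interval. The Pclass condition also appears twice; if you need to check for both values, I think isin is what you need here: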

import pandas as pd

# Small sample frame that mimics the columns from the question
train = pd.DataFrame({'Sex': ['male', 'female', 'male'],
                      'Pclass': [1, 0, 1],
                      'AgeGrp': [pd.Interval(0, 5, closed='right'),
                                 pd.Interval(6, 10, closed='right'),
                                 pd.Interval(0, 5, closed='right')],
                      'ModelPrediction': [0, 1, 0]})
print(train)
      Sex  Pclass   AgeGrp  ModelPrediction
0    male       1   (0, 5]                0
1  female       0  (6, 10]                1
2    male       1   (0, 5]                0

train.loc[(train['Sex'] == 'male') & 
          (train['Pclass'].isin([1, 2])) & 
          (train['AgeGrp'] == pd.Interval(0, 5, closed='right')), 'ModelPrediction'] = 1  

print(train)
      Sex  Pclass   AgeGrp  ModelPrediction
0    male       1   (0, 5]                1
1  female       0  (6, 10]                1
2    male       1   (0, 5]                1
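
The parentheses matter because Python's & binds more tightly than ==, so an unparenthesized condition is not parsed the way it reads. One way to sidestep this is to give each condition its own name before combining them. The following is only a minimal sketch of that idea: the sample rows and the variable names (is_male, in_class, in_age_grp) are invented for illustration, and only the column names come from the question.

```python
import pandas as pd

# Toy data invented for illustration; only the column names come from the question.
train = pd.DataFrame({
    'Sex': ['male', 'female', 'male', 'male'],
    'Pclass': [1, 2, 3, 2],
    'AgeGrp': [pd.Interval(0, 5, closed='right'),
               pd.Interval(0, 5, closed='right'),
               pd.Interval(0, 5, closed='right'),
               pd.Interval(6, 10, closed='right')],
    'ModelPrediction': [0, 0, 0, 0],
})

# Each condition is its own boolean Series, so no parentheses can be forgotten.
is_male = train['Sex'] == 'male'
in_class = train['Pclass'].isin([1, 2])  # same as (Pclass == 1) | (Pclass == 2)
in_age_grp = train['AgeGrp'] == pd.Interval(0, 5, closed='right')

# Combine the masks with & (element-wise AND) and assign only to matching rows.
mask = is_male & in_class & in_age_grp
train.loc[mask, 'ModelPrediction'] = 1

print(train)
```

In this toy frame only the first row satisfies all three conditions, so only that row's ModelPrediction is set to 1.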

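Answer 1 (score: 1)

You were very close, but one of the conditions requires Pclass to be both 1 and 2, which is impossible; the interval syntax you used does not exist; and you need parentheses around each condition: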

train.loc[(train['Sex'] == 'male') &
          ((train['Pclass'] == 1) | (train['Pclass'] == 2)) &
          (train['AgeGrp'] > 0) &
          (train['AgeGrp'] <= 5), 'ModelPrediction'] = 1
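
Note that the > 0 and <= 5 comparisons only make sense on a numeric column. If AgeGrp holds interval labels (for example, produced by pd.cut), the range check would have to be run against the underlying ages instead. The sketch below assumes a numeric Age column, which does not appear in the question and is hypothetical, and shows that the numeric range check and the Interval comparison select the same rows in this toy data.

```python
import pandas as pd

# Toy data invented for illustration; 'Age' is a hypothetical numeric column.
train = pd.DataFrame({
    'Sex': ['male', 'male', 'female', 'male', 'male'],
    'Pclass': [1, 2, 1, 3, 1],
    'Age': [3, 5, 4, 2, 7],
    'ModelPrediction': [0, 0, 0, 0, 0],
})

# Bin the ages into right-closed intervals (0, 5] and (5, 10],
# then store them as plain objects, as the question describes.
train['AgeGrp'] = pd.cut(train['Age'], bins=[0, 5, 10]).astype(object)

# Range check on the numeric column...
age_numeric = (train['Age'] > 0) & (train['Age'] <= 5)
# ...and the equivalent equality check on the interval labels.
age_interval = train['AgeGrp'] == pd.Interval(0, 5, closed='right')

mask = (
    (train['Sex'] == 'male')
    & ((train['Pclass'] == 1) | (train['Pclass'] == 2))
    & age_numeric
)
train.loc[mask, 'ModelPrediction'] = 1

print(train)
```

Which form to use depends on what the column actually contains: numeric comparisons for raw ages, an Interval comparison for binned labels.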