DataFrame: apply a function to rows that meet a specific condition

Asked: 2017-06-27 07:34:33

Tags: python pandas dataframe sklearn-pandas

Here is a sample from my DataFrame:

id      DPT_DATE  TRANCHE_NO  TRAIN_NO  J_X  RES_HOLD_IND
0     2017-04-01       330.0    1234.0 -1.0         100.0
1     2017-04-01       330.0    1234.0  0.0          80.0
2     2017-04-02       331.0    1235.0 -1.0          91.0
3     2017-04-02       331.0    1235.0  0.0          83.0
4     2017-04-03       332.0    1236.0 -1.0          92.0
5     2017-04-03       332.0    1236.0  0.0          81.0
6     2017-04-04       333.0    1237.0 -1.0          87.0
7     2017-04-04       333.0    1237.0  0.0          70.0
8     2017-04-05       334.0    1238.0 -1.0          93.0
9     2017-04-05       334.0    1238.0  0.0          90.0
10    2017-04-06       335.0    1239.0 -1.0          89.0
11    2017-04-06       335.0    1239.0  0.0          85.0
12    2017-04-07       336.0    1240.0 -1.0          82.0
13    2017-04-07       336.0    1240.0  0.0          76.0

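This is a DataFrame of train bookings. DPT_DATE = departure date, TRAIN_NO = train number, J_X = number of days before departure (J_X = 0.0 means the day of departure, J_X = -1 means the day after departure), and RES_HOLD_IND is the number of reservations held on that day.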

I want to create a new column that gives, for each combination of DPT_DATE and TRAIN_NO, the RES_HOLD_IND of the row where J_X = -1.

Example (this is what I want):

id      DPT_DATE  TRANCHE_NO  TRAIN_NO  J_X  RES_HOLD_IND  RES_J-1
0     2017-04-01       330.0    1234.0 -1.0         100.0    100.0
1     2017-04-01       330.0    1234.0  0.0          80.0    100.0
2     2017-04-02       331.0    1235.0 -1.0          91.0     91.0
3     2017-04-02       331.0    1235.0  0.0          83.0     91.0
4     2017-04-03       332.0    1236.0 -1.0          92.0     92.0
5     2017-04-03       332.0    1236.0  0.0          81.0     92.0
6     2017-04-04       333.0    1237.0 -1.0          87.0     87.0
7     2017-04-04       333.0    1237.0  0.0          70.0     87.0

Thanks for your help!

1 answer:

Answer 0 (score: 2)

I think you first need to filter with boolean indexing or query, and then use groupby with DataFrameGroupBy.ffill. This works if the -1 value is always in the first row of each group:

    # Keep RES_HOLD_IND only on the J_X == -1 rows; every other row becomes NaN
    df['RES_J-1'] = df.query('J_X == -1')['RES_HOLD_IND']
    # alternative
    # df['RES_J-1'] = df.loc[df['J_X'] == -1, 'RES_HOLD_IND']

    # Forward fill the NaNs within each (DPT_DATE, TRAIN_NO) group
    df['RES_J-1'] = df.groupby(['DPT_DATE','TRAIN_NO'])['RES_J-1'].ffill()
    print (df)
          DPT_DATE  TRANCHE_NO  TRAIN_NO  J_X  RES_HOLD_IND  RES_J-1
    0   2017-04-01       330.0    1234.0 -1.0         100.0    100.0
    1   2017-04-01       330.0    1234.0  0.0          80.0    100.0
    2   2017-04-02       331.0    1235.0 -1.0          91.0     91.0
    3   2017-04-02       331.0    1235.0  0.0          83.0     91.0
    4   2017-04-03       332.0    1236.0 -1.0          92.0     92.0
    5   2017-04-03       332.0    1236.0  0.0          81.0     92.0
    6   2017-04-04       333.0    1237.0 -1.0          87.0     87.0
    7   2017-04-04       333.0    1237.0  0.0          70.0     87.0
    8   2017-04-05       334.0    1238.0 -1.0          93.0     93.0
    9   2017-04-05       334.0    1238.0  0.0          90.0     93.0
    10  2017-04-06       335.0    1239.0 -1.0          89.0     89.0
    11  2017-04-06       335.0    1239.0  0.0          85.0     89.0
    12  2017-04-07       336.0    1240.0 -1.0          82.0     82.0
    13  2017-04-07       336.0    1240.0  0.0          76.0     82.0

If each group has only one -1 row but it is not always the first row, forward fill alone is not enough: any row that comes before the -1 row in its group stays NaN, so a different approach is needed.
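For the case where the -1 row is not always first in its group, one option is to build a small lookup table from the J_X == -1 rows and merge it back on the group keys. The following is a minimal sketch, not necessarily the approach the answerer had in mind. It assumes each (DPT_DATE, TRAIN_NO) group has at most one J_X == -1 row, and the four-row sample and the names `keys`, `lookup`, and `result` are made up for illustration.

```python
import pandas as pd

# Small sample in the question's layout. The first group is deliberately
# ordered so that its J_X == -1 row is NOT the first row.
df = pd.DataFrame({
    "DPT_DATE": ["2017-04-01", "2017-04-01", "2017-04-02", "2017-04-02"],
    "TRAIN_NO": [1234.0, 1234.0, 1235.0, 1235.0],
    "J_X": [0.0, -1.0, -1.0, 0.0],
    "RES_HOLD_IND": [80.0, 100.0, 91.0, 83.0],
})

keys = ["DPT_DATE", "TRAIN_NO"]

# One row per group: the RES_HOLD_IND recorded where J_X == -1.
lookup = (
    df.loc[df["J_X"] == -1, keys + ["RES_HOLD_IND"]]
      .rename(columns={"RES_HOLD_IND": "RES_J-1"})
)

# A left merge copies that value onto every row of the same group, wherever
# the -1 row sits. Groups without a -1 row get NaN. validate="many_to_one"
# raises an error if a group has more than one -1 row (the stated assumption).
result = df.merge(lookup, on=keys, how="left", validate="many_to_one")
print(result)
```

Because the merge matches on the keys rather than on row position, the result does not depend on how the rows are sorted.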