Add a new column to a DataFrame whose values depend on the previous row's value in that same column

Time: 2018-11-05 21:24:57

Tags: python pandas

I have a DataFrame (df) whose head looks like this:

  BB   NEW_DATE     PICKED
1123 03/10/2018 03/10/2018
1123 04/10/2018 04/10/2018
1123 05/10/2018 05/10/2018
1123 09/10/2018 09/10/2018
1123 04/01/2013 01/04/2013
1123 07/01/2013 07/01/2013
1123 08/01/2013 08/01/2013

I am trying to add a new column called FINAL whose values depend partly on the previous row's value of FINAL. In pseudocode:

if df['PICKED'] < df['FINAL'].shift(-1):
    if df['NEW_DATE'].isnumeric():
        df['FINAL'] = df['NEW_DATE']
    else:
        df['FINAL'] = df['PICKED']
else:
    df['FINAL'] = df['PICKED']

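For each row: if PICKED is less than the previous row's FINAL, then FINAL is the current row's NEW_DATE when NEW_DATE is a valid date, and PICKED otherwise. If PICKED is greater than or equal to the previous row's FINAL, then FINAL equals PICKED.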

So for the DataFrame above, the FINAL column would look like this:

  BB     NEW_DATE       PICKED       FINAL
1123   03/10/2018   03/10/2018  03/10/2018
1123   04/10/2018   04/10/2018  04/10/2018
1123   05/10/2018   05/10/2018  05/10/2018
1123   09/10/2018   09/10/2018  09/10/2018
1123   04/01/2013   01/04/2013  04/01/2013
1123   07/01/2013   07/01/2013  07/01/2013
1123   08/01/2013   08/01/2013  08/01/2013

I tried the following code, without success:

df['FINAL'] = np.where(df['PICKED'] < df['FINAL'].shift(-1), df.NEW_DATE.fillna(df.DATE), df['PICKED'])

I also tried:

for row in df.iterrows:

    if index == 0 :
        row['FINAL'] = row['NEW_DATE']
    else:

        if row['PICKED'] < row['FINAL'].shift(-1):
            if isinstance(row['NEW_DATE'], pd.DatetimeIndex):
                row['FINAL'] = row['NEW_DATE']
            else:
                row['FINAL'] = row['PICKED']
        else:
            row['FINAL'] = row['PICKED']

But I get the error: TypeError: 'method' object is not iterable

1 Answer:

Answer 0: (score: 1)

I can't think of a way to do this without a loop, so here is one approach.

import pandas as pd

# Initialise prev_final with the first row's PICKED value; it serves as the
# "previous FINAL" in the first iteration of the loop
# (df.loc[0, ...] assumes the default integer index)
prev_final = df.loc[0, 'PICKED']

# collect the FINAL values in a list and build the column from it afterwards
list_final = [prev_final]

# loop over the rows with itertuples, skipping the first row, which is already handled
for new_date, picked in df.loc[1:, ['NEW_DATE', 'PICKED']].itertuples(index=False):

    # check both conditions at once: if either one fails, FINAL comes from PICKED
    # (pd.Timestamp replaces pd.datetime, which was removed in pandas 2.0)
    if (picked < prev_final) and isinstance(new_date, pd.Timestamp):
        # take the value from NEW_DATE
        list_final.append(new_date)
        # and update prev_final for the next iteration of the loop
        prev_final = new_date

    else:  # otherwise take the value from PICKED
        list_final.append(picked)
        prev_final = picked

# outside of the loop, create the column from the list
df['FINAL'] = list_final

print(df)
     BB   NEW_DATE     PICKED      FINAL
0  1123 2018-03-10 2018-03-10 2018-03-10
1  1123 2018-04-10 2018-04-10 2018-04-10
2  1123 2018-05-10 2018-05-10 2018-05-10
3  1123 2018-09-10 2018-09-10 2018-09-10
4  1123 2013-04-01 2013-01-04 2013-04-01
5  1123 2013-07-01 2013-07-01 2013-07-01
6  1123 2013-08-01 2013-08-01 2013-08-01
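
The loop above only works if NEW_DATE and PICKED already hold datetime values, a step the snippet leaves out. Below is a minimal, self-contained sketch of the whole flow under a few assumptions: the sample rows are copied from the question, the dates are read month-first (which is what the printed output above suggests), and `build_final` is a hypothetical helper name used only for this illustration. It treats any value that `pd.to_datetime` could parse as a "valid date" and expects a non-empty frame.

```python
import pandas as pd

# Sample data copied from the question; dates start out as strings.
df = pd.DataFrame({
    "BB": [1123] * 7,
    "NEW_DATE": ["03/10/2018", "04/10/2018", "05/10/2018", "09/10/2018",
                 "04/01/2013", "07/01/2013", "08/01/2013"],
    "PICKED": ["03/10/2018", "04/10/2018", "05/10/2018", "09/10/2018",
               "01/04/2013", "07/01/2013", "08/01/2013"],
})

# Assumption: month-first dates, matching the printed output in the answer.
# errors="coerce" turns anything that cannot be parsed into NaT.
for col in ["NEW_DATE", "PICKED"]:
    df[col] = pd.to_datetime(df[col], format="%m/%d/%Y", errors="coerce")


def build_final(frame):
    """Hypothetical helper: return the FINAL values for a non-empty frame."""
    picked_values = frame["PICKED"].tolist()
    new_date_values = frame["NEW_DATE"].tolist()

    # The first row has no previous FINAL, so it starts from its own PICKED.
    prev_final = picked_values[0]
    result = [prev_final]

    for new_date, picked in zip(new_date_values[1:], picked_values[1:]):
        # Use NEW_DATE only when PICKED went backwards and NEW_DATE is not NaT.
        if picked < prev_final and pd.notna(new_date):
            prev_final = new_date
        else:
            prev_final = picked
        result.append(prev_final)

    return result


df["FINAL"] = build_final(df)
print(df)
```

As a side note on the attempts in the question: the TypeError comes from iterating over `df.iterrows` itself, which is a method object; it has to be called as `df.iterrows()`. Even then, assigning to `row[...]` inside that loop is not a reliable way to write back to `df`, which is one reason this sketch collects the values in a list first.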