Fill NaN in pandas based on conditions in other columns

Time: 2019-04-23 14:39:35

Tags: python python-3.x pandas

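I want to fill the None values with values from activity_station. The data is shown below; I created a few extra columns to make the condition simpler.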

Shift_id    activity_name   activity_id activity_begin_time activity_end_time   activity_station    shift   code    day
0   123 start   D01-MCK-DI  09:00   09:05   None    D01 MCK DI
1   123 work    D01-MCK-DI  09:05   12:00   Za      D01 MCK DI
2   123 drive   D01-MCK-DI  12:00   12:30   Ro      D01 MCK DI
3   184 start   D01-MV-DI   09:00   09:05   None    D01 MV  DI
4   184 work    D01-MV-DI   09:05   12:00   Ca      D01 MV  DI
5   184 drive   D01-MV-DI   12:00   12:30   None    D01 MV  DI

Code to load the data, if needed:

    import pandas as pd

    df = pd.DataFrame({
        'Shift_id': [123, 123, 123, 184, 184, 184],
        'activity_name': ['start', 'work', 'drive', 'start', 'work', 'drive'],
        'activity_id': ['D01-MCK-DI', 'D01-MCK-DI', 'D01-MCK-DI', 'D01-MV-DI', 'D01-MV-DI', 'D01-MV-DI'],
        'activity_begin_time': ['09:00', '09:05', '12:00', '09:00', '09:05', '12:00'],
        'activity_end_time': ['09:05', '12:00', '12:30', '09:05', '12:00', '12:30'],
        'activity_station': ['None', 'Za', 'Ro', 'None', 'Ca', 'None']})

    # Split activity_id into its shift, code and day parts
    df[['shift', 'code', 'day']] = df['activity_id'].str.split(pat="-", expand=True)

If MV has a None value in the activity_station column,

then look for the rows where MV and MCK have the same shift and day, and assign MCK's activity_station value to MV's None value.

I tried a few if/else return statements, but none of them worked.

The result should look like this:

    Shift_id    activity_name   activity_id activity_begin_time activity_end_time   activity_station    shift   code    day
0   123 start   D01-MCK-DI  09:00   09:05   None    D01 MCK DI
1   123 work    D01-MCK-DI  09:05   12:00   Za      D01 MCK DI
2   123 drive   D01-MCK-DI  12:00   12:30   Ro      D01 MCK DI
3   184 start   D01-MV-DI   09:00   09:05   None    D01 MV  DI
4   184 work    D01-MV-DI   09:05   12:00   Ca      D01 MV  DI
5   184 drive   D01-MV-DI   12:00   12:30   Ro      D01 MV  DI

1 Answer:

Answer 0: (score: 0)

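IIUC, you need one more grouping column to get the desired output. You currently describe grouping by shift and day, but that still yields only a single group, so I assume you also mean to group by activity_name. If that is the case, you can use transform() after replacing the None values in your dataframe with NaN (i.e. np.nan):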

    df['activity_station'] = df.groupby(['shift','day','activity_name'])['activity_station'].transform(lambda x: x.ffill())

This produces the output you want:

   Shift_id activity_name activity_id activity_begin_time activity_end_time  \
0       123         start  D01-MCK-DI               09:00             09:05   
1       123          work  D01-MCK-DI               09:05             12:00   
2       123         drive  D01-MCK-DI               12:00             12:30   
3       184         start   D01-MV-DI               09:00             09:05   
4       184          work   D01-MV-DI               09:05             12:00   
5       184         drive   D01-MV-DI               12:00             12:30   

  activity_station shift code day  
0              NaN   D01  MCK  DI  
1               Za   D01  MCK  DI  
2               Ro   D01  MCK  DI  
3              NaN   D01   MV  DI  
4               Ca   D01   MV  DI  
5               Ro   D01   MV  DI
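
For reference, the steps above can be put together into one self-contained sketch. It assumes the None values are the literal string 'None', as in the loading code from the question, and that the MCK rows come before the MV rows, since ffill() only copies values downward within each group. The use of Series.replace for the None-to-NaN step is my own choice; the answer does not say how that replacement was done.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Shift_id': [123, 123, 123, 184, 184, 184],
    'activity_name': ['start', 'work', 'drive', 'start', 'work', 'drive'],
    'activity_id': ['D01-MCK-DI', 'D01-MCK-DI', 'D01-MCK-DI',
                    'D01-MV-DI', 'D01-MV-DI', 'D01-MV-DI'],
    'activity_begin_time': ['09:00', '09:05', '12:00', '09:00', '09:05', '12:00'],
    'activity_end_time': ['09:05', '12:00', '12:30', '09:05', '12:00', '12:30'],
    'activity_station': ['None', 'Za', 'Ro', 'None', 'Ca', 'None']})

# Split activity_id into its shift, code and day parts
df[['shift', 'code', 'day']] = df['activity_id'].str.split(pat='-', expand=True)

# Assumption: missing stations are stored as the string 'None';
# turn them into real NaN so that ffill() treats them as missing.
df['activity_station'] = df['activity_station'].replace('None', np.nan)

# Within each (shift, day, activity_name) group, carry the last known
# station forward into the rows where it is missing.
df['activity_station'] = (
    df.groupby(['shift', 'day', 'activity_name'])['activity_station']
      .transform(lambda x: x.ffill())
)

print(df[['activity_name', 'code', 'activity_station']])
```

The two 'start' rows stay NaN because neither MCK nor MV has a station for that activity, which matches the output shown above. If the MV rows could appear before the MCK rows, sorting the frame first (or using bfill() as well) would be needed for the same result.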