Replace NaN values based on a row condition

Date: 2018-06-20 11:57:00

Tags: python python-3.x pandas dataframe

This is my original DataFrame df_:

 index_label,id_label,morning,evening,night
 a,x,nan,eating,sleep
 b,x,shower,eating,nan
 c,x,nan,nan,nan
 d,y,work,reading,travel
 e,y,nan,reading,nan
 f,y,work,nan,nan
 g,z,shower,nan,travel
 h,z,shower,eating,nan
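To make the sample above reproducible, here is a minimal sketch that loads it with pandas. It assumes the data is pasted as CSV text; the variable name `raw` is only illustrative. `read_csv` treats the literal string `nan` as a missing value by default, so no extra options should be needed.

```python
import io

import pandas as pd

# The sample data from the question, pasted as CSV text.
raw = """index_label,id_label,morning,evening,night
a,x,nan,eating,sleep
b,x,shower,eating,nan
c,x,nan,nan,nan
d,y,work,reading,travel
e,y,nan,reading,nan
f,y,work,nan,nan
g,z,shower,nan,travel
h,z,shower,eating,nan
"""

# "nan" is one of read_csv's default NA markers, so it is parsed as NaN.
df_ = pd.read_csv(io.StringIO(raw))

# Count the missing values per column as a quick sanity check.
print(df_.isna().sum())
```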

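I am then trying to replace the NaN values with non-NaN values taken from the same DataFrame df_, matching on the same id_label. The "morning" and "evening" columns each need to be cleared of NaN. The "night" column should stay unchanged.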

For example, this is what I wrote for the "morning" column:

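# Select a single column so the mask is a boolean Series, not a one-column DataFrame
crit_nan_ = pd.isna(df_['morning'])
df_nan_ = df_.loc[crit_nan_]     # rows where "morning" is NaN
df_clean_ = df_.loc[~crit_nan_]  # rows where "morning" has a value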

But how do I get from there to the resulting DataFrame:

 index_label,id_label,morning,evening,night
 a,x,shower,eating,sleep
 b,x,shower,eating,nan
 c,x,shower,eating,nan
 d,y,work,reading,travel
 e,y,work,reading,nan
 f,y,work,reading,nan
 g,z,shower,eating,travel
 h,z,shower,eating,nan

2 answers:

Answer 0: (score: 2)

The resulting DataFrame can be obtained with df.groupby and df.fillna:

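def fill_na(x):
    # Forward-fill, then back-fill, within one id_label group.
    # Equivalent to fillna(method="ffill").fillna(method="bfill"),
    # which is deprecated in recent pandas versions.
    return x.ffill().bfill()

for col in ("morning", "evening"):
    df_[col] = df_.groupby("id_label")[col].transform(fill_na)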
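As a self-contained check, the sketch below applies the same groupby and fill idea to the question's sample data. The frame is rebuilt by hand here, and the helper name `fill_within_group` is only illustrative. It assumes that any non-NaN value within an `id_label` group is an acceptable replacement.

```python
import numpy as np
import pandas as pd

# Sample data from the question, with NaN for the missing entries.
df_ = pd.DataFrame({
    "index_label": list("abcdefgh"),
    "id_label": ["x", "x", "x", "y", "y", "y", "z", "z"],
    "morning": [np.nan, "shower", np.nan, "work", np.nan, "work", "shower", "shower"],
    "evening": ["eating", "eating", np.nan, "reading", "reading", np.nan, np.nan, "eating"],
    "night": ["sleep", np.nan, np.nan, "travel", np.nan, np.nan, "travel", np.nan],
})

def fill_within_group(s):
    # Forward-fill first, then back-fill so leading NaNs are covered too.
    return s.ffill().bfill()

# Only "morning" and "evening" are filled; "night" is deliberately left alone.
for col in ("morning", "evening"):
    df_[col] = df_.groupby("id_label")[col].transform(fill_within_group)

print(df_)
```

Because `transform` returns a result aligned with the original index, the assignment keeps the row order intact.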

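Answer 1: (score: 0)

Here is one way, using a dictionary that stores a Series of valid values for each column. Note that because cats includes 'night', that column is filled as well; drop it from cats to leave "night" untouched, as the question asks.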

cats = ('morning', 'evening', 'night')

maps = {k: df.dropna(subset=[k]).drop_duplicates('id_label').set_index('id_label')[k] \
           for k in cats}

for col in cats:
    df[col] = df[col].fillna(df['id_label'].map(maps[col]))

print(df)

   index_label id_label morning  evening   night
0            a        x  shower   eating   sleep
1            b        x  shower   eating   sleep
2            c        x  shower   eating   sleep
3            d        y    work  reading  travel
4            e        y    work  reading  travel
5            f        y    work  reading  travel
6            g        z  shower   eating  travel
7            h        z  shower   eating  travel
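The output above also fills "night", which the question wanted left alone. A minimal sketch of the same dictionary-of-Series approach restricted to the two requested columns might look like this. The frame is rebuilt by hand, and it assumes the first non-NaN value per `id_label` is the one to use.

```python
import numpy as np
import pandas as pd

# Sample data from the question, with NaN for the missing entries.
df = pd.DataFrame({
    "index_label": list("abcdefgh"),
    "id_label": ["x", "x", "x", "y", "y", "y", "z", "z"],
    "morning": [np.nan, "shower", np.nan, "work", np.nan, "work", "shower", "shower"],
    "evening": ["eating", "eating", np.nan, "reading", "reading", np.nan, np.nan, "eating"],
    "night": ["sleep", np.nan, np.nan, "travel", np.nan, np.nan, "travel", np.nan],
})

# Only the columns that should be filled; "night" is intentionally omitted.
cats = ("morning", "evening")

# For each column, build a Series mapping id_label -> first non-NaN value.
maps = {
    k: df.dropna(subset=[k]).drop_duplicates("id_label").set_index("id_label")[k]
    for k in cats
}

# Fill each NaN with the value looked up from its row's id_label.
for col in cats:
    df[col] = df[col].fillna(df["id_label"].map(maps[col]))

print(df)
```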