This is my original DataFrame, df_:
index_label,id_label,morning,evening,night
a,x,nan,eating,sleep
b,x,shower,eating,nan
c,x,nan,nan,nan
d,y,work,reading,travel
e,y,nan,reading,nan
f,y,work,nan,nan
g,z,shower,nan,travel
h,z,shower,eating,nan
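I am trying to replace the nan values with non-nan values taken from the same DataFrame df_, matching rows on id_label. The "morning" and "evening" columns each need to be cleared of nan. The "night" column should stay unchanged.
For example, this is what I wrote for the "morning" column: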
# Boolean mask: True where 'morning' is missing (a Series, not a one-column DataFrame)
crit_nan_ = pd.isna(df_['morning'])
df_nan_ = df_.loc[crit_nan_]
df_clean_ = df_.loc[~crit_nan_]
But how do I get from there to this resulting DataFrame?
index_label,id_label,morning,evening,night
a,x,shower,eating,sleep
b,x,shower,eating,nan
c,x,shower,eating,nan
d,y,work,reading,travel
e,y,work,reading,nan
f,y,work,reading,nan
g,z,shower,eating,travel
h,z,shower,eating,nan
Answer 0 (score: 2)
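You can get the resulting DataFrame by combining df.groupby with a forward fill followed by a backward fill inside each group (Series.ffill and Series.bfill, which replace the deprecated fillna(method=...)):
def fill_na(x):
    # Fill forward first, then backward, so leading gaps in a group are covered too
    return x.ffill().bfill()

for col in ("morning", "evening"):
    df_[col] = df_.groupby("id_label")[col].transform(fill_na)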
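To make the approach above easy to try, here is a self-contained sketch that rebuilds the question's data and applies the per-group fill. The names CSV and fill_within_group are made up for this illustration, and it assumes the missing values appear as the literal text nan in the input.

```python
import io

import pandas as pd

# The question's data; "nan" is passed explicitly as a missing-value marker.
CSV = """index_label,id_label,morning,evening,night
a,x,nan,eating,sleep
b,x,shower,eating,nan
c,x,nan,nan,nan
d,y,work,reading,travel
e,y,nan,reading,nan
f,y,work,nan,nan
g,z,shower,nan,travel
h,z,shower,eating,nan
"""
df_ = pd.read_csv(io.StringIO(CSV), na_values=["nan"])


def fill_within_group(s):
    # Forward fill, then backward fill, within one id_label group
    return s.ffill().bfill()


# Only "morning" and "evening" are filled; "night" is left untouched.
for col in ("morning", "evening"):
    df_[col] = df_.groupby("id_label")[col].transform(fill_within_group)

print(df_)
```

Because this fills from the nearest neighbouring row, a group holding several different valid values would not end up with a single value per group. In the sample data each id_label has only one distinct value per column, so the result matches the desired output.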
Answer 1 (score: 0)
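Here is one approach that uses a dictionary to store, for each column, a Series mapping every id_label to its first valid value. Note that cats includes 'night', so that column is filled as well; remove it from cats to leave "night" unchanged, as the question asks.
cats = ('morning', 'evening', 'night')

# For each column: drop rows with nan, keep the first row per id_label, index by id_label
maps = {k: df_.dropna(subset=[k]).drop_duplicates('id_label').set_index('id_label')[k]
        for k in cats}

for col in cats:
    df_[col] = df_[col].fillna(df_['id_label'].map(maps[col]))

print(df_)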
   index_label  id_label  morning  evening   night
0            a         x   shower   eating   sleep
1            b         x   shower   eating   sleep
2            c         x   shower   eating   sleep
3            d         y     work  reading  travel
4            e         y     work  reading  travel
5            f         y     work  reading  travel
6            g         z   shower   eating  travel
7            h         z   shower   eating  travel
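The output above also fills "night", which the question wanted left alone. As a possible variant (not taken from either answer), the sketch below restricts the fill to "morning" and "evening" and uses groupby(...).transform("first"), which broadcasts the first non-missing value of each group. The names CSV and first_valid are made up for this illustration.

```python
import io

import pandas as pd

# The question's data; "nan" is passed explicitly as a missing-value marker.
CSV = """index_label,id_label,morning,evening,night
a,x,nan,eating,sleep
b,x,shower,eating,nan
c,x,nan,nan,nan
d,y,work,reading,travel
e,y,nan,reading,nan
f,y,work,nan,nan
g,z,shower,nan,travel
h,z,shower,eating,nan
"""
df_ = pd.read_csv(io.StringIO(CSV), na_values=["nan"])

for col in ("morning", "evening"):
    # First non-missing value of the column within each id_label group,
    # aligned to the original rows
    first_valid = df_.groupby("id_label")[col].transform("first")
    # Only the missing cells are replaced; existing values stay as they are
    df_[col] = df_[col].fillna(first_valid)

print(df_)
```

Like the dictionary approach, this assumes one representative value per id_label is acceptable: if a group held several different valid values, every gap would receive the first one.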