My DataFrame looks like this:
id zip location
X2 65123 Houston
T5 65123 Houston
A1 nan Houston
M8 89517 Berkley
X3 89518 Berkley
N2 nan Berkley
M9 nan nan
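For some rows I have no zip code in "zip", but there is an entry in "location".
I want to fill the NaN values in "zip" with one of the zip codes from the same location. Sometimes there is more than one candidate, e.g. for N2 both 89517 and 89518 are possible; it doesn't matter which one is picked. However, I don't want to change rows where both zip and location are NaN. How can I do this?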
Answer 0 (score: 1)
Since you don't care which value is used, we can simply take the max value within each location:
>>> df['zip'] = df.groupby('location')['zip'].transform(lambda x: x.fillna(x.max())).astype(int)
>>> df
id zip location
0 X2 65123 Houston
1 T5 65123 Houston
2 A1 65123 Houston
3 M8 89517 Berkley
4 X3 89518 Berkley
5 N2 89518 Berkley
If you need to handle rows where both zip and location are NaN, filter those rows out into a sub-frame first:
>>> sub_df = df.loc[df[['zip', 'location']].notna().any(axis=1)]
>>> df
id zip location
0 X2 65123.0 Houston
1 T5 65123.0 Houston
2 A1 NaN Houston
3 M7 NaN NaN # <-- added a line in between to show index is maintained
4 M8 89517.0 Berkley
5 X3 89518.0 Berkley
6 N2 NaN Berkley
7 M9 NaN NaN
>>> sub_df
id zip location
0 X2 65123.0 Houston
1 T5 65123.0 Houston
2 A1 NaN Houston # <-- No index 3
4 M8 89517.0 Berkley
5 X3 89518.0 Berkley
6 N2 NaN Berkley
Then perform the same operation (only this time you don't cast to int, because the frame will still contain NaN values):
df['zip'] = sub_df.groupby('location')['zip'].transform(lambda x: x.fillna(x.max()))
Result:
id zip location
0 X2 65123.0 Houston
1 T5 65123.0 Houston
2 A1 65123.0 Houston
3 M7 NaN NaN
4 M8 89517.0 Berkley
5 X3 89518.0 Berkley
6 N2 89518.0 Berkley
7 M9 NaN NaN
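For reference, here is a self-contained sketch of the same idea that can be run as a script. It takes a slightly different route to the same result: it computes the per-location maximum only for rows that have a location, then uses that to fill the gaps in the original column. The helper name `fill_zip_by_location` is made up for illustration, and the final cast to the nullable `Int64` dtype is an optional extra (not part of the answer above) for keeping integer-looking zip codes even though some stay missing.

```python
import numpy as np
import pandas as pd


def fill_zip_by_location(df: pd.DataFrame) -> pd.Series:
    """Hypothetical helper: fill missing zips with the max zip of the same location."""
    # Only rows that actually have a location can borrow a zip from their group.
    has_location = df["location"].notna()
    group_max = df.loc[has_location].groupby("location")["zip"].transform("max")
    # fillna aligns on the index, so rows without a location are left untouched.
    return df["zip"].fillna(group_max)


df = pd.DataFrame(
    {
        "id": ["X2", "T5", "A1", "M7", "M8", "X3", "N2", "M9"],
        "zip": [65123, 65123, np.nan, np.nan, 89517, 89518, np.nan, np.nan],
        "location": ["Houston", "Houston", "Houston", np.nan,
                     "Berkley", "Berkley", "Berkley", np.nan],
    }
)

df["zip"] = fill_zip_by_location(df)

# Optional: the nullable integer dtype stores whole numbers alongside missing values.
df["zip_int"] = df["zip"].astype("Int64")
print(df)
```

Rows M7 and M9 keep a missing zip because they have no location to look up.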
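Answer 1 (score: 0)
If you don't care which value gets filled in, a simple approach is to sort the table by location and zip code and then forward-fill with fillna(method='ffill'). Recent pandas versions deprecate the method argument in favor of the equivalent .ffill().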
>>> df
zip location
0 65123.0 Houston
1 65123.0 Houston
2 NaN Houston
3 89517.0 Berkley
4 89518.0 Berkley
5 NaN Berkley
>>> df.sort_values(by=['location','zip']).fillna(method='ffill')
zip location
3 89517.0 Berkley
4 89518.0 Berkley
5 89518.0 Berkley
0 65123.0 Houston
1 65123.0 Houston
2 65123.0 Houston
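One thing to watch with the sort-then-forward-fill approach: a plain forward fill knows nothing about group boundaries, so a location with no known zip at all would pick up the last zip of whichever location sorts before it. The sketch below uses made-up sample data (the extra location "Dallas" is not in the question) to show the leak, and one way to avoid it by forward-filling within each location.

```python
import numpy as np
import pandas as pd

# Made-up data: "Dallas" has no known zip code at all.
df = pd.DataFrame(
    {
        "zip": [65123, np.nan, 89517, np.nan, np.nan],
        "location": ["Houston", "Houston", "Berkley", "Berkley", "Dallas"],
    }
)

# NaN sorts last within each location, so known zips come first.
ordered = df.sort_values(by=["location", "zip"])

# Plain forward fill: Dallas inherits Berkley's zip.
leaky = ordered.assign(zip=ordered["zip"].ffill())

# Forward fill per location: Dallas stays NaN.
safe = ordered.assign(zip=ordered.groupby("location")["zip"].ffill())

print(leaky)
print(safe)
```

If every location is known to have at least one zip, both variants give the same result.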
Update: the solution below also handles NaN in location. Group by location first, then fill within each group using the max.
>>> df
zip location
0 65123.0 Houston
1 65123.0 Houston
2 NaN Houston
3 89517.0 Berkley
4 89518.0 Berkley
5 NaN Berkley
6 NaN NaN
>>> df['zip'] = df.groupby('location', group_keys=False)['zip'].apply(lambda x: x.fillna(x.max()))
>>> df
zip location
0 65123.0 Houston
1 65123.0 Houston
2 65123.0 Houston
3 89517.0 Berkley
4 89518.0 Berkley
5 89518.0 Berkley
6 NaN NaN
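The reason the row with a NaN location is left alone is that groupby drops NaN keys by default (`dropna=True`), so that row never belongs to any group. A small sketch, assuming the same sample frame as above, makes this visible by comparing group sizes with and without `dropna=False`:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "zip": [65123, 65123, np.nan, 89517, 89518, np.nan, np.nan],
        "location": ["Houston", "Houston", "Houston",
                     "Berkley", "Berkley", "Berkley", np.nan],
    }
)

# Default behaviour: rows whose key is NaN are excluded from every group.
default_groups = df.groupby("location").size()

# With dropna=False the NaN key becomes a group of its own.
all_groups = df.groupby("location", dropna=False).size()

print(default_groups)
print(all_groups)
```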