I have computed the mean of 'LotFrontage' grouped by 'Neighborhood' and stored it in a DataFrame named LotFrontage_mean. It looks like this:
Neighborhood LotFrontage
0 Blmngtn 47.142857
1 Blueste 24.000000
2 BrDale 21.562500
3 BrkSide 57.509804
4 ClearCr 83.461538
5 CollgCr 71.682540
6 Crawfor 71.804878
7 Edwards 68.217391
8 Gilbert 79.877551
9 IDOTRR 62.500000
10 MeadowV 27.800000
11 Mitchel 70.083333
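In my original DataFrame, I want to fill the NA values in 'LotFrontage' with the mean LotFrontage of the respective neighborhood. For example, I want every NA in the LotFrontage column for the neighborhood Blmngtn to become 47.142857.
This is what I tried: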
house_df['LotFrontage'] = house_df[['Neighborhood','LotFrontage']].apply(lambda x: x['LotFrontage'] if x['LotFrontage'].notnull() else LotFrontage_mean.at(x['Neighborhood']))
Please help.
Answer 0 (score: 1)
It seems you don't need the helper LotFrontage_mean; you can use apply or transform with a custom function to replace the NaN values:
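# group_keys=False keeps the original row index, so the result aligns on assignment (needed in pandas 2.x)
house_df['LotFrontage'] = house_df.groupby('Neighborhood', group_keys=False)['LotFrontage'] \
                                  .apply(lambda x: x.fillna(x.mean()))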
Or:
house_df['LotFrontage'] = house_df.groupby('Neighborhood')['LotFrontage'] \
                                  .transform(lambda x: x.fillna(x.mean()))
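As a minimal sketch of the transform idea, the snippet below uses a tiny hand-made frame (the neighborhoods 'A' and 'B' and their values are invented for illustration, not taken from the question). It uses the built-in `transform('mean')` followed by `fillna`, which should give the same result as the lambda version without calling a Python function per group.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, invented for illustration only
house_df = pd.DataFrame({
    'Neighborhood': ['A', 'A', 'A', 'B', 'B'],
    'LotFrontage': [10.0, np.nan, 30.0, np.nan, 50.0],
})

# Per-row group mean, aligned to the original index (mean skips NaN)
group_mean = house_df.groupby('Neighborhood')['LotFrontage'].transform('mean')

# Fill only the missing values with the mean of their own neighborhood
house_df['LotFrontage'] = house_df['LotFrontage'].fillna(group_mean)
print(house_df)
```

Here the NaN in group 'A' becomes 20.0 (the mean of 10.0 and 30.0) and the NaN in group 'B' becomes 50.0.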
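If that solution cannot be used, use combine_first with map (here LotFrontage is a Series of per-neighborhood means indexed by Neighborhood, as built in the sample below):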
mapped = house_df['Neighborhood'].map(LotFrontage)
house_df['LotFrontage'] = house_df['LotFrontage'].combine_first(mapped)
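Note that `map` looks values up by index, so it needs a Series (or dict) keyed by neighborhood. The question's LotFrontage_mean is a two-column DataFrame, so it would first have to be converted. A sketch of that step, assuming the column names shown in the question; the three house_df rows are hypothetical stand-ins:

```python
import numpy as np
import pandas as pd

# Lookup table shaped like the question's LotFrontage_mean (first two rows only)
LotFrontage_mean = pd.DataFrame({
    'Neighborhood': ['Blmngtn', 'Blueste'],
    'LotFrontage': [47.142857, 24.0],
})

# Hypothetical rows standing in for the real house_df
house_df = pd.DataFrame({
    'Neighborhood': ['Blmngtn', 'Blueste', 'Blmngtn'],
    'LotFrontage': [np.nan, 30.0, 43.0],
})

# Convert the table to a Series indexed by neighborhood so map can look values up
means = LotFrontage_mean.set_index('Neighborhood')['LotFrontage']
mapped = house_df['Neighborhood'].map(means)

# Keep existing values; take the mapped mean only where LotFrontage is NaN
house_df['LotFrontage'] = house_df['LotFrontage'].combine_first(mapped)
print(house_df)
```

Only the first row changes: its NaN is replaced by 47.142857, while the existing values 30.0 and 43.0 are left untouched.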
Sample:
import numpy as np
import pandas as pd

n = ['Blmngtn', 'Blueste', 'BrDale', 'BrkSide', 'ClearCr', 'CollgCr']
a = [0.1,0.2,0.3,np.nan]
N = 100
# random neighborhoods and values, so the exact numbers differ on every run
house_df = pd.DataFrame({'Neighborhood': np.random.choice(n, size=N),
                         'LotFrontage':np.random.choice(a, size=N)},
                        columns=['Neighborhood','LotFrontage'])
print (house_df.head(10))
Neighborhood LotFrontage
0 BrkSide 0.1
1 CollgCr NaN
2 BrkSide NaN
3 Blueste 0.3
4 BrDale NaN
5 ClearCr 0.3
6 BrDale 0.1
7 ClearCr 0.2
8 CollgCr NaN
9 ClearCr 0.1
LotFrontage = house_df.groupby('Neighborhood')['LotFrontage'].mean()
print (LotFrontage)
Neighborhood
Blmngtn 0.200000
Blueste 0.221429
BrDale 0.180000
BrkSide 0.193333
ClearCr 0.223529
CollgCr 0.208333
Name: LotFrontage, dtype: float64
house_df['LotFrontage'] = house_df.groupby('Neighborhood', group_keys=False)['LotFrontage'] \
                                  .apply(lambda x: x.fillna(x.mean()))
print (house_df.head(10))
Neighborhood LotFrontage
0 BrkSide 0.100000
1 CollgCr 0.208333
2 BrkSide 0.193333
3 Blueste 0.300000
4 BrDale 0.180000
5 ClearCr 0.300000
6 BrDale 0.100000
7 ClearCr 0.200000
8 CollgCr 0.208333
9 ClearCr 0.100000
mapped = house_df['Neighborhood'].map(LotFrontage)
house_df['LotFrontage'] = house_df['LotFrontage'].combine_first(mapped)
print (house_df.head(10))
Neighborhood LotFrontage
0 BrkSide 0.100000
1 CollgCr 0.208333
2 BrkSide 0.193333
3 Blueste 0.300000
4 BrDale 0.180000
5 ClearCr 0.300000
6 BrDale 0.100000
7 ClearCr 0.200000
8 CollgCr 0.208333
9 ClearCr 0.100000
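One way to sanity-check that the two approaches agree is to run both on the same unfilled data and compare. This is only a sketch: the seed, the frame size, and the names df, via_transform and via_map are arbitrary choices made for this example.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # fixed seed, arbitrary choice for repeatability
n = ['Blmngtn', 'Blueste', 'BrDale']
df = pd.DataFrame({
    'Neighborhood': rng.choice(n, size=60),
    'LotFrontage': rng.choice([0.1, 0.2, 0.3, np.nan], size=60),
})

# Approach 1: fill inside each group with transform
via_transform = df.groupby('Neighborhood')['LotFrontage'] \
                  .transform(lambda x: x.fillna(x.mean()))

# Approach 2: compute the means once, map them onto rows, fill the gaps
means = df.groupby('Neighborhood')['LotFrontage'].mean()
via_map = df['LotFrontage'].combine_first(df['Neighborhood'].map(means))

print(np.allclose(via_transform, via_map))  # do both approaches agree?
print(via_transform.isna().sum())           # number of NaN values left
```

If a neighborhood contained only NaN values, its mean would also be NaN and both approaches would leave those rows unfilled.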