Replace DataFrame values with the mean, grouping by multiple columns
Below is a snapshot of the DataFrame:
    Current Loan Amount   DateTime  Day  Month  Year
0                611314   1-Jan-92    1    Jan    92
1                266662   2-Jan-92    2    Jan    92
2                153494   3-Jan-92    3    Jan    92
3                176242   4-Jan-92    4    Jan    92
4                321992   5-Jan-92    5    Jan    92
5                202928   6-Jan-92    6    Jan    92
6                621786   7-Jan-92    7    Jan    92
7                266794   8-Jan-92    8    Jan    92
8                202466   9-Jan-92    9    Jan    92
9                266288  10-Jan-92   10    Jan    92
10               121110  11-Jan-92   11    Jan    92
11               258104  12-Jan-92   12    Jan    92
12               161722  13-Jan-92   13    Jan    92
13               753016  14-Jan-92   14    Jan    92
14               444664  15-Jan-92   15    Jan    92
15               172282  16-Jan-92   16    Jan    92
16               275440  17-Jan-92   17    Jan    92
17               218834  18-Jan-92   18    Jan    92
18                    0  19-Jan-92   19    Jan    92
19                    0  20-Jan-92   20    Jan    92
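I need to replace the 0.0 values with the mean Current Loan Amount for the same year and month.
I have tried several approaches. The one below does compute the mean, but it does not modify the original DataFrame, and the result drops the remaining columns: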
data = data_loan.groupby(['Year','Month'])
def replace(group):
    mask = (group==0)
    group[mask] = group[~mask].mean()
    return group
new_data = data.transform(replace)
Answer 0 (score: 1)
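import numpy as np
# Mark zeros as missing so they are excluded from the group mean.
data_loan['Current Loan Amount'] = data_loan['Current Loan Amount'].replace(0, np.nan)
# Fill each missing value with the mean of its (Month, Year) group.
data_loan['Current Loan Amount'] = data_loan.groupby(['Month','Year'])['Current Loan Amount'].transform(lambda x: x.fillna(x.mean()))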
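This replaces each 0 with the mean of the non-zero values in its (Month, Year) group, and leaves the other columns of the DataFrame untouched.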
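For anyone who wants to try this end to end, here is a minimal, self-contained sketch of the same idea. The sample values are made up for illustration and are not the asker's data. Only the column names are taken from the snapshot in the question. It computes the per-group mean with transform("mean") and passes it to fillna, which should be equivalent to the lambda version above.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; column names follow the snapshot in the question.
data_loan = pd.DataFrame({
    "Current Loan Amount": [100.0, 300.0, 0.0, 50.0, 0.0],
    "Month": ["Jan", "Jan", "Jan", "Feb", "Feb"],
    "Year": [92, 92, 92, 92, 92],
})

col = "Current Loan Amount"

# Treat zeros as missing so they do not pull the group mean down.
amounts = data_loan[col].replace(0, np.nan)

# transform returns a Series aligned with the original index,
# so assigning the result back keeps every other column in place.
group_means = amounts.groupby([data_loan["Year"], data_loan["Month"]]).transform("mean")
data_loan[col] = amounts.fillna(group_means)

print(data_loan)
```

Note that if every value in a group is 0, the group mean is NaN and those rows stay NaN, so a fallback (for example the overall mean) may be needed for such groups.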