I am trying to fill the `NaN` cells with the mean of the values before and after them.
type date v1 v2
0 a 2018-09 21511.11 17696.8
1 a 2018-10 NaN NaN
2 a 2018-11 NaN NaN
3 a 2018-12 30319.98 24553.6
4 a 2019-01 NaN NaN
5 a 2019-02 NaN NaN
6 a 2019-03 7409.61 6110.0
7 a 2019-04 NaN NaN
8 a 2019-05 NaN NaN
9 a 2019-06 15212.51 12590.5
10 a 2019-07 NaN NaN
11 a 2019-08 NaN NaN
12 a 2019-09 23129.96 19160.9
13 a 2019-10 NaN NaN
14 a 2019-11 NaN NaN
15 b 2018-09 21511.11 17696.8
16 b 2018-10 NaN NaN
17 b 2018-11 NaN NaN
18 b 2018-12 30319.98 24553.6
19 b 2019-01 NaN NaN
20 b 2019-02 NaN NaN
21 b 2019-03 7409.61 6110.0
22 b 2019-04 NaN NaN
23 b 2019-05 NaN NaN
24 b 2019-06 15212.51 12590.5
25 b 2019-07 NaN NaN
26 b 2019-08 NaN NaN
27 b 2019-09 23129.96 19160.9
28 b 2019-10 NaN NaN
29 b 2019-11 NaN NaN
I tried the following code, based on the answer here:
df[['v1', 'v2']] = (df[['v1', 'v2']].ffill()+df[['v1', 'v2']].bfill())/2
df[['v1', 'v2']] = df[['v1', 'v2']].bfill().ffill()
I get:
type date v1 v2
0 a 2018-09 21511.110 17696.80
1 a 2018-10 25915.545 21125.20
2 a 2018-11 25915.545 21125.20
3 a 2018-12 30319.980 24553.60
4 a 2019-01 18864.795 15331.80
5 a 2019-02 18864.795 15331.80
6 a 2019-03 7409.610 6110.00
7 a 2019-04 11311.060 9350.25
8 a 2019-05 11311.060 9350.25
9 a 2019-06 15212.510 12590.50
10 a 2019-07 19171.235 15875.70
11 a 2019-08 19171.235 15875.70
12 a 2019-09 23129.960 19160.90
13 a 2019-10 22320.535 18428.85
14 a 2019-11 22320.535 18428.85
15 b 2018-09 21511.110 17696.80
16 b 2018-10 25915.545 21125.20
17 b 2018-11 25915.545 21125.20
18 b 2018-12 30319.980 24553.60
19 b 2019-01 18864.795 15331.80
20 b 2019-02 18864.795 15331.80
21 b 2019-03 7409.610 6110.00
22 b 2019-04 11311.060 9350.25
23 b 2019-05 11311.060 9350.25
24 b 2019-06 15212.510 12590.50
25 b 2019-07 19171.235 15875.70
26 b 2019-08 19171.235 15875.70
27 b 2019-09 23129.960 19160.90
28 b 2019-10 23129.960 19160.90
29 b 2019-11 23129.960 19160.90
But I don't know how to group by `type` and apply the code above. Can anyone help? Thanks.
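Answer 0 (score: 3)
Add `groupby` with the list of columns to process. The leading and trailing missing values of each group also need a per-group fill, so that a group with `NaN`s at its edges is not filled with values from a neighbouring group:
cols = ['v1', 'v2']
# mean of the forward- and backward-filled values, computed within each group
g = df.groupby('type')[cols]
df[cols] = (g.ffill() + g.bfill()) / 2
# leading/trailing NaNs of a group: fill from the nearest value in the same group
df[cols] = df.groupby('type')[cols].transform(lambda x: x.bfill().ffill())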
Solution for numeric columns only:
import numpy as np
cols = df.select_dtypes(np.number).columns
g = df.groupby('type')[cols]
df[cols] = (g.ffill() + g.bfill()) / 2
df[cols] = df.groupby('type')[cols].transform(lambda x: x.bfill().ffill())
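To make the two steps concrete, here is a small self-contained sketch on a made-up single-column frame. The data is invented for illustration, and using `transform` for the edge fill is an assumption chosen so the result keeps the original row index; it is not taken from the question.

```python
import numpy as np
import pandas as pd

# Made-up frame: two groups, each with one interior NaN and two trailing NaNs.
df = pd.DataFrame({
    "type": ["a"] * 5 + ["b"] * 5,
    "v1": [10.0, np.nan, 20.0, np.nan, np.nan,
           100.0, np.nan, 300.0, np.nan, np.nan],
})
cols = ["v1"]

# Step 1: within each group, average the forward-filled and backward-filled values.
# Interior NaNs become the mean of their two neighbours; trailing NaNs stay NaN
# because the backward fill has nothing to pull from inside the group.
g = df.groupby("type")[cols]
df[cols] = (g.ffill() + g.bfill()) / 2

# Step 2: fill the remaining edge NaNs from the nearest value in the same group.
df[cols] = df.groupby("type")[cols].transform(lambda s: s.bfill().ffill())

print(df["v1"].tolist())
# [10.0, 15.0, 20.0, 20.0, 20.0, 100.0, 200.0, 300.0, 300.0, 300.0]
```

Note that the trailing rows of group `a` end up as 20.0, not as a mix of 20.0 and the first value of group `b`, which is the cross-group leak the ungrouped version produces.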
Answer 1 (score: 2)
Just do what you described:
df[['v1','v2']] = (df.groupby('type')[['v1','v2']]
                     .agg(['bfill','ffill'])
                     .groupby(level=0, axis=1)
                     .mean()
                  )
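Grouping along `axis=1` is deprecated in recent pandas releases, so the same idea can also be sketched without it. The version below is an assumption-laden variant, not the answerer's code: it stacks the per-group backward and forward fills with `pd.concat` and averages rows that share an index label. It relies on `df` having a unique index, and the sample data is made up.

```python
import numpy as np
import pandas as pd

# Made-up frame: two groups, each with one interior NaN and two trailing NaNs.
df = pd.DataFrame({
    "type": ["a"] * 5 + ["b"] * 5,
    "v1": [10.0, np.nan, 20.0, np.nan, np.nan,
           100.0, np.nan, 300.0, np.nan, np.nan],
})
cols = ["v1"]
g = df.groupby("type")[cols]

# Stack the two fills; every original row label now appears twice.
stacked = pd.concat([g.bfill(), g.ffill()])

# mean() skips NaN by default, so an edge row with only one filled value
# keeps that value instead of becoming NaN.
df[cols] = stacked.groupby(level=0).mean()

print(df["v1"].tolist())
# [10.0, 15.0, 20.0, 20.0, 20.0, 100.0, 200.0, 300.0, 300.0, 300.0]
```

Because `mean()` ignores missing values, this handles the interior and edge cases in a single pass, with no separate `bfill().ffill()` step.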