我有以下数据框df
:
time col_A
0 1520582580.000 79.000
1 1520582880.000 22.500
2 1520583180.000 29.361
3 1520583480.000 116.095
4 1520583780.000 19.972
5 1520584080.000 36.857
6 1520584380.000 15.167
7 1520584680.000 nan
8 1520584980.000 nan
9 1520585280.000 nan
10 1520585580.000 34.500
11 1520585880.000 17.583
12 1520586180.000 nan
13 1520586480.000 48.833
14 1520586780.000 18.806
15 1520587080.000 18.583
col_A
有一些缺失的数据。我想创建一个col_B
,它为每个丢失的记录取以前的值。即。
6 1520584380.000 15.167
7 1520584680.000 15.167
8 1520584980.000 15.167
9 1520585280.000 15.167
10 1520585580.000 34.500
11 1520585880.000 17.583
12 1520586180.000 17.583
13 1520586480.000 48.833
和col_C
,使用最接近的非缺失点之前和之后进行插值。即。
6 1520584380.000 15.167
7 1520584680.000 20.001
8 1520584980.000 24.834
9 1520585280.000 29.667
10 1520585580.000 34.500
11 1520585880.000 17.583
12 1520586180.000 33.208
13 1520586480.000 48.833
除了循环数据帧以按记录执行计算记录外,是否有一个内置函数可以用来以优雅的方式实现这一点?谢谢!
答案 0 :(得分:3)
我认为ffill
需要df['colB'] = df['col_A'].ffill()
df['colc'] = df['col_A'].interpolate()
print (df)
time col_A colB colc
0 1.520583e+09 79.000 79.000 79.00000
1 1.520583e+09 22.500 22.500 22.50000
2 1.520583e+09 29.361 29.361 29.36100
3 1.520583e+09 116.095 116.095 116.09500
4 1.520584e+09 19.972 19.972 19.97200
5 1.520584e+09 36.857 36.857 36.85700
6 1.520584e+09 15.167 15.167 15.16700
7 1.520585e+09 NaN 15.167 20.00025
8 1.520585e+09 NaN 15.167 24.83350
9 1.520585e+09 NaN 15.167 29.66675
10 1.520586e+09 34.500 34.500 34.50000
11 1.520586e+09 17.583 17.583 17.58300
12 1.520586e+09 NaN 17.583 33.20800
13 1.520586e+09 48.833 48.833 48.83300
14 1.520587e+09 18.806 18.806 18.80600
15 1.520587e+09 18.583 18.583 18.58300
:
time
如果想要使用方法df['time'] = pd.to_datetime(df['time'], unit='s')
df = df.set_index('time')
df['colB'] = df['col_A'].ffill()
df['colc'] = df['col_A'].interpolate('time')
print (df)
col_A colB colc
time
2018-03-09 08:03:00 79.000 79.000 79.00000
2018-03-09 08:08:00 22.500 22.500 22.50000
2018-03-09 08:13:00 29.361 29.361 29.36100
2018-03-09 08:18:00 116.095 116.095 116.09500
2018-03-09 08:23:00 19.972 19.972 19.97200
2018-03-09 08:28:00 36.857 36.857 36.85700
2018-03-09 08:33:00 15.167 15.167 15.16700
2018-03-09 08:38:00 NaN 15.167 20.00025
2018-03-09 08:43:00 NaN 15.167 24.83350
2018-03-09 08:48:00 NaN 15.167 29.66675
2018-03-09 08:53:00 34.500 34.500 34.50000
2018-03-09 08:58:00 17.583 17.583 17.58300
2018-03-09 09:03:00 NaN 17.583 33.20800
2018-03-09 09:08:00 48.833 48.833 48.83300
2018-03-09 09:13:00 18.806 18.806 18.80600
2018-03-09 09:18:00 18.583 18.583 18.58300
进行插值:
array#reduce