Pandas: fill NaN with the previous value and interpolate

Time: 2018-03-17 05:27:25

Tags: python-3.x pandas dataframe

I have the following DataFrame df:

    time            col_A
0   1520582580.000  79.000
1   1520582880.000  22.500
2   1520583180.000  29.361
3   1520583480.000  116.095
4   1520583780.000  19.972
5   1520584080.000  36.857
6   1520584380.000  15.167
7   1520584680.000  nan
8   1520584980.000  nan
9   1520585280.000  nan
10  1520585580.000  34.500
11  1520585880.000  17.583
12  1520586180.000  nan
13  1520586480.000  48.833
14  1520586780.000  18.806
15  1520587080.000  18.583

col_A has some missing data. I want to create a col_B that takes the previous value for each missing record, i.e.:

6   1520584380.000  15.167
7   1520584680.000  15.167
8   1520584980.000  15.167
9   1520585280.000  15.167
10  1520585580.000  34.500
11  1520585880.000  17.583
12  1520586180.000  17.583
13  1520586480.000  48.833

I also want a col_C that interpolates between the nearest non-missing points before and after each gap, i.e.:

6   1520584380.000  15.167
7   1520584680.000  20.001
8   1520584980.000  24.834
9   1520585280.000  29.667
10  1520585580.000  34.500
11  1520585880.000  17.583
12  1520586180.000  33.208
13  1520586480.000  48.833

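Other than looping over the DataFrame and computing the values record by record, is there a built-in function that does this in an elegant way? Thanks!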

1 answer:

Answer 0 (score: 3)

I think you need ffill together with interpolate:

    df['colB'] = df['col_A'].ffill()
    df['colc'] = df['col_A'].interpolate()
    print(df)

                time    col_A     colB       colc
    0   1.520583e+09   79.000   79.000   79.00000
    1   1.520583e+09   22.500   22.500   22.50000
    2   1.520583e+09   29.361   29.361   29.36100
    3   1.520583e+09  116.095  116.095  116.09500
    4   1.520584e+09   19.972   19.972   19.97200
    5   1.520584e+09   36.857   36.857   36.85700
    6   1.520584e+09   15.167   15.167   15.16700
    7   1.520585e+09      NaN   15.167   20.00025
    8   1.520585e+09      NaN   15.167   24.83350
    9   1.520585e+09      NaN   15.167   29.66675
    10  1.520586e+09   34.500   34.500   34.50000
    11  1.520586e+09   17.583   17.583   17.58300
    12  1.520586e+09      NaN   17.583   33.20800
    13  1.520586e+09   48.833   48.833   48.83300
    14  1.520587e+09   18.806   18.806   18.80600
    15  1.520587e+09   18.583   18.583   18.58300

If you want to use the time method for the interpolation:

    df['time'] = pd.to_datetime(df['time'], unit='s')
    df = df.set_index('time')
    df['colB'] = df['col_A'].ffill()
    df['colc'] = df['col_A'].interpolate('time')
    print(df)

                           col_A     colB       colc
    time
    2018-03-09 08:03:00   79.000   79.000   79.00000
    2018-03-09 08:08:00   22.500   22.500   22.50000
    2018-03-09 08:13:00   29.361   29.361   29.36100
    2018-03-09 08:18:00  116.095  116.095  116.09500
    2018-03-09 08:23:00   19.972   19.972   19.97200
    2018-03-09 08:28:00   36.857   36.857   36.85700
    2018-03-09 08:33:00   15.167   15.167   15.16700
    2018-03-09 08:38:00      NaN   15.167   20.00025
    2018-03-09 08:43:00      NaN   15.167   24.83350
    2018-03-09 08:48:00      NaN   15.167   29.66675
    2018-03-09 08:53:00   34.500   34.500   34.50000
    2018-03-09 08:58:00   17.583   17.583   17.58300
    2018-03-09 09:03:00      NaN   17.583   33.20800
    2018-03-09 09:08:00   48.833   48.833   48.83300
    2018-03-09 09:13:00   18.806   18.806   18.80600
    2018-03-09 09:18:00   18.583   18.583   18.58300
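For anyone who wants to run this end to end, here is a minimal self-contained sketch that rebuilds the question's data and applies both approaches. It assumes a reasonably recent pandas; the names `ts`, `col_C_time`, and `uneven` are illustrative and do not come from the original post, and the columns are named col_B and col_C as in the question. The last few lines use a small made-up series to show where the default linear method and the time method would give different results.

```python
import numpy as np
import pandas as pd

# Rebuild the question's frame: 16 timestamps, 300 seconds apart.
df = pd.DataFrame({
    "time": 1520582580.0 + 300.0 * np.arange(16),
    "col_A": [79.000, 22.500, 29.361, 116.095, 19.972, 36.857, 15.167,
              np.nan, np.nan, np.nan, 34.500, 17.583, np.nan, 48.833,
              18.806, 18.583],
})

# col_B: carry the last valid observation forward into each gap.
df["col_B"] = df["col_A"].ffill()

# col_C: linear interpolation by row position (the index is ignored).
df["col_C"] = df["col_A"].interpolate()

# Time-aware variant: needs a DatetimeIndex, then weights by elapsed time.
ts = df.set_index(pd.to_datetime(df["time"], unit="s"))
col_C_time = ts["col_A"].interpolate(method="time")

# With evenly spaced timestamps both methods give the same values.
print(np.allclose(col_C_time.to_numpy(), df["col_C"].to_numpy()))

# Illustrative (made-up) series with uneven spacing: gaps of 60 s and 540 s.
uneven = pd.Series([0.0, np.nan, 10.0],
                   index=pd.to_datetime([0, 60, 600], unit="s"))
print(uneven.interpolate().iloc[1])               # by position: midpoint
print(uneven.interpolate(method="time").iloc[1])  # by time: 60/600 of the way
```

Because the rows in the question are all exactly five minutes apart, the two interpolation methods agree there; the time method only matters when the sampling interval varies.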
