Combining pandas DataFrames with different sample rates

Time: 2013-09-25 16:48:51

Tags: python pandas

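I have three pandas DataFrames containing data recorded during a test. One frame holds temperature, another vacuum, and another voltage.

The data was captured independently, so the time values of the frames don't line up. Only occasionally does a timestamp in one frame also appear in another.

What I want to do is combine these into a single DataFrame and then interpolate the missing values, so that I end up with one complete DataFrame.

I'm new to pandas and have been looking around, but I feel like I'm not getting anywhere, and I can't tell whether I'm on the right track.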

1 answer:

Answer 0 (score: 5)

import pandas as pd
import numpy as np

rng1 = pd.date_range(
    '1/1/2012', 
    periods=10, 
    freq='h'  # hourly; pandas versions before 2.2 spell this 'H'
)

s1 = pd.Series(
    np.arange(10),
    index=rng1
)

df1 = pd.DataFrame(
    {'temp': s1}
)

s2 = pd.Series(
    np.arange(5, 10),
    # Parse the timestamps so this index is a DatetimeIndex like df1's;
    # plain strings would not align with df1's timestamps in the join.
    index=pd.to_datetime(['1/1/2012 01:20:00',
                          '1/1/2012 01:40:00',
                          '1/1/2012 02:00:00',
                          '1/1/2012 05:30:00',
                          '1/1/2012 06:00:00'])
)

df2 = pd.DataFrame(
    {'voltage': s2}
)

print(df1)
print(df2)

--output:--
                     temp
2012-01-01 00:00:00     0
2012-01-01 01:00:00     1
2012-01-01 02:00:00     2
2012-01-01 03:00:00     3
2012-01-01 04:00:00     4
2012-01-01 05:00:00     5
2012-01-01 06:00:00     6
2012-01-01 07:00:00     7
2012-01-01 08:00:00     8
2012-01-01 09:00:00     9

                     voltage
2012-01-01 01:20:00        5
2012-01-01 01:40:00        6
2012-01-01 02:00:00        7
2012-01-01 05:30:00        8
2012-01-01 06:00:00        9


# An outer join keeps every timestamp from both frames; gaps become NaN.
combined = df1.join(df2, how='outer')
print(combined)

--output:--
                     temp  voltage
2012-01-01 00:00:00     0      NaN
2012-01-01 01:00:00     1      NaN
2012-01-01 01:20:00   NaN        5
2012-01-01 01:40:00   NaN        6
2012-01-01 02:00:00     2        7
2012-01-01 03:00:00     3      NaN
2012-01-01 04:00:00     4      NaN
2012-01-01 05:00:00     5      NaN
2012-01-01 05:30:00   NaN        8
2012-01-01 06:00:00     6        9
2012-01-01 07:00:00     7      NaN
2012-01-01 08:00:00     8      NaN
2012-01-01 09:00:00     9      NaN

# Interpolate each column in proportion to the time elapsed between samples.
combined = combined.interpolate(method='time')

print(combined)

--output:--
                         temp   voltage
2012-01-01 00:00:00  0.000000       NaN
2012-01-01 01:00:00  1.000000       NaN
2012-01-01 01:20:00  1.333333  5.000000
2012-01-01 01:40:00  1.666667  6.000000
2012-01-01 02:00:00  2.000000  7.000000
2012-01-01 03:00:00  3.000000  7.285714
2012-01-01 04:00:00  4.000000  7.571429
2012-01-01 05:00:00  5.000000  7.857143
2012-01-01 05:30:00  5.500000  8.000000
2012-01-01 06:00:00  6.000000  9.000000
2012-01-01 07:00:00  7.000000  9.000000
2012-01-01 08:00:00  8.000000  9.000000
2012-01-01 09:00:00  9.000000  9.000000
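Note that in the output above the voltage column stays at 9.0 after its last real sample at 06:00: by default, interpolate carries the last valid value forward past the end of the data, while NaNs before the first sample are left alone. If that padding is not wanted, the `limit_area='inside'` argument restricts filling to gaps that have real samples on both sides. A minimal sketch on a made-up series (the name `s` and its values are assumptions for illustration, not part of the data above):

```python
import numpy as np
import pandas as pd

# Hypothetical hourly series with a gap at each end and one in the middle.
idx = pd.date_range('2012-01-01', periods=5, freq='60min')
s = pd.Series([np.nan, 1.0, np.nan, 3.0, np.nan], index=idx)

# Default behaviour: the interior gap is interpolated and the trailing
# NaN is padded with the last valid value; the leading NaN is kept.
default = s.interpolate(method='time')

# limit_area='inside': only the NaN surrounded by valid values is filled.
inside = s.interpolate(method='time', limit_area='inside')

print(default)
print(inside)
```

Whether the padded tail is acceptable depends on the measurement; for a slowly changing signal it may be fine, otherwise leaving those rows as NaN is the more honest choice.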

# Leading NaNs have no earlier sample to interpolate from, so backfill them.
print(combined.bfill())

--output:--
                         temp   voltage
2012-01-01 00:00:00  0.000000  5.000000
2012-01-01 01:00:00  1.000000  5.000000
2012-01-01 01:20:00  1.333333  5.000000
2012-01-01 01:40:00  1.666667  6.000000
2012-01-01 02:00:00  2.000000  7.000000
2012-01-01 03:00:00  3.000000  7.285714
2012-01-01 04:00:00  4.000000  7.571429
2012-01-01 05:00:00  5.000000  7.857143
2012-01-01 05:30:00  5.500000  8.000000
2012-01-01 06:00:00  6.000000  9.000000
2012-01-01 07:00:00  7.000000  9.000000
2012-01-01 08:00:00  8.000000  9.000000
2012-01-01 09:00:00  9.000000  9.000000
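
The question mentions three frames (temperature, vacuum, voltage), while the example above joins two. The same idea should extend to any number of frames; one possible sketch uses `pd.concat` with `axis=1`, which aligns the frames on the union of their timestamps. The frame names, timestamps, and values below are invented for illustration and are not the asker's data:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: three sensors logged on unrelated clocks.
temp = pd.DataFrame(
    {'temp': [20.0, 21.0, 22.0]},
    index=pd.to_datetime(['2012-01-01 00:00',
                          '2012-01-01 01:00',
                          '2012-01-01 02:00']),
)
vacuum = pd.DataFrame(
    {'vacuum': [1.0, 4.0]},
    index=pd.to_datetime(['2012-01-01 00:30',
                          '2012-01-01 02:00']),
)
voltage = pd.DataFrame(
    {'voltage': [5.0, 9.0]},
    index=pd.to_datetime(['2012-01-01 00:00',
                          '2012-01-01 02:00']),
)

# Align all three frames on the union of their timestamps (outer join),
# then sort so the rows are in time order before interpolating.
combined = pd.concat([temp, vacuum, voltage], axis=1).sort_index()

# Fill gaps in proportion to elapsed time, then backfill any leading NaNs
# that have no earlier sample to interpolate from.
filled = combined.interpolate(method='time').bfill()

print(filled)
```

Because `method='time'` weights by the actual spacing of the timestamps, the unevenly spaced rows (00:30 sits between 00:00 and 01:00, then a full hour to 02:00) are handled without first resampling to a fixed rate.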