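I have three pandas DataFrames containing data recorded during a test: one for temperature, one for vacuum, and one for voltage.
The data were captured independently, so the time values of the frames do not line up. Only occasionally does a timestamp in one frame also appear in another.
What I want to do is combine them into a single DataFrame and then interpolate the missing values, so that I end up with one complete DataFrame.
I am new to pandas and have been searching around, but I don't feel I am getting anywhere, and I can't tell whether I am on the right track.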
Answer 0 (score: 5)
import pandas as pd
import numpy as np

# Hourly temperature readings.
rng1 = pd.date_range(
    '1/1/2012',
    periods=10,
    freq='h'  # lowercase alias; uppercase 'H' is deprecated in recent pandas
)
s1 = pd.Series(
    np.arange(10),
    index=rng1
)
df1 = pd.DataFrame(
    {'temp': s1}
)

# Irregularly spaced voltage readings. The timestamps are parsed into a
# DatetimeIndex so that they align with df1 when the frames are joined.
s2 = pd.Series(
    np.arange(5, 10),
    index=pd.to_datetime(['1/1/2012 01:20:00',
                          '1/1/2012 01:40:00',
                          '1/1/2012 02:00:00',
                          '1/1/2012 05:30:00',
                          '1/1/2012 06:00:00'])
)
df2 = pd.DataFrame(
    {'voltage': s2},
)

print(df1)
print(df2)
--output:--
                     temp
2012-01-01 00:00:00     0
2012-01-01 01:00:00     1
2012-01-01 02:00:00     2
2012-01-01 03:00:00     3
2012-01-01 04:00:00     4
2012-01-01 05:00:00     5
2012-01-01 06:00:00     6
2012-01-01 07:00:00     7
2012-01-01 08:00:00     8
2012-01-01 09:00:00     9
                     voltage
2012-01-01 01:20:00        5
2012-01-01 01:40:00        6
2012-01-01 02:00:00        7
2012-01-01 05:30:00        8
2012-01-01 06:00:00        9
# An outer join keeps every timestamp that appears in either frame.
combined = df1.join(df2, how='outer')
print(combined)
--output:--
                     temp  voltage
2012-01-01 00:00:00     0      NaN
2012-01-01 01:00:00     1      NaN
2012-01-01 01:20:00   NaN        5
2012-01-01 01:40:00   NaN        6
2012-01-01 02:00:00     2        7
2012-01-01 03:00:00     3      NaN
2012-01-01 04:00:00     4      NaN
2012-01-01 05:00:00     5      NaN
2012-01-01 05:30:00   NaN        8
2012-01-01 06:00:00     6        9
2012-01-01 07:00:00     7      NaN
2012-01-01 08:00:00     8      NaN
2012-01-01 09:00:00     9      NaN
# Interpolate each column, weighting by the actual time between samples.
combined = combined.interpolate(method='time')
print(combined)
--output:--
                         temp   voltage
2012-01-01 00:00:00  0.000000       NaN
2012-01-01 01:00:00  1.000000       NaN
2012-01-01 01:20:00  1.333333  5.000000
2012-01-01 01:40:00  1.666667  6.000000
2012-01-01 02:00:00  2.000000  7.000000
2012-01-01 03:00:00  3.000000  7.285714
2012-01-01 04:00:00  4.000000  7.571429
2012-01-01 05:00:00  5.000000  7.857143
2012-01-01 05:30:00  5.500000  8.000000
2012-01-01 06:00:00  6.000000  9.000000
2012-01-01 07:00:00  7.000000  9.000000
2012-01-01 08:00:00  8.000000  9.000000
2012-01-01 09:00:00  9.000000  9.000000
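In the output above, the voltage values before the first reading stay NaN, while those after the last reading are filled with the last value (9.0). That is the default behaviour of interpolate, which only fills forward. If carrying the last value past the final reading is not wanted, the limit_area argument may help. A minimal sketch, assuming a pandas version that supports limit_area; the series s and its values are made up for illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical series with a gap at the start, in the middle, and at the end.
s = pd.Series(
    [np.nan, 5.0, np.nan, 7.0, np.nan],
    index=pd.to_datetime(['2012-01-01 00:00', '2012-01-01 01:00',
                          '2012-01-01 02:00', '2012-01-01 03:00',
                          '2012-01-01 04:00']),
)

# Default: the interior gap is interpolated, the trailing gap repeats the
# last valid value, and the leading gap is left as NaN.
default = s.interpolate(method='time')

# limit_area='inside': only gaps surrounded by valid values are filled.
inside = s.interpolate(method='time', limit_area='inside')

print(default)
print(inside)
```

With limit_area='inside', both ends remain NaN, so it stays visible which rows lie outside the measured range.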
# Back-fill the leading NaN values that interpolation cannot reach.
print(combined.bfill())
--output:--
                         temp   voltage
2012-01-01 00:00:00  0.000000  5.000000
2012-01-01 01:00:00  1.000000  5.000000
2012-01-01 01:20:00  1.333333  5.000000
2012-01-01 01:40:00  1.666667  6.000000
2012-01-01 02:00:00  2.000000  7.000000
2012-01-01 03:00:00  3.000000  7.285714
2012-01-01 04:00:00  4.000000  7.571429
2012-01-01 05:00:00  5.000000  7.857143
2012-01-01 05:30:00  5.500000  8.000000
2012-01-01 06:00:00  6.000000  9.000000
2012-01-01 07:00:00  7.000000  9.000000
2012-01-01 08:00:00  8.000000  9.000000
2012-01-01 09:00:00  9.000000  9.000000
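The question mentions three frames (temperature, vacuum, voltage), while the example here uses two. The same idea should carry over to three. The following is only a sketch under stated assumptions: every frame has a DatetimeIndex, the frame names and sample values are hypothetical, and pd.concat with axis=1 is used in place of join as another way to outer-align on the index.

```python
import pandas as pd

# Hypothetical sample data: three independently sampled signals.
temp = pd.DataFrame(
    {'temp': [0.0, 1.0, 2.0, 3.0]},
    index=pd.to_datetime(['2012-01-01 00:00', '2012-01-01 01:00',
                          '2012-01-01 02:00', '2012-01-01 03:00']),
)
vacuum = pd.DataFrame(
    {'vacuum': [10.0, 20.0]},
    index=pd.to_datetime(['2012-01-01 00:30', '2012-01-01 02:30']),
)
voltage = pd.DataFrame(
    {'voltage': [5.0, 7.0]},
    index=pd.to_datetime(['2012-01-01 01:00', '2012-01-01 03:00']),
)

# Align all three frames on the union of their timestamps (outer join).
combined3 = pd.concat([temp, vacuum, voltage], axis=1).sort_index()

# Time-weighted interpolation, then back-fill the leading gaps.
filled3 = combined3.interpolate(method='time').bfill()
print(filled3)
```

The union of the three indexes has six distinct timestamps, so filled3 has six rows and no missing values. Whether back-filling the leading gaps is appropriate depends on the measurement; it simply copies the first available reading backwards.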