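My task is to combine many time series into a single dataset. The data arrives in chunks: time series with a datetime index, different names, but overlapping dates and times. When I merge them, I get a dataset with duplicate rows.
My code: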
>>> import pandas as pd
>>> s1 = pd.Series([1.1,1.2], index=pd.date_range("2000-01-01 00:00:00", freq="s", periods=2), name="id_1")
>>> s2 = pd.Series([1.3,1.4], index=pd.date_range("2000-01-01 00:00:02", freq="s", periods=2), name="id_1")
>>> s3 = pd.Series([2.1,2.2], index=pd.date_range("2000-01-01 00:00:00", freq="s", periods=2), name="id_2")
>>> s4 = pd.Series([2.3,2.4], index=pd.date_range("2000-01-01 00:00:02", freq="s", periods=2), name="id_2")
>>> # Note: DataFrame.append was removed in pandas 2.0; on newer versions,
>>> # pd.DataFrame([s1, s2, s3, s4]) should build the same frame.
>>> df = pd.DataFrame()
>>> df.append([s1,s2,s3,s4])
      2000-01-01 00:00:00  2000-01-01 00:00:01  2000-01-01 00:00:02  2000-01-01 00:00:03
id_1                  1.1                  1.2                  NaN                  NaN
id_1                  NaN                  NaN                  1.3                  1.4
id_2                  2.1                  2.2                  NaN                  NaN
id_2                  NaN                  NaN                  2.3                  2.4
I want the dataset to look like this:
      2000-01-01 00:00:00  2000-01-01 00:00:01  2000-01-01 00:00:02  2000-01-01 00:00:03
id_1                  1.1                  1.2                  1.3                  1.4
id_2                  2.1                  2.2                  2.3                  2.4
Answer 0 (score: 0)
functools.reduce
/shrug First thing that came to mind
from functools import reduce
reduce(pd.DataFrame.combine_first, map(pd.Series.to_frame, [s1, s2, s3, s4])).T
      2000-01-01 00:00:00  2000-01-01 00:00:01  2000-01-01 00:00:02  2000-01-01 00:00:03
id_1                  1.1                  1.2                  1.3                  1.4
id_2                  2.1                  2.2                  2.3                  2.4
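To make it clearer what the reduce call above is doing, here is a self-contained sketch that unrolls it step by step on the question's sample data. The helper `make_series` and the intermediate names `step1` to `step3` are made up for this illustration; the lowercase "s" frequency alias is used because recent pandas versions warn about "S".

```python
from functools import reduce

import pandas as pd


def make_series(values, start, name):
    # Hypothetical helper: one value per second, starting at `start`.
    index = pd.date_range(start, periods=len(values), freq="s")
    return pd.Series(values, index=index, name=name)


s1 = make_series([1.1, 1.2], "2000-01-01 00:00:00", "id_1")
s2 = make_series([1.3, 1.4], "2000-01-01 00:00:02", "id_1")
s3 = make_series([2.1, 2.2], "2000-01-01 00:00:00", "id_2")
s4 = make_series([2.3, 2.4], "2000-01-01 00:00:02", "id_2")

# Each Series becomes a one-column DataFrame whose column is the Series name.
frames = [s.to_frame() for s in (s1, s2, s3, s4)]

# combine_first takes the union of index and columns and fills the holes
# of the left frame with values from the right frame.
step1 = frames[0].combine_first(frames[1])  # id_1 now covers all 4 timestamps
step2 = step1.combine_first(frames[2])      # adds id_2, NaN for the last 2 timestamps
step3 = step2.combine_first(frames[3])      # fills the remaining id_2 values

# Transpose so that names are rows and timestamps are columns.
result = step3.T

# The reduce call folds the same pairwise operation over the list.
same = reduce(pd.DataFrame.combine_first, frames).T
print(result)
```

Each pairwise step rebuilds the whole frame, so for a very large number of chunks this may be slower than collecting the values first and constructing the frame once.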
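# Alternative: build a nested dict {timestamp: {series name: value}}
dat = {}
for s in [s1, s2, s3, s4]:
    for k, v in s.items():  # Series.iteritems() was removed in pandas 2.0
        dat.setdefault(k, {})[s.name] = v
pd.DataFrame(dat)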
      2000-01-01 00:00:00  2000-01-01 00:00:01  2000-01-01 00:00:02  2000-01-01 00:00:03
id_1                  1.1                  1.2                  1.3                  1.4
id_2                  2.1                  2.2                  2.3                  2.4
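Another option, not part of the answer above, is to reproduce the frame with duplicate rows from the question and then collapse rows that share a name. This is only a sketch: it assumes each (name, timestamp) pair has at most one non-missing value, since `GroupBy.first` keeps the first non-null entry per column within each group. The variable names are chosen for illustration.

```python
import pandas as pd

s1 = pd.Series([1.1, 1.2], index=pd.date_range("2000-01-01 00:00:00", freq="s", periods=2), name="id_1")
s2 = pd.Series([1.3, 1.4], index=pd.date_range("2000-01-01 00:00:02", freq="s", periods=2), name="id_1")
s3 = pd.Series([2.1, 2.2], index=pd.date_range("2000-01-01 00:00:00", freq="s", periods=2), name="id_2")
s4 = pd.Series([2.3, 2.4], index=pd.date_range("2000-01-01 00:00:02", freq="s", periods=2), name="id_2")

# One row per Series (row label = Series name, columns = timestamps).
# Stacking them gives the frame with duplicate row labels from the question.
stacked = pd.concat([s.to_frame().T for s in (s1, s2, s3, s4)])

# Group rows by their label and keep the first non-null value in each column.
result = stacked.groupby(level=0).first()
print(result)
```

If two chunks with the same name ever carried different values for the same timestamp, this would silently keep the first one, so it may be worth checking for such conflicts before collapsing.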