Question

I have two DataFrames, each representing an irregular time series.

Here is a sample from df1:

index   
2014-10-30 16:00    118
2014-10-30 19:00    160
2014-10-30 22:00    88
2014-10-31 00:00    128
2014-10-31 03:00    89
2014-10-31 11:00    66
2014-10-31 17:00    84
2014-10-31 20:00    104
2014-10-31 21:00    82
2014-10-31 23:00    95
2014-11-01 02:00    44
2014-11-01 03:00    54
2014-11-01 14:00    83
2014-11-02 03:00    78
2014-11-02 04:00    87
2014-11-02 13:00    90

Here is a sample from df2:

index   
2016-02-04 02:00    0.00
2016-02-06 00:00    50.00
2016-02-07 05:00    30.00
2016-02-07 21:00    26.00
2016-02-10 18:00    100.00
2016-02-11 00:00    20.00
2016-02-12 03:00    15.00
2016-02-12 18:00    90.00
2016-02-13 17:00    25.00
2016-02-13 19:00    40.00
2016-02-15 00:00    35.00
2016-02-18 04:00    14.00
2016-02-28 00:00    33.98

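The indexes are pandas Period objects with hourly frequency, and the time ranges covered by the two DataFrames' indexes definitely overlap to some extent. How can I merge them into a single DataFrame that is indexed by the union of their indexes and leaves blanks (to which I can later apply ffill) wherever a column has no value for a particular index label?

Here is what I tried: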

df1.merge(df2, how = 'outer')

This gave me a nonsensical result that appears to have lost the index.

I also tried:

df1.merge(df2, how = 'outer', left_on = 'index', right_on = 'index')

This gave me a KeyError:

pandas/index.pyx in pandas.index.IndexEngine.get_loc (pandas/index.c:3979)()

pandas/index.pyx in pandas.index.IndexEngine.get_loc (pandas/index.c:3843)()

pandas/hashtable.pyx in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:12265)()

pandas/hashtable.pyx in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:12216)()

KeyError: 'index'

Finally, I tried resampling each DataFrame and then building a new DataFrame from a dict:

df_1 = df1.resample('H').ffill()
df_2 = df2.resample('H').ffill()

fin = pd.DataFrame({'d1':df_1[0], 'd2':df_2[0]})

But this produces output in which the d2 column is entirely NaN, even though df_2 looks fine after resampling.

How should I do the merge?

Answer 1

In this case, try join rather than merge, because it joins on the index and preserves it:

df1.join(df2, how='outer')

It should not need any other configuration in this case. An outer join simply leaves NaN wherever a column has no value for an index label.
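As a minimal sketch of the outer join followed by the forward fill the question mentions, here is a small example. The two frames, their values, and the column names d1 and d2 are made up for illustration and are not the asker's data. One caveat worth checking on real data: if both frames use the same column name, join raises an error unless lsuffix and/or rsuffix is passed.

```python
import pandas as pd

# Hypothetical miniature frames with hourly PeriodIndex objects that overlap
# on one label (2016-02-06 00:00). Column names d1/d2 are assumptions.
idx1 = pd.PeriodIndex(
    ["2016-02-04 02:00", "2016-02-06 00:00", "2016-02-07 05:00"], freq="h"
)
idx2 = pd.PeriodIndex(["2016-02-06 00:00", "2016-02-07 21:00"], freq="h")
df1 = pd.DataFrame({"d1": [118, 160, 88]}, index=idx1)
df2 = pd.DataFrame({"d2": [50.0, 26.0]}, index=idx2)

# Outer join: the result is indexed by the union of both indexes,
# with NaN wherever a column has no value for a label.
joined = df1.join(df2, how="outer")

# Forward-fill the gaps afterwards, as described in the question.
filled = joined.ffill()

print(joined)
print(filled)
```

Leading NaN values (labels before a column's first observation) stay NaN after ffill, since there is nothing earlier to carry forward.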

Answer 2

Two other options:

1) Use merge

df = df1.merge(df2, left_index=True, right_index=True, how='outer')

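2) Stack the rows, since both DataFrames have exactly the same columns, then drop duplicate rows. This was originally written with df1.append(df2); DataFrame.append was removed in pandas 2.0, so pd.concat is used here. Note that this stacks the rows into the shared columns rather than placing the two series side by side, and drop_duplicates compares row values, not index labels.

df = pd.concat([df1, df2]).drop_duplicates()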
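To make the difference between the two options concrete, here is a small sketch on made-up data. The frames, the shared column name value, and the suffixes _d1 and _d2 are assumptions for illustration only. The merge aligns the two frames on their indexes and produces one column per frame, while the row-wise concat produces a single column with a repeated index label.

```python
import pandas as pd

# Hypothetical frames sharing one column name and one overlapping index label.
idx1 = pd.PeriodIndex(["2016-02-04 02:00", "2016-02-06 00:00"], freq="h")
idx2 = pd.PeriodIndex(["2016-02-06 00:00", "2016-02-07 21:00"], freq="h")
df1 = pd.DataFrame({"value": [118.0, 160.0]}, index=idx1)
df2 = pd.DataFrame({"value": [50.0, 26.0]}, index=idx2)

# Option 1: align on the index. The suffixes (an illustrative choice)
# disambiguate the two same-named columns.
merged = df1.merge(
    df2, left_index=True, right_index=True, how="outer", suffixes=("_d1", "_d2")
)

# Option 2: stack rows. The overlapping label appears twice, and
# drop_duplicates removes nothing here because the row values differ.
stacked = pd.concat([df1, df2])
deduped = stacked.drop_duplicates()

print(merged)
print(stacked)
```

If the goal is one column per source series indexed by the union of labels, the first form matches the question more directly. The second only suits data where both frames really are pieces of the same series.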
