Question

I have two pandas DataFrames that I want to merge/join.

For example:

#required packages
import os
import pandas as pd
import numpy as np
import datetime as dt

# create sample time series
dates1 = pd.date_range('1/1/2000', periods=4, freq='10min')
dates2 = dates1
column_names = ['A','B','C']
df1 = pd.DataFrame(np.random.randn(4, 3), index=dates1, 
columns=column_names)
df2 = pd.DataFrame(np.random.randn(4, 3), index=dates2, 
columns=column_names)

df3 = df1.merge(df2, how='outer', left_index=True, right_index=True, suffixes=('_x', '_y'))

From here, I would like to merge the two datasets in the following way (note the column order):

                                              A_x       A_y       B_x       B_y       C_x       C_y
2000-01-01 00:00:00 2000-01-01 00:00:00 -0.572616 -0.867554 -0.382594  1.866238 -0.756318  0.564087
2000-01-01 00:10:00 2000-01-01 00:10:00 -0.814776 -0.458378  1.011491  0.196498 -0.523433 -0.296989
2000-01-01 00:20:00 2000-01-01 00:20:00 -0.617766  0.081141  1.405145 -1.183592  0.400720 -0.872507
2000-01-01 00:30:00 2000-01-01 00:30:00  1.083721  0.137422 -1.013840 -1.610531 -1.258841  0.142301

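I want to keep both DataFrame indexes, either by creating a multi-indexed DataFrame or by adding a column for the second index. Would it be easier to use merge_ordered instead of merge or join?

Any help is appreciated.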

Answer 1

I think you want concat rather than merge:

In [11]: pd.concat([df1, df2], keys=["df1", "df2"], axis=1)
Out[11]:
                          df1                           df2
                            A         B         C         A         B         C
2000-01-01 00:00:00  1.621737  0.093015 -0.698715  0.319212  1.021829  1.707847
2000-01-01 00:10:00  0.780523 -1.169127 -1.097695 -0.444000  0.170283  1.652005
2000-01-01 00:20:00  1.560046 -0.196604 -1.260149  0.725005 -1.290074  0.606269
2000-01-01 00:30:00 -1.074419 -2.488055 -0.548531 -1.046327  0.895894  0.423743
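
If the interleaved column order from the question (A_x, A_y, B_x, ...) matters, one possible follow-up is to swap the two column levels, sort them, and flatten the names. This is only a sketch: the keys 'x' and 'y' are chosen here to mimic the merge suffixes and are not part of the answer above.

```python
import numpy as np
import pandas as pd

# Same sample data as in the question
dates = pd.date_range('1/1/2000', periods=4, freq='10min')
df1 = pd.DataFrame(np.random.randn(4, 3), index=dates, columns=['A', 'B', 'C'])
df2 = pd.DataFrame(np.random.randn(4, 3), index=dates, columns=['A', 'B', 'C'])

# Keys 'x'/'y' are an assumption, picked to match the suffixes in the question
wide = pd.concat([df1, df2], keys=['x', 'y'], axis=1)

# Move the original column name to the outer level, then sort the columns
# so that (A, x) and (A, y) end up next to each other
wide = wide.swaplevel(axis=1).sort_index(axis=1)

# Flatten the two-level columns into 'A_x', 'A_y', ...
wide.columns = [f'{col}_{key}' for col, key in wide.columns]
print(wide)
```

Skip the last step if you prefer to keep the two-level column index.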

Answer 2

Use concat:

pd.concat([df1.reset_index().add_suffix('_x'),
           df2.reset_index().add_suffix('_y')], axis=1) \
    .set_index(['index_x', 'index_y'])

                                              A_x       B_x       C_x       A_y       B_y       C_y
index_x             index_y
2000-01-01 00:00:00 2000-01-01 00:00:00 -1.437311 -1.414127  0.344057 -0.533669 -0.260106 -1.316879
2000-01-01 00:10:00 2000-01-01 00:10:00  0.662025  1.860933 -0.485169 -0.825603 -0.973267 -0.760737
2000-01-01 00:20:00 2000-01-01 00:20:00 -0.300213  0.047812 -2.279631 -0.739694 -1.872261  2.281126
2000-01-01 00:30:00 2000-01-01 00:30:00  1.499468  0.633967 -1.067881  0.174793  1.197813 -0.879132
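
Two things may be worth noting about this approach. After reset_index() both frames carry a default integer index, so concat pairs rows by position, not by timestamp. The columns also come out grouped by frame rather than interleaved. A minimal sketch of reordering them to match the question (the names `names` and `interleaved` are introduced here only for illustration):

```python
import numpy as np
import pandas as pd

# Same sample data as in the question
dates1 = pd.date_range('1/1/2000', periods=4, freq='10min')
dates2 = dates1
names = ['A', 'B', 'C']
df1 = pd.DataFrame(np.random.randn(4, 3), index=dates1, columns=names)
df2 = pd.DataFrame(np.random.randn(4, 3), index=dates2, columns=names)

# The unnamed index becomes a column called 'index', hence 'index_x'/'index_y'
combined = (
    pd.concat([df1.reset_index().add_suffix('_x'),
               df2.reset_index().add_suffix('_y')], axis=1)
    .set_index(['index_x', 'index_y'])
)

# Build the order A_x, A_y, B_x, B_y, C_x, C_y and select columns in that order
interleaved = [f'{name}{suffix}' for name in names for suffix in ('_x', '_y')]
combined = combined[interleaved]
print(combined)
```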

Answer 3

merge does combine the two indexes.

You can create an extra column in df2 before merging:

df2["index_2"]=df2.index

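This creates a column in the final result holding the values of df2's index.

Note that the only case in which this column differs from the index is when a row does not appear in df2, in which case it will be null, so I am not sure I understand what your end goal is here.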
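
To make the null case concrete, here is a small sketch in which df2 is assumed to be missing the first timestamp (a hypothetical variation on the question's data, where both indexes are identical):

```python
import numpy as np
import pandas as pd

dates1 = pd.date_range('1/1/2000', periods=4, freq='10min')
dates2 = dates1[1:]  # assumption for illustration: df2 lacks the first timestamp
column_names = ['A', 'B', 'C']
df1 = pd.DataFrame(np.random.randn(4, 3), index=dates1, columns=column_names)
df2 = pd.DataFrame(np.random.randn(3, 3), index=dates2, columns=column_names)

# Keep df2's index as an ordinary column so it survives the merge
df2['index_2'] = df2.index

df3 = df1.merge(df2, how='outer', left_index=True, right_index=True,
                suffixes=('_x', '_y'))

# index_2 is NaT where df2 has no matching row, and equals the index elsewhere
print(df3[['A_x', 'A_y', 'index_2']])
```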

Merging two pandas DataFrames with a timeseries index
