Question

I want to merge two large DataFrames that look like this:

            loc  val
2019-09-01  0    23.2
2019-09-02  0    13.2
...
2019-11-01  0    12.9
2019-09-01  1    21.2
2019-09-01  1    26.7
...
2019-11-01  1    13.5
...
2019-09-01  4    23.4
...
2019-11-01  4    17.8

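In other words, the index holds many datetimes for each loc, and loc ranges from 0 to 4.

I have two of these DataFrames. I want to join them on the loc column and, at the same time, match on the index in an inner-join fashion. So, if I have a second DataFrame: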

            loc  val
2019-09-02  0    54.8
2019-09-03  0    11.7
...

then the merge would look something like:

            loc  val    val
2019-09-01  0    23.2   NaN
2019-09-02  0    13.2   54.8
...

Do you know whether this is possible? I would like something like this (if it can be done):

df = pd.merge(df1, df2, on="loc", left_index=True, right_index=True)

I have been experimenting with merge, but I can't figure out how to do it. Thanks.

Answer 1

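IIUC,

we can rename the axis so that both indexes share a common name. I tried merging on the unnamed index, but couldn't figure it out.

Then we merge on your 'loc' column plus the newly named 'date' index.

You sound like you already know merge, so adjust the behavior (for example the `how` argument) to suit your requirements.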

df.rename_axis('date', inplace=True)
df1.rename_axis('date', inplace=True)
pd.merge(df, df1, on=['loc', 'date'], how='left', indicator=True)
Output:


           loc  val_x  val_y     _merge
date                                    
2019-09-01  0.0   23.2    NaN  left_only
2019-09-02  0.0   13.2   54.8       both
2019-11-01  0.0   12.9    NaN  left_only
2019-09-01  1.0   21.2    NaN  left_only
2019-09-01  1.0   26.7    NaN  left_only
2019-11-01  1.0   13.5    NaN  left_only
2019-09-01  4.0   23.4    NaN  left_only
2019-11-01  4.0   17.8    NaN  left_only
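
For anyone who wants to try this approach end to end, here is a minimal self-contained sketch. The sample frames `left` and `right` are hypothetical stand-ins shaped like the data in the question, not the asker's real data; the merge itself is the same `rename_axis` plus `pd.merge` call shown above.

```python
import pandas as pd

# Hypothetical sample frames shaped like the ones in the question
left = pd.DataFrame(
    {"loc": [0, 0, 1], "val": [23.2, 13.2, 21.2]},
    index=pd.to_datetime(["2019-09-01", "2019-09-02", "2019-09-01"]),
)
right = pd.DataFrame(
    {"loc": [0, 0], "val": [54.8, 11.7]},
    index=pd.to_datetime(["2019-09-02", "2019-09-03"]),
)

# Give both indexes the same name so the index level can serve as a merge key
left = left.rename_axis("date")
right = right.rename_axis("date")

# Merge on the 'loc' column together with the 'date' index level;
# indicator=True adds a '_merge' column showing where each row came from
merged = pd.merge(left, right, on=["loc", "date"], how="left", indicator=True)
print(merged)
```

Only the row whose date and loc both appear in `right` should pick up a `val_y` value; the other rows keep NaN and are flagged `left_only`.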

Answer 2

You can try the following:

df_1 = df_1.reset_index().rename(columns={'index':'dates'}) #Creates columns from the index, and then rename it to `dates`
df_2 = df_2.reset_index().rename(columns={'index':'dates'}) #Same as first line

df_output = df_1.merge(df_2,how='inner',left_on=['loc','dates'],right_on=['loc','dates']) #Finally perform the inner join based on both columns.

This produces the desired output. Here is a sample data set to illustrate it better.

import pandas as pd
d_1 = {'index':['2019-09-02','2019-09-03'],'loc':[0,0],'val':[23.2,13.2]}
d_2 = {'index':['2019-09-02','2019-09-03','2019-09-05'],'loc':[0,0,0],'val':[54.8,10,13]}
df_1 = pd.DataFrame(d_1)
df_2 = pd.DataFrame(d_2)
df_1 = df_1.set_index('index') #This is your data
df_2 = df_2.set_index('index') #This is your data
print(df_1)
print(df_2)
df_1 = df_1.reset_index().rename(columns={'index':'dates'})
df_2 = df_2.reset_index().rename(columns={'index':'dates'})

final_df = df_2.merge(df_1,how='inner',left_on=['dates','loc'],right_on=['dates','loc'])
print(final_df)

This is the output:

        dates  loc  val_x  val_y
0  2019-09-02    0   54.8   23.2
1  2019-09-03    0   10.0   13.2

However:

given your expected output and the information provided, a left join would meet the requirement more easily. With this data:

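d_1 = {'index':['2019-09-01','2019-09-02'],'loc':[0,0],'val':[23.2,13.2]}
d_2 = {'index':['2019-09-02','2019-09-03'],'loc':[0,0],'val':[54.8,11.7]}
df_1 = pd.DataFrame(d_1).rename(columns={'index':'dates'}) #Rebuild the frames from the new data
df_2 = pd.DataFrame(d_2).rename(columns={'index':'dates'})
final_df = df_2.merge(df_1,how='left',left_on=['dates','loc'],right_on=['dates','loc'])
print(final_df)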

Output:

        dates  loc  val_x  val_y
0  2019-09-02    0   54.8   13.2
1  2019-09-03    0   11.7    NaN
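
The left join above keeps every row of `df_2`. To get the layout sketched in the question instead (every row of the first frame, with NaN where the second frame has no match), the first frame can go on the left. The following is a minimal sketch under assumed sample data; the suffixes `_1` and `_2` and the intermediate names `left` and `right` are illustrative choices, not part of the original answer.

```python
import pandas as pd

# Hypothetical sample data mirroring the first rows of the question
df_1 = pd.DataFrame(
    {"loc": [0, 0], "val": [23.2, 13.2]},
    index=["2019-09-01", "2019-09-02"],
)
df_2 = pd.DataFrame(
    {"loc": [0, 0], "val": [54.8, 11.7]},
    index=["2019-09-02", "2019-09-03"],
)

# Name the index 'dates' and turn it into a regular column on both frames
left = df_1.rename_axis("dates").reset_index()
right = df_2.rename_axis("dates").reset_index()

# how='left' keeps every (dates, loc) pair of the first frame;
# how='inner' keeps only the pairs present in both frames
left_join = left.merge(right, on=["dates", "loc"], how="left", suffixes=("_1", "_2"))
inner_join = left.merge(right, on=["dates", "loc"], how="inner", suffixes=("_1", "_2"))

print(left_join)
print(inner_join)
```

Passing a single `on=` list is equivalent to giving identical `left_on` and `right_on` lists, and explicit suffixes make it easier to tell which frame each `val` column came from.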

Pandas merge based on a common column and the index