I want to merge two large DataFrames that look like this:
loc val
2019-09-01 0 23.2
2019-09-02 0 13.2
...
2019-11-01 0 12.9
2019-09-01 1 21.2
2019-09-01 1 26.7
...
2019-11-01 1 13.5
...
2019-09-01 4 23.4
...
2019-11-01 4 17.8
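In other words, the index holds many datetimes for each loc, and loc ranges from 0 to 4.
I have two of these DataFrames. I want to join them on the loc column while also matching on the index, in an inner-join fashion. So if I have a second DataFrame: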
loc val
2019-09-02 0 54.8
2019-09-03 0 11.7
...
then the merged result would look something like:
loc val val
2019-09-01 0 23.2 NaN
2019-09-02 0 13.2 54.8
...
Do you know whether this is possible? I would like something like this (if it can be done):
df = pd.merge(df1, df2, on="loc", left_index=True, right_index=True)
I have been experimenting with merge, but I can't work out how to do it. Thanks.
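Answer 0 (score: 3)
IIUC, we can rename the axis so that both frames share a common index name. (I tried merging on the unnamed index directly, but couldn't figure it out.)
Then we merge on your 'loc' column plus the newly named 'date' index.
It sounds like you already know merge, so change the behaviour (for example the how argument) to suit your needs.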
df.rename_axis('date',inplace=True)
df1.rename_axis('date',inplace=True)
pd.merge(df,df1,on=['loc','date'],how='left',indicator=True)
out:
loc val_x val_y _merge
date
2019-09-01 0.0 23.2 NaN left_only
2019-09-02 0.0 13.2 54.8 both
2019-11-01 0.0 12.9 NaN left_only
2019-09-01 1.0 21.2 NaN left_only
2019-09-01 1.0 26.7 NaN left_only
2019-11-01 1.0 13.5 NaN left_only
2019-09-01 4.0 23.4 NaN left_only
2019-11-01 4.0 17.8 NaN left_only
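For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same idea. The sample frames are made up from the values in the question (they are not the asker's real data), and it assumes a pandas version that accepts a mix of index level names and column names in on (0.23 or later, as far as I know).

```python
import pandas as pd

# Hypothetical sample data modelled on the question: dates in the index, 'loc' as a column
df = pd.DataFrame(
    {"loc": [0, 0, 1], "val": [23.2, 13.2, 21.2]},
    index=pd.to_datetime(["2019-09-01", "2019-09-02", "2019-09-01"]),
)
df1 = pd.DataFrame(
    {"loc": [0, 0], "val": [54.8, 11.7]},
    index=pd.to_datetime(["2019-09-02", "2019-09-03"]),
)

# Give both indexes the same name so merge can refer to them by that name
df = df.rename_axis("date")
df1 = df1.rename_axis("date")

# 'loc' is a column and 'date' is an index level; both are used as join keys
merged = pd.merge(df, df1, on=["loc", "date"], how="left")
print(merged)
```

With how='left' every row of df is kept, and val_y is NaN wherever df1 has no row with the same date and loc. Switching to how='inner' would keep only the matching rows.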
Answer 1 (score: 2)
You can try the following:
df_1 = df_1.reset_index().rename(columns={'index':'dates'}) #Creates columns from the index, and then rename it to `dates`
df_2 = df_2.reset_index().rename(columns={'index':'dates'}) #Same as first line
df_output = df_1.merge(df_2,how='inner',left_on=['loc','dates'],right_on=['loc','dates']) #Finally perform the inner join based on both columns.
This gives the desired output. Here is a sample set to illustrate it better.
import pandas as pd
d_1 = {'index':['2019-09-02','2019-09-03'],'loc':[0,0],'val':[23.2,13.2]}
d_2 = {'index':['2019-09-02','2019-09-03','2019-09-05'],'loc':[0,0,0],'val':[54.8,10,13]}
df_1 = pd.DataFrame(d_1)
df_2 = pd.DataFrame(d_2)
df_1 = df_1.set_index('index') #This is your data
df_2 = df_2.set_index('index') #This is your data
print(df_1)
print(df_2)
df_1 = df_1.reset_index().rename(columns={'index':'dates'})
df_2 = df_2.reset_index().rename(columns={'index':'dates'})
final_df = df_2.merge(df_1,how='inner',left_on=['dates','loc'],right_on=['dates','loc'])
print(final_df)
Here is the output:
dates loc val_x val_y
0 2019-09-02 0 54.8 23.2
1 2019-09-03 0 10.0 13.2
For your expected output, and given the information provided, a left join would meet the requirement more easily. With this data:
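d_1 = {'index':['2019-09-01','2019-09-02'],'loc':[0,0],'val':[23.2,13.2]}
d_2 = {'index':['2019-09-02','2019-09-03'],'loc':[0,0],'val':[54.8,11.7]}
df_1 = pd.DataFrame(d_1).rename(columns={'index':'dates'}) #Rebuild the frames from the new data
df_2 = pd.DataFrame(d_2).rename(columns={'index':'dates'})
final_df = df_2.merge(df_1,how='left',left_on=['dates','loc'],right_on=['dates','loc'])
print(final_df)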
Output:
dates loc val_x val_y
0 2019-09-02 0 54.8 13.2
1 2019-09-03 0 11.7 NaN
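Note that the example above keeps the rows of df_2. To get the layout shown in the question (all rows of the first frame, NaN where the second frame has no match), the first frame can go on the left instead. A minimal sketch using the same sample values; the suffixes _1 and _2 and the names left, right and result are chosen here purely for illustration.

```python
import pandas as pd

# Same sample values as above, with the dates in the index as in the question
df_1 = pd.DataFrame({"loc": [0, 0], "val": [23.2, 13.2]},
                    index=["2019-09-01", "2019-09-02"])
df_2 = pd.DataFrame({"loc": [0, 0], "val": [54.8, 11.7]},
                    index=["2019-09-02", "2019-09-03"])

# Name the index first so reset_index() produces a 'dates' column directly
left = df_1.rename_axis("dates").reset_index()
right = df_2.rename_axis("dates").reset_index()

# Left join keeps every (dates, loc) pair from the first frame
result = left.merge(right, how="left", on=["dates", "loc"], suffixes=("_1", "_2"))
result = result.set_index("dates")  # put the dates back into the index
print(result)
```

Here val_1 holds the values from the first frame and val_2 those from the second, so the row for 2019-09-01 has NaN in val_2.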