How can I join these two DataFrames on a shared value using pandas?

Time: 2018-07-11 13:46:11

Tags: python python-2.7 pandas

I have a DataFrame like the following (df1):

   read_year  read_month      load  trading_block
0       2017           3  0.019582              0
1       2017           3  0.019460              0
2       2017           3  0.018888              0
3       2017           3  0.018940              0
4       2017           3  0.019114              0

And another one like the following (df2):

   read_year  read_month      lmp  trading_block
0       2009           1  37.5694              0
1       2009           1  34.5777              0
2       2009           1  33.7039              0
3       2009           1  33.1503              0
4       2009           1  33.8935              0

What I want is to merge/join/concat (whichever works) df2 into df1 only for the rows where read_year matches.

The expected output should look like this:

   read_year  read_month      load  trading_block       lmp
0       2017           3  0.019582              0   32.1201
1       2017           3  0.019460              0   12.1230
2       2017           3  0.018888              0   40.2941
3       2017           3  0.018940              0   20.3918
4       2017           3  0.019114              0   50.9371

How can I do this easily?

1 answer:

Answer 0: (score: 1)

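I think you need merge, but with a helper column that numbers the duplicate read_year rows via GroupBy.cumcount, so that the n-th row of a year in df1 is paired with the n-th row of the same year in df2. You also need to select a subset of df2's columns (the join keys plus the columns to add):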

# years in df2 changed here so that some rows match df1
print (df2)
   read_year  read_month      lmp  trading_block
0       2009           1  37.5694              0
1       2009           1  34.5777              0
2       2017           1  33.7039              0
3       2017           1  33.1503              0
4       2017           1  33.8935              0

df1['g'] = df1.groupby('read_year').cumcount()
df2['g'] = df2.groupby('read_year').cumcount()

# select the join keys plus the columns to add from df2 (here only lmp)
df = df1.merge(df2[['read_year','g','lmp']],on=['read_year', 'g']).drop('g', axis=1)
print (df)
   read_year  read_month      load  trading_block      lmp
0       2017           3  0.019582              0  33.7039
1       2017           3  0.019460              0  33.1503
2       2017           3  0.018888              0  33.8935
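
The snippet above can be reproduced end to end with the sketch below. It rebuilds both frames from the values shown in this post (including the modified years in df2), so it does not depend on any outside data. Two things in it are my own choices and not part of the original answer: the helper column is added with `assign`, which leaves df1 and df2 unmodified, and a `how='left'` variant is shown in case the df1 rows without a partner in df2 should be kept with NaN in lmp. The names `left`, `right`, `inner` and `left_join` are illustrative only.

```python
import pandas as pd

# Sample data copied from the post (df2 uses the modified years).
df1 = pd.DataFrame({
    "read_year": [2017] * 5,
    "read_month": [3] * 5,
    "load": [0.019582, 0.019460, 0.018888, 0.018940, 0.019114],
    "trading_block": [0] * 5,
})
df2 = pd.DataFrame({
    "read_year": [2009, 2009, 2017, 2017, 2017],
    "read_month": [1] * 5,
    "lmp": [37.5694, 34.5777, 33.7039, 33.1503, 33.8935],
    "trading_block": [0] * 5,
})

# Number the rows within each read_year (0, 1, 2, ...).
# assign() returns a new frame, so df1 and df2 keep their original columns.
left = df1.assign(g=df1.groupby("read_year").cumcount())
right = df2.assign(g=df2.groupby("read_year").cumcount())

# Take only the join keys and the column to add from the right-hand frame.
right_subset = right[["read_year", "g", "lmp"]]

# Inner join (the default): only rows with a partner in both frames survive.
inner = left.merge(right_subset, on=["read_year", "g"]).drop(columns="g")

# Left join: every df1 row is kept, lmp is NaN where df2 has no partner row.
left_join = left.merge(right_subset, on=["read_year", "g"], how="left").drop(columns="g")

print(inner)
print(left_join)
```

With this data the inner join returns three rows, because df2 contains only three rows for 2017, while the left join returns all five rows of df1 with lmp missing in the last two.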