I have a dataframe (df1) like the following:
read_year read_month load trading_block
0 2017 3 0.019582 0
1 2017 3 0.019460 0
2 2017 3 0.018888 0
3 2017 3 0.018940 0
4 2017 3 0.019114 0
And another dataframe (df2) like the following:
read_year read_month lmp trading_block
0 2009 1 37.5694 0
1 2009 1 34.5777 0
2 2009 1 33.7039 0
3 2009 1 33.1503 0
4 2009 1 33.8935 0
What I want is to merge/join/concatenate (whichever works) df2 into df1 only for the rows where read_year matches.
The expected output should look like this:
read_year read_month load trading_block lmp
0 2017 3 0.019582 0 32.1201
1 2017 3 0.019460 0 12.1230
2 2017 3 0.018888 0 40.2941
3 2017 3 0.018940 0 20.3918
4 2017 3 0.019114 0 50.9371
How can I do this easily?
Answer 0 (score: 1)
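I think you need merge, but with a helper column that numbers the duplicates within each read_year using GroupBy.cumcount, so that the n-th row of a year in df1 is paired with the n-th row of the same year in df2. You also need to restrict df2 to a subset of columns, the join keys plus the column to add: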
# years in df2 changed so that some rows match df1
print (df2)
read_year read_month lmp trading_block
0 2009 1 37.5694 0
1 2009 1 34.5777 0
2 2017 1 33.7039 0
3 2017 1 33.1503 0
4 2017 1 33.8935 0
df1['g'] = df1.groupby('read_year').cumcount()
df2['g'] = df2.groupby('read_year').cumcount()
# select only the join keys plus the column to add (here lmp)
df = df1.merge(df2[['read_year','g','lmp']],on=['read_year', 'g']).drop('g', axis=1)
print (df)
read_year read_month load trading_block lmp
0 2017 3 0.019582 0 33.7039
1 2017 3 0.019460 0 33.1503
2 2017 3 0.018888 0 33.8935
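
For reference, here is a self-contained sketch of the same approach that can be run as-is. The frames are rebuilt by hand from the sample data shown above (with the modified years in df2), and the names left, right and result are only illustrative. It uses assign so that df1 and df2 are not modified in place, which is a small deviation from the snippet above.

```python
import pandas as pd

# Sample data copied from the question; df2 uses the modified years from the answer.
df1 = pd.DataFrame({
    "read_year": [2017] * 5,
    "read_month": [3] * 5,
    "load": [0.019582, 0.019460, 0.018888, 0.018940, 0.019114],
    "trading_block": [0] * 5,
})
df2 = pd.DataFrame({
    "read_year": [2009, 2009, 2017, 2017, 2017],
    "read_month": [1] * 5,
    "lmp": [37.5694, 34.5777, 33.7039, 33.1503, 33.8935],
    "trading_block": [0] * 5,
})

# Helper column g: position of each row within its read_year group (0, 1, 2, ...).
left = df1.assign(g=df1.groupby("read_year").cumcount())
right = df2.assign(g=df2.groupby("read_year").cumcount())

# Inner merge on year and position, taking only lmp from df2, then drop the helper.
result = (
    left.merge(right[["read_year", "g", "lmp"]], on=["read_year", "g"])
        .drop(columns="g")
)
print(result)
```

Because merge defaults to an inner join, the fourth and fifth 2017 rows of df1 have no partner in df2 and are dropped. Passing how='left' to merge would keep them, with NaN in lmp.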