Question

I can't find a good way to merge these two datasets:

Say I have a first dataset containing city temperatures:

       2016 2017
cityA   23  27
cityB   24  28

And a second one with a lot more information, which looks like this:

    city    year    other
0   cityA   2016    aa
1   cityB   2017    bb
2   cityA   2016    cc
3   cityB   2017    dd

I would like the following result:

    city    year    other   temperatures
0   cityA   2016    aa      23
1   cityB   2017    bb      28
2   cityA   2016    cc      23
3   cityB   2017    dd      28

Thanks for your help!

Edit: the real, more complex dataframes:

Dataframe 1, with the temperatures:

Dataframe 2, with the other data:

Result of running the answer:

Answer 1

Reshape df1 with stack and reset_index, then merge it into df2; I think a left join is what you want:

df11 = df1.stack().reset_index()
df11.columns = ['city','year','temperatures']
# if the years are strings, convert them to integers
df11['year'] = df11['year'].astype(int)

df = df2.merge(df11, on=['city','year'], how='left')
print(df)
    city  year other  temperatures
0  cityA  2016    aa            23
1  cityB  2017    bb            28
2  cityA  2016    cc            23
3  cityB  2017    dd            28
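
For a self-contained run, the steps above can be sketched as follows. The sample frames are rebuilt from the question's data. Two things here are assumptions rather than part of the answer: that the column labels of df1 are strings (as they often are after reading a CSV), and the use of `indicator=True` to flag rows of df2 that found no temperature.

```python
import pandas as pd

# Sample data rebuilt from the question; string column labels are an assumption.
df1 = pd.DataFrame({"2016": [23, 24], "2017": [27, 28]}, index=["cityA", "cityB"])
df2 = pd.DataFrame({
    "city": ["cityA", "cityB", "cityA", "cityB"],
    "year": [2016, 2017, 2016, 2017],
    "other": ["aa", "bb", "cc", "dd"],
})

# Wide -> long: one row per (city, year) pair.
df11 = df1.stack().reset_index()
df11.columns = ["city", "year", "temperatures"]
df11["year"] = df11["year"].astype(int)  # match the integer dtype of df2["year"]

# Left join keeps every row of df2; _merge records whether a match was found.
df = df2.merge(df11, on=["city", "year"], how="left", indicator=True)
unmatched = df[df["_merge"] == "left_only"]  # rows of df2 with no temperature
df = df.drop(columns="_merge")
print(df)
```

If `unmatched` is not empty, the usual cause is a dtype or spelling mismatch between the key columns of the two frames.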

Answer 2

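melt + merge

You can melt your "pivoted" dataframe and then merge it with your main dataframe. This assumes the year columns in the first dataframe are integers.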

melted = pd.melt(df1.reset_index(), id_vars='index')

res = df2.merge(melted, left_on=['city', 'year'],
                right_on=['index', 'variable'], how='left')

print(res[['city', 'year', 'other', 'value']])

    city  year other  value
0  cityA  2016    aa     23
1  cityB  2017    bb     28
2  cityA  2016    cc     23
3  cityB  2017    dd     28
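
A variant of the same idea, as a minimal sketch: naming the melted columns up front with `var_name` and `value_name` lets the merge use a plain `on=` and leaves no extra `index`/`variable` columns behind. The sample frames are rebuilt from the question; the `rename_axis("city")` step, the `astype(int)` cast, and the `validate="many_to_one"` check are my additions, not part of the answer.

```python
import pandas as pd

# Sample data rebuilt from the question, with integer year columns.
df1 = pd.DataFrame({2016: [23, 24], 2017: [27, 28]}, index=["cityA", "cityB"])
df2 = pd.DataFrame({
    "city": ["cityA", "cityB", "cityA", "cityB"],
    "year": [2016, 2017, 2016, 2017],
    "other": ["aa", "bb", "cc", "dd"],
})

# Name the index so reset_index yields a "city" column, then melt to long form.
melted = (
    df1.rename_axis("city")
       .reset_index()
       .melt(id_vars="city", var_name="year", value_name="temperatures")
)
melted["year"] = melted["year"].astype(int)  # also covers string year labels

# validate raises MergeError if a (city, year) pair is duplicated in melted.
res = df2.merge(melted, on=["city", "year"], how="left", validate="many_to_one")
print(res)
```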

Merge 2 pandas dataframes with similar data
