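How to merge 2 tables in pandas by index and multiple columns

Time: 2019-06-11 08:02:40

Tags: python pandas

I can't figure out how to merge on the index and multiple column names at the same time.

I have dates in the index and 3 columns that should serve as merge keys.

The expected result should be: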

                          A  B  C    x    y
timestamp                                  
2019-06-10T20:00:00.000Z  a  b  c  1.0  1.0
2019-06-10T21:00:00.000Z  a  b  c  1.0  NaN

This is what I get:

                          A  B  C    x    y
timestamp                                  
2019-06-10T20:00:00.000Z  a  b  c  NaN  1.0
2019-06-10T21:00:00.000Z  a  b  c  1.0  NaN
2019-06-10T21:00:00.000Z  a  b  c  1.0  NaN

This is my code:

import pandas as pd  
data_list = []
left = {}

left['timestamp'] = '2019-06-10T20:00:00.000Z'
left['A'] = 'a'
left['B'] = 'b'
left['C'] = 'c'
left['x'] = 1
data_list.append(left)

left['timestamp'] = '2019-06-10T21:00:00.000Z'
left['A'] = 'a'
left['B'] = 'b'
left['C'] = 'c'
left['x'] = 1
data_list.append(left)


df_left = pd.DataFrame(data_list)
df_left = df_left.set_index('timestamp')

print(df_left.to_string())
print()

data_list = []
right = {}
right['timestamp'] = '2019-06-10T20:00:00.000Z'
right['A'] = 'a'
right['B'] = 'b'
right['C'] = 'c'
right['y'] = 1
data_list.append(right)

df_right = pd.DataFrame(data_list)
df_right = df_right.set_index('timestamp')


merged_df = pd.merge(df_left, df_right, left_index=True, right_index=True, on=['A','B','C'], how="outer")


print(merged_df.to_string())

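1 answer:

Answer 0 (score: -1)

Your df_left DataFrame is wrong.

It contains two identical timestamps (2019-06-10T21:00:00.000Z): the same left dict is appended to data_list twice, so overwriting its values before the second append also changes the first row. Consider creating the DataFrames as in the code below instead. Hope this helps!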

import pandas as pd  
left = {
    'timestamp': ['2019-06-10T20:00:00.000Z', '2019-06-10T21:00:00.000Z'],
    'A' : ['a', 'a'],
    'B' : ['b', 'b'],
    'C' : ['c', 'c'],
    'x' : [1.0, 1.0]  # numeric values rather than strings
}

df_left = pd.DataFrame(left)
df_left

right = {
    'timestamp': ['2019-06-10T20:00:00.000Z'],  # 20:00, as in the question's right-hand frame
    'A' : ['a'],
    'B' : ['b'],
    'C' : ['c'],
    'y' : [1.0]  # numeric value rather than a string
}

df_right = pd.DataFrame(right)
df_right

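# With no `on` argument, merge joins on every column the two frames share
# (timestamp, A, B, C); set_index then restores timestamp as the index.
merged_df = df_left.merge(df_right, how='outer').set_index('timestamp')
merged_df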
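If the frames already carry timestamp as their index, as in the question, one possible approach is to move the index into a regular column, merge on all four keys, and set the index again afterwards. The following is only a sketch under that assumption; the names left_rows and right_rows are made up for illustration, and each row is written as its own dict so that no two rows share the same object.

```python
import pandas as pd

# Each row is a separate dict; appending one dict twice would store
# two references to the same object and produce duplicate rows.
left_rows = [
    {'timestamp': '2019-06-10T20:00:00.000Z', 'A': 'a', 'B': 'b', 'C': 'c', 'x': 1},
    {'timestamp': '2019-06-10T21:00:00.000Z', 'A': 'a', 'B': 'b', 'C': 'c', 'x': 1},
]
right_rows = [
    {'timestamp': '2019-06-10T20:00:00.000Z', 'A': 'a', 'B': 'b', 'C': 'c', 'y': 1},
]

df_left = pd.DataFrame(left_rows).set_index('timestamp')
df_right = pd.DataFrame(right_rows).set_index('timestamp')

# Turn the index into a regular column so it can be listed in `on`
# next to A, B and C, then restore it as the index after the merge.
merged_df = (
    df_left.reset_index()
    .merge(df_right.reset_index(), on=['timestamp', 'A', 'B', 'C'], how='outer')
    .set_index('timestamp')
)

print(merged_df.to_string())
```

If you would rather keep the append loop from the question, appending a copy of the dict each time (data_list.append(left.copy())) should avoid the duplicated row as well.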
