Merging DataFrames with some matching columns results in duplicate columns

Time: 2020-02-19 22:00:07

Tags: python pandas join merge

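I have two DataFrames with overlapping columns, and I am trying to merge them on a given Symbol and Date. However, when I do this, the missing data is not filled in; instead, new columns with suffixes are added.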

df1

  Investor     Date   Name Symbol  Price  Amount  Income
0     Mike  2019 Q4  A Inc    AAA    NaN     100     NaN
1     Bill  2019 Q4  C Inc    CCC    NaN     200     NaN
2     John  2018 Q4  A Inc    AAA    NaN     200     NaN
3     Faye  2018 Q4  D Inc    DDD    NaN     300     NaN
4      Joe  2019 Q2  A Inc    AAA    NaN     300     NaN
5     Hank  2019 Q2  S Inc    SSS    NaN     100     NaN

df2

      Date   Name Symbol  Price  Income
0  2019 Q4  A Inc    AAA      5      10
1  2019 Q4  B Inc    BBB      3      20
2  2019 Q4  C Inc    CCC     33      30
3  2019 Q4  D Inc    DDD     30      40
4  2018 Q4  A Inc    AAA     23      20
5  2018 Q4  B Inc    BBB      4      30
6  2018 Q4  C Inc    CCC    136      40
7  2018 Q4  D Inc    DDD      6      50
8  2018 Q4  E Inc    EEE      1      90

I would like my output to look like this:

  Investor     Date   Name Symbol  Price  Amount  Income
0     Mike  2019 Q4  A Inc    AAA    5.0     100    10.0
1     Bill  2019 Q4  C Inc    CCC   33.0     200    30.0
2     John  2018 Q4  A Inc    AAA   23.0     200    20.0
3     Faye  2018 Q4  D Inc    DDD    6.0     300    50.0
4      Joe  2019 Q2  A Inc    AAA    NaN     300     NaN
5     Hank  2019 Q2  S Inc    SSS    NaN     100     NaN

But when I run df3 = pd.merge(df1, df2, on=['Date', 'Symbol'], how='left'), I get:

  Investor     Date Name_x Symbol  ...  Income_x  Name_y  Price_y Income_y
0     Mike  2019 Q4  A Inc    AAA  ...       NaN   A Inc      5.0     10.0
1     Bill  2019 Q4  C Inc    CCC  ...       NaN   C Inc     33.0     30.0
2     John  2018 Q4  A Inc    AAA  ...       NaN   A Inc     23.0     20.0
3     Faye  2018 Q4  D Inc    DDD  ...       NaN   D Inc      6.0     50.0
4      Joe  2019 Q2  A Inc    AAA  ...       NaN     NaN      NaN      NaN
5     Hank  2019 Q2  S Inc    SSS  ...       NaN     NaN      NaN      NaN

What am I doing wrong?

import pandas as pd
from numpy import nan

# Build both frames from the dict dumps so the example is reproducible.
df1 = pd.DataFrame({'Investor': {0: 'Mike', 1: 'Bill', 2: 'John', 3: 'Faye', 4: 'Joe', 5: 'Hank'}, 'Date': {0: '2019 Q4', 1: '2019 Q4', 2: '2018 Q4', 3: '2018 Q4', 4: '2019 Q2', 5: '2019 Q2'}, 'Name': {0: 'A Inc', 1: 'C Inc', 2: 'A Inc', 3: 'D Inc', 4: 'A Inc', 5: 'S Inc'}, 'Symbol': {0: 'AAA', 1: 'CCC', 2: 'AAA', 3: 'DDD', 4: 'AAA', 5: 'SSS'}, 'Price': {0: nan, 1: nan, 2: nan, 3: nan, 4: nan, 5: nan}, 'Amount': {0: 100, 1: 200, 2: 200, 3: 300, 4: 300, 5: 100}, 'Income': {0: nan, 1: nan, 2: nan, 3: nan, 4: nan, 5: nan}})
df2 = pd.DataFrame({'Date': {0: '2019 Q4', 1: '2019 Q4', 2: '2019 Q4', 3: '2019 Q4', 4: '2018 Q4', 5: '2018 Q4', 6: '2018 Q4', 7: '2018 Q4', 8: '2018 Q4'}, 'Name': {0: 'A Inc', 1: 'B Inc', 2: 'C Inc', 3: 'D Inc', 4: 'A Inc', 5: 'B Inc', 6: 'C Inc', 7: 'D Inc', 8: 'E Inc'}, 'Symbol': {0: 'AAA', 1: 'BBB', 2: 'CCC', 3: 'DDD', 4: 'AAA', 5: 'BBB', 6: 'CCC', 7: 'DDD', 8: 'EEE'}, 'Price': {0: 5, 1: 3, 2: 33, 3: 30, 4: 23, 5: 4, 6: 136, 7: 6, 8: 1}, 'Income': {0: 10, 1: 20, 2: 30, 3: 40, 4: 20, 5: 30, 6: 40, 7: 50, 8: 90}})
df3 = pd.merge(df1, df2, on=['Date', 'Symbol'], how='left')

1 Answer:

Answer 0 (score: 2)

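That is because Name, Income, and Price exist in both DataFrames but are not merge keys, so merge keeps both copies and appends the _x and _y suffixes to tell them apart. If you don't want duplicates, select only the columns you need from each side: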

(df1[['Investor', 'Name', 'Date','Symbol','Amount']]
   .merge(df2.drop('Name', axis=1),
          on=['Date','Symbol'],
          how='left')
)

Output:

  Investor   Name     Date Symbol  Amount  Price  Income
0     Mike  A Inc  2019 Q4    AAA     100    5.0    10.0
1     Bill  C Inc  2019 Q4    CCC     200   33.0    30.0
2     John  A Inc  2018 Q4    AAA     200   23.0    20.0
3     Faye  D Inc  2018 Q4    DDD     300    6.0    50.0
4      Joe  A Inc  2019 Q2    AAA     300    NaN     NaN
5     Hank  S Inc  2019 Q2    SSS     100    NaN     NaN
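
If the original column order of df1 should be preserved, or if df1 might already hold some Price or Income values worth keeping, one possible variant is to merge with an explicit suffix and then fill the gaps. The following is only a sketch under assumptions: the frames are small stand-ins using a subset of the question's rows, and the names keys, fill_cols, and the _new suffix are chosen here for illustration and do not come from the original post.

```python
import numpy as np
import pandas as pd

# Small stand-ins for df1 and df2 (a subset of the rows in the question).
df1 = pd.DataFrame({
    'Investor': ['Mike', 'Faye', 'Joe'],
    'Date': ['2019 Q4', '2018 Q4', '2019 Q2'],
    'Name': ['A Inc', 'D Inc', 'A Inc'],
    'Symbol': ['AAA', 'DDD', 'AAA'],
    'Price': [np.nan, np.nan, np.nan],
    'Amount': [100, 300, 300],
    'Income': [np.nan, np.nan, np.nan],
})
df2 = pd.DataFrame({
    'Date': ['2019 Q4', '2018 Q4'],
    'Name': ['A Inc', 'D Inc'],
    'Symbol': ['AAA', 'DDD'],
    'Price': [5, 6],
    'Income': [10, 50],
})

keys = ['Date', 'Symbol']        # merge keys (illustrative name)
fill_cols = ['Price', 'Income']  # columns to fill from df2 (illustrative name)

# Bring in only the keys and the fill columns from df2. df1's columns keep
# their names; df2's overlapping columns get the '_new' suffix.
merged = df1.merge(df2[keys + fill_cols], on=keys, how='left',
                   suffixes=('', '_new'))

# Fill missing values in df1's columns from df2's, then drop the helpers.
for col in fill_cols:
    merged[col] = merged[col].fillna(merged[col + '_new'])
result = merged.drop(columns=[col + '_new' for col in fill_cols])

print(result)
```

Unlike selecting columns up front, this keeps any non-missing values already present in df1 and only fills the NaN entries; rows with no match in df2 (2019 Q2 here) stay NaN.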