Duplicate columns when merging pandas DataFrames

Time: 2020-09-10 02:32:46

Tags: python pandas dataframe

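I want to merge df1 and df2. The problem I currently run into is that the merge produces duplicate "Fluc" columns. The DataFrames must be merged with on='Horse'.

DataFrame code: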

import pandas as pd

# `data` is defined elsewhere (not shown in the question)
cols1 = ['Race', 'Horse', 'Fluc 1', 'Fluc 2', 'Bookmaker', 'Odds']
df1 = pd.DataFrame(data=data, columns=cols1)
cols2 = ['Race', 'Horse', 'Fluc 1', 'Fluc 2', 'Bookmaker', 'AvgOdds']
df2 = pd.DataFrame(data=data, columns=cols2)
# Average per horse, then round to 2 decimal places
df3 = df2.groupby(by='Horse', sort=False).mean()
df3 = df3.reset_index()
df4 = round(df3, 2)
# 'Horse' is the only merge key here
dfmerge = pd.merge(df1, df4, on='Horse', how='inner')

Output of df1:

              Race           Horse  Fluc 1  Fluc 2      Bookmaker   Odds
0       Ipswich R1  Battle Through     4.2    4.22        BetEasy   4.20
1       Ipswich R1  Battle Through     4.2    4.22           Neds   4.20
2       Ipswich R1  Battle Through     4.2    4.22      Sportsbet   4.20
3       Ipswich R1  Battle Through     4.2    4.22  SportsBetting   4.45
4       Ipswich R1  Battle Through     4.2    4.22         Bet365   4.20

Output of df4:

              Race           Horse  Fluc 1  Fluc 2      Bookmaker  AvgOdds
0       Ipswich R1  Battle Through     4.2    4.22        BetEasy     4.20
1       Ipswich R1  Battle Through     4.2    4.22           Neds     4.20
2       Ipswich R1  Battle Through     4.2    4.22      Sportsbet     4.20
3       Ipswich R1  Battle Through     4.2    4.22  SportsBetting     4.45
4       Ipswich R1  Battle Through     4.2    4.22         Bet365     4.20

Output of dfmerge:

              Race           Horse  Fluc 1_x  Fluc 2_x      Bookmaker  Odds  Fluc 1_y  Fluc 2_y  AvgOdds
0       Ipswich R1  Battle Through      8.34      8.38           Neds   8.5      8.34      8.38     8.65
1       Ipswich R1  Battle Through      8.34      8.38      Sportsbet   8.0      8.34      8.38     8.65
2       Ipswich R1  Battle Through      8.34      8.38  SportsBetting   9.1      8.34      8.38     8.65
3       Ipswich R1  Battle Through      8.34      8.38         Bet365   9.0      8.34      8.38     8.65
4       Ipswich R1      Simply Fly      1.89      1.87           Neds   1.8      1.89      1.87     1.84
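
The `_x` and `_y` suffixes appear because `pd.merge` keeps every non-key column from both sides and renames the ones whose names collide. A minimal sketch of that behaviour follows; the two small frames are made-up illustration data, not the data from the question:

```python
import pandas as pd

# Hypothetical miniature versions of the two frames (illustration only)
left = pd.DataFrame({
    "Horse": ["Battle Through", "Simply Fly"],
    "Fluc 1": [8.34, 1.89],
    "Odds": [8.5, 1.8],
})
right = pd.DataFrame({
    "Horse": ["Battle Through", "Simply Fly"],
    "Fluc 1": [8.34, 1.89],
    "AvgOdds": [8.65, 1.84],
})

# 'Fluc 1' exists on both sides but is not a merge key,
# so pandas keeps both copies and appends the default suffixes.
merged = pd.merge(left, right, on="Horse", how="inner")
print(list(merged.columns))
# ['Horse', 'Fluc 1_x', 'Odds', 'Fluc 1_y', 'AvgOdds']
```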

Desired dfmerge output:

              Race           Horse  Fluc 1  Fluc 2      Bookmaker   Odds    AvgOdds
0       Ipswich R1  Battle Through     4.2    4.22        BetEasy   4.20    4.2
1       Ipswich R1  Battle Through     4.2    4.22           Neds   4.20    4.2
2       Ipswich R1  Battle Through     4.2    4.22      Sportsbet   4.20    4.2
3       Ipswich R1  Battle Through     4.2    4.22  SportsBetting   4.45    4.2
4       Ipswich R1  Battle Through     4.2    4.22         Bet365   4.20    4.2

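1 answer:

Answer 0 (score: 0)

Try merging on every column the two DataFrames share, so that none of them is duplicated with a suffix: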

dfmerge = pd.merge(df1, df4, on=['Race', 'Horse', 'Fluc 1', 'Fluc 2', 'Bookmaker'], how='inner')
print(dfmerge)

Output:

         Race           Horse  Fluc 1  Fluc 2      Bookmaker  Odds  AvgOdds
0  Ipswich R1  Battle Through     4.2    4.22        BetEasy  4.20     4.20
1  Ipswich R1  Battle Through     4.2    4.22           Neds  4.20     4.20
2  Ipswich R1  Battle Through     4.2    4.22      Sportsbet  4.20     4.20
3  Ipswich R1  Battle Through     4.2    4.22  SportsBetting  4.45     4.45
4  Ipswich R1  Battle Through     4.2    4.22         Bet365  4.20     4.20
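
If the merge really has to use on='Horse' only, as the question states, another option might be to pass just the key and the new column from the right-hand frame, so there is nothing left to collide. This is a sketch under assumptions: the sample rows are invented, and AvgOdds is computed here as the per-horse mean of Odds, which may not match how the original data was built.

```python
import pandas as pd

# Invented sample rows shaped like df1 in the question
df1 = pd.DataFrame({
    "Race": ["Ipswich R1"] * 3,
    "Horse": ["Battle Through", "Battle Through", "Simply Fly"],
    "Fluc 1": [4.2, 4.2, 1.89],
    "Fluc 2": [4.22, 4.22, 1.87],
    "Bookmaker": ["BetEasy", "Neds", "Neds"],
    "Odds": [4.2, 4.4, 1.8],
})

# Assumption: AvgOdds is the mean of Odds per horse, rounded to 2 decimals
df4 = (
    df1.groupby("Horse", sort=False)[["Fluc 1", "Fluc 2", "Odds"]]
    .mean()
    .round(2)
    .rename(columns={"Odds": "AvgOdds"})
    .reset_index()
)

# Select only the key and the new column before merging,
# so 'Fluc 1' and 'Fluc 2' come from df1 alone.
dfmerge = pd.merge(df1, df4[["Horse", "AvgOdds"]], on="Horse", how="inner")
print(dfmerge)
```

Unlike merging on all shared columns, this keeps one AvgOdds value per horse on every bookmaker row, which is closer to the desired output shown in the question.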