I am trying to merge two dataframes that hold the same information but break it down differently.
df1:  # net totals at the team level
Team    Current Sales  Previous Sales  Team Total Diff
Blue    10             5               5
Orange  20             8               12
Yellow  40             11              29
df2:  # net totals broken down by region
Team    Region  Curr Sales  Prev Sales  Net Diff
Blue    East    4           4           0
Blue    West    6           1           5
Orange  East    6           3           3
Orange  West    14          5           9
Yellow  East    15          3           12
Yellow  West    25          8           17
Desired merged dataframe:
Team    Region  Curr Sales  Prev Sales  Net Diff  Team Total Diff
Blue    East    4           4           0         5
Blue    West    6           1           5         5
Orange  East    6           3           3         12
Orange  West    14          5           9         12
Yellow  East    15          3           12        29
Yellow  West    25          8           17        29
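I am doing this so that I can run additional statistical functions on the new column, but I am not sure how to merge the two. If I add df1['Team Total Diff'] to df2, it fills in only the first 3 records and does not fill in the value for every row of each team.
If I use the merge call below, I see no change: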
df2.merge(df1[['team_sort', 'Team']], how='inner', on='Team')
'team_sort' is used as the index, to keep the teams sorted in ascending order by Net Team Diff.
Any help would be greatly appreciated.
Answer 0 (score: 2)
You can use map in this scenario:
df2['Team Total Diff'] = df2['Team'].map(df1.set_index('Team')['Team Total Diff'])
df2
Output:
     Team Region  Curr Sales  Prev Sales  Net Diff  Team Total Diff
0    Blue   East           4           4         0                5
1    Blue   West           6           1         5                5
2  Orange   East           6           3         3               12
3  Orange   West          14           5         9               12
4  Yellow   East          15           3        12               29
5  Yellow   West          25           8        17               29
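For anyone who wants to run this end to end, here is a minimal self-contained sketch. It rebuilds df1 and df2 from the tables in the question; the intermediate name team_totals is only illustrative and not part of the original answer.

```python
import pandas as pd

# Frames rebuilt from the tables in the question.
df1 = pd.DataFrame({
    "Team": ["Blue", "Orange", "Yellow"],
    "Current Sales": [10, 20, 40],
    "Previous Sales": [5, 8, 11],
    "Team Total Diff": [5, 12, 29],
})
df2 = pd.DataFrame({
    "Team": ["Blue", "Blue", "Orange", "Orange", "Yellow", "Yellow"],
    "Region": ["East", "West"] * 3,
    "Curr Sales": [4, 6, 6, 14, 15, 25],
    "Prev Sales": [4, 1, 3, 5, 3, 8],
    "Net Diff": [0, 5, 3, 9, 12, 17],
})

# A Series indexed by team name works as a lookup table for map().
team_totals = df1.set_index("Team")["Team Total Diff"]

# Each row of df2 looks up its own team's total, so the value repeats per region.
df2["Team Total Diff"] = df2["Team"].map(team_totals)
print(df2)
```

The lookup goes by team name rather than by row position, which is why every region row gets a value instead of only the first 3 rows.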
Answer 1 (score: 1)
merge is the right approach, but you are using it incorrectly. Try this:
merged_df = df2.merge(df1[['Team', 'Team Total Diff']], on=['Team'])
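This is because merge, like most DataFrame methods, actually returns a new DataFrame object instead of modifying self, so the result has to be assigned to a variable.
Index handling can be a bit tricky, so I usually just reset the index before merging dataframes.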
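As a rough sketch of both points, assuming df1 carries 'team_sort' as a named index the way the question describes (the exact index values are my guess, and the names left and right are only illustrative):

```python
import pandas as pd

# Assumption: df1 is indexed by 'team_sort'; the values 0, 1, 2 are made up.
df1 = pd.DataFrame(
    {
        "Team": ["Blue", "Orange", "Yellow"],
        "Team Total Diff": [5, 12, 29],
    },
    index=pd.Index([0, 1, 2], name="team_sort"),
)
df2 = pd.DataFrame({
    "Team": ["Blue", "Blue", "Orange", "Orange", "Yellow", "Yellow"],
    "Region": ["East", "West"] * 3,
    "Net Diff": [0, 5, 3, 9, 12, 17],
})

# Reset the indexes first so the merge only has plain columns to deal with.
left = df2.reset_index(drop=True)
right = df1.reset_index()[["Team", "Team Total Diff"]]

# merge returns a new DataFrame, so keep the result.
merged_df = left.merge(right, on="Team", how="left")

print(merged_df)
print("Team Total Diff" in df2.columns)  # df2 itself was not modified
```

I used how="left" here so that every row of df2 is kept in its original order; with the data in the question, the default inner join gives the same rows.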
Answer 2 (score: -1)
I think this should do it:
merged_df = pd.merge(df1, df2, how="right", left_on="Team", right_on="Team")
Answer 3 (score: -1)
merged_df = pd.concat([df1,df2], join='inner')
The default value of join is 'outer', so try 'inner'. If that does not work, try 'outer':
merged_df = pd.concat([df1,df2], join='outer')
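For comparison, here is a small sketch of what pd.concat returns for frames shaped like the ones in the question. The variable names inner_stacked and outer_stacked are only illustrative. With the default axis=0, concat stacks rows, and join only controls which columns are kept, so it is worth checking the shape before relying on it for this task.

```python
import pandas as pd

# Frames rebuilt from the tables in the question.
df1 = pd.DataFrame({
    "Team": ["Blue", "Orange", "Yellow"],
    "Current Sales": [10, 20, 40],
    "Previous Sales": [5, 8, 11],
    "Team Total Diff": [5, 12, 29],
})
df2 = pd.DataFrame({
    "Team": ["Blue", "Blue", "Orange", "Orange", "Yellow", "Yellow"],
    "Region": ["East", "West"] * 3,
    "Curr Sales": [4, 6, 6, 14, 15, 25],
    "Prev Sales": [4, 1, 3, 5, 3, 8],
    "Net Diff": [0, 5, 3, 9, 12, 17],
})

# join='inner' keeps only the columns both frames share (here just 'Team').
inner_stacked = pd.concat([df1, df2], join="inner")

# join='outer' keeps every column and fills the gaps with NaN.
outer_stacked = pd.concat([df1, df2], join="outer")

print(inner_stacked.shape)
print(outer_stacked.shape)
```

In both cases the 3 rows of df1 end up above the 6 rows of df2 rather than being matched to them by team.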