I have two pandas DataFrames. DF1:
LT route_1 c2
PM/2 120 44
PM/52 110 49
PM/522 103 51
PM/522 103 51
PM/24 105 48
PM/536 109 67
PM/536 109 67
PM/5356 112 144
DF2:
LT W_ID
PM/2 120.0
PM/52 110.0
PM/522 103.0
PM/522 103.0
PM/24 105.0
PM/536 109.0
PM/536 109.0
PM/5356 112.0
I need to map W_ID from df2 onto route_1 in df1, replacing the existing values, but the LT in one table has to match the LT in the other table. Desired output:
LT route_1 c2
PM/2 120.0 44
PM/52 110.0 49
PM/522 103.0 51
PM/522 103.0 51
PM/24 105.0 48
PM/536 109.0 67
PM/536 109.0 67
PM/5356 112.0 144
Answer 0 (score: 1)
I thought map should work:
df1['route_1'] = df1['LT'].map(df2.set_index('LT')['W_ID'])
Unfortunately it does not:
InvalidIndexError: Reindexing only valid with uniquely valued Index objects
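The error comes from the lookup Series: df2.set_index('LT') has repeated index labels, so map cannot resolve a label unambiguously. As a minimal sketch, assuming every repeated LT carries the same W_ID (true for the sample data, but an assumption in general), the duplicates can be dropped before mapping. The frames below are re-created from the question; the names lt, lookup and unique_lookup are illustrative only.

```python
import pandas as pd

lt = ['PM/2', 'PM/52', 'PM/522', 'PM/522', 'PM/24', 'PM/536', 'PM/536', 'PM/5356']
df1 = pd.DataFrame({'LT': lt,
                    'route_1': [120, 110, 103, 103, 105, 109, 109, 112],
                    'c2': [44, 49, 51, 51, 48, 67, 67, 144]})
df2 = pd.DataFrame({'LT': lt,
                    'W_ID': [120.0, 110.0, 103.0, 103.0, 105.0, 109.0, 109.0, 112.0]})

# The lookup Series has repeated index labels, which is what breaks map
lookup = df2.set_index('LT')['W_ID']
print(lookup.index.is_unique)  # False

# Assumption: duplicated LT rows agree on W_ID, so keeping the first loses nothing
unique_lookup = df2.drop_duplicates('LT').set_index('LT')['W_ID']
df1['route_1'] = df1['LT'].map(unique_lookup)
print(df1)
```

If the same LT can carry different W_ID values, this shortcut silently keeps only the first one, so pairing rows by their position within each LT group is the safer choice.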
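Edit:
The problem is the duplicates in the LT column. The solution is to use cumcount to add a helper column that makes the keys unique, then merge on both columns with a left join: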
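import pandas as pd

# Number repeated LT values 0, 1, 2, ... so that the pair (LT, g) is unique
df1['g'] = df1.groupby('LT').cumcount()
df2['g'] = df2.groupby('LT').cumcount()
# Left join on the key plus the occurrence counter
df = pd.merge(df1, df2, on=['LT','g'], how='left')
print(df)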
        LT  route_1   c2  g   W_ID
0     PM/2      120   44  0  120.0
1    PM/52      110   49  0  110.0
2   PM/522      103   51  0  103.0
3   PM/522      103   51  1  103.0
4    PM/24      105   48  0  105.0
5   PM/536      109   67  0  109.0
6   PM/536      109   67  1  109.0
7  PM/5356      112  144  0  112.0
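# Assignment aligns on the index; this works because df1 has the default
# 0..n-1 index, which matches the index of the merge result
df1['route_1'] = df['W_ID']
df1.drop('g', axis=1, inplace=True)
print(df1)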
        LT  route_1   c2
0     PM/2    120.0   44
1    PM/52    110.0   49
2   PM/522    103.0   51
3   PM/522    103.0   51
4    PM/24    105.0   48
5   PM/536    109.0   67
6   PM/536    109.0   67
7  PM/5356    112.0  144
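To see what the helper column contains, here is a small sketch of groupby(...).cumcount() in isolation. The frame s and its values are made up for illustration and are not the question's data.

```python
import pandas as pd

s = pd.DataFrame({'LT': ['PM/2', 'PM/522', 'PM/522', 'PM/536', 'PM/536', 'PM/536']})

# cumcount numbers the rows of each group 0, 1, 2, ... in order of appearance
s['g'] = s.groupby('LT').cumcount()
print(s['g'].tolist())  # [0, 0, 1, 0, 1, 2]

# LT alone has repeats, but the pair (LT, g) does not
print(s.duplicated(['LT', 'g']).any())  # False
```

Because the counter restarts for every LT value, the n-th occurrence of a key in one frame is paired with the n-th occurrence of the same key in the other frame.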
A similar solution:
df1['g'] = df1.groupby('LT').cumcount()
df2['g'] = df2.groupby('LT').cumcount()
# Parentheses let the method chain span several lines
df = (pd.merge(df1, df2, on=['LT','g'], how='left')
        .drop(['g', 'route_1'], axis=1)
        .rename(columns={'W_ID':'route_1'})
        # reindex_axis was removed in pandas 1.0; reindex(columns=...) replaces it
        .reindex(columns=['LT', 'route_1', 'c2']))
print(df)
        LT  route_1   c2
0     PM/2    120.0   44
1    PM/52    110.0   49
2   PM/522    103.0   51
3   PM/522    103.0   51
4    PM/24    105.0   48
5   PM/536    109.0   67
6   PM/536    109.0   67
7  PM/5356    112.0  144
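For reuse, the same idea could be wrapped in a function that leaves both inputs untouched. This is only a sketch: replace_by_occurrence and the temporary column name _g are hypothetical, and it assumes that source is not already a column of left (otherwise merge would add suffixes). The validate='one_to_one' argument of merge makes pandas raise if the (key, _g) pairs turn out not to be unique.

```python
import pandas as pd

def replace_by_occurrence(left, right, key, target, source):
    """Hypothetical helper: overwrite left[target] with right[source],
    pairing the n-th occurrence of each key in left with the n-th in right."""
    l = left.assign(_g=left.groupby(key).cumcount())
    r = right.assign(_g=right.groupby(key).cumcount())[[key, '_g', source]]
    merged = l.merge(r, on=[key, '_g'], how='left', validate='one_to_one')
    out = left.copy()
    # A one-to-one left merge keeps the row order of the left frame,
    # so the values can be assigned by position
    out[target] = merged[source].to_numpy()
    return out

lt = ['PM/2', 'PM/52', 'PM/522', 'PM/522', 'PM/24', 'PM/536', 'PM/536', 'PM/5356']
df1 = pd.DataFrame({'LT': lt,
                    'route_1': [120, 110, 103, 103, 105, 109, 109, 112],
                    'c2': [44, 49, 51, 51, 48, 67, 67, 144]})
df2 = pd.DataFrame({'LT': lt,
                    'W_ID': [120.0, 110.0, 103.0, 103.0, 105.0, 109.0, 109.0, 112.0]})

result = replace_by_occurrence(df1, df2, key='LT', target='route_1', source='W_ID')
print(result)
```

Rows of left that have no partner in right end up with NaN in the target column, which is the usual behaviour of a left join.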