df1
ITEM CATEGORY COLOR
48684 CAR RED
54519 BIKE BLACK
14582 CAR BLACK
45685 JEEP WHITE
23661 BIKE BLUE
23226 BIKE BLUE
54252 BIKE BLACK
df2
USERID WEBBROWSE ITEM PURCHASE
1 1541 CHROME 54252 YES
2 3351 EXPLORER 54519 YES
3 2639 MOBILE APP 23661 YES
df2 has many other columns as well.
The output I need is:
USERID WEBBROWSE ITEM PURCHASE
1 1541 CHROME 54519 YES
2 3351 EXPLORER 54519 YES
3 2639 MOBILE APP 23661 YES
From df1 it is clear that ITEM 54252 and ITEM 54519 are the same item (both are BIKE / BLACK). So, based on df1, we need to replace the values in df2.
Answer 0 (score: 1)
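I modified the previous solution by adding a new column orig that remembers the original values of ITEM. A Series is then built with DataFrame.set_index (original ITEM as the index, replacement ITEM as the values) and passed to Series.replace on the other DataFrame: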
df = df1.assign(orig=df1['ITEM'])
m = df.duplicated(['CATEGORY', 'COLOR'], keep=False)
df.loc[m, 'ITEM'] = df[m].groupby(['CATEGORY', 'COLOR'])['ITEM'].transform('first')
s = df[m].set_index('orig')['ITEM']
print (s)
orig
54519 54519
23661 23661
23226 23661
54252 54519
Name: ITEM, dtype: int64
df2['ITEM'] = df2['ITEM'].replace(s)
print (df2)
USERID WEBBROWSE ITEM PURCHASE
1 1541 CHROME 54519 YES
2 3351 EXPLORER 54519 YES
3 2639 MOBILE APP 23661 YES
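For anyone who wants to run this end to end, here is a minimal self-contained sketch of the approach above. The two DataFrames are rebuilt by hand from the sample in the question (an assumption, since the real df2 has many more columns), and the index 1..3 on df2 is only there to match the printed output.

```python
import pandas as pd

# Sample data reconstructed from the question (only the columns shown there)
df1 = pd.DataFrame({
    'ITEM': [48684, 54519, 14582, 45685, 23661, 23226, 54252],
    'CATEGORY': ['CAR', 'BIKE', 'CAR', 'JEEP', 'BIKE', 'BIKE', 'BIKE'],
    'COLOR': ['RED', 'BLACK', 'BLACK', 'WHITE', 'BLUE', 'BLUE', 'BLACK'],
})
df2 = pd.DataFrame({
    'USERID': [1541, 3351, 2639],
    'WEBBROWSE': ['CHROME', 'EXPLORER', 'MOBILE APP'],
    'ITEM': [54252, 54519, 23661],
    'PURCHASE': ['YES', 'YES', 'YES'],
}, index=[1, 2, 3])

# Work on a copy of df1 that also remembers the original ITEM values
df = df1.assign(orig=df1['ITEM'])

# Rows whose (CATEGORY, COLOR) pair occurs more than once
m = df.duplicated(['CATEGORY', 'COLOR'], keep=False)

# Within each duplicated group, overwrite ITEM with the first ITEM of the group
df.loc[m, 'ITEM'] = df[m].groupby(['CATEGORY', 'COLOR'])['ITEM'].transform('first')

# Lookup Series: index = original ITEM, value = replacement ITEM
s = df[m].set_index('orig')['ITEM']

df2['ITEM'] = df2['ITEM'].replace(s)
print(df2)
```

Because df1.assign returns a new DataFrame, df1 itself is left untouched here.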
Another alternative, without a new column, is to replace with a dictionary:
orig = df1['ITEM'].copy()  # copy first, otherwise orig can change together with df1
m = df1.duplicated(['CATEGORY', 'COLOR'], keep=False)
df1.loc[m, 'ITEM'] = df1[m].groupby(['CATEGORY', 'COLOR'])['ITEM'].transform('first')
print (df1)
ITEM CATEGORY COLOR
0 48684 CAR RED
1 54519 BIKE BLACK
2 14582 CAR BLACK
3 45685 JEEP WHITE
4 23661 BIKE BLUE
5 23661 BIKE BLUE
6 54519 BIKE BLACK
d = dict(zip(orig[m], df1.loc[m, 'ITEM']))
print (d)
{54519: 54519, 23661: 23661, 23226: 23661, 54252: 54519}
df2['ITEM'] = df2['ITEM'].replace(d)
print (df2)
USERID WEBBROWSE ITEM PURCHASE
1 1541 CHROME 54519 YES
2 3351 EXPLORER 54519 YES
3 2639 MOBILE APP 23661 YES
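If modifying df1 in place is not wanted, the dictionary can also be built without touching it. The following is only a sketch of that variant, not part of the original answer: the names canonical, changed and mapping are made up for illustration, and the sample frames are again rebuilt by hand from the question.

```python
import pandas as pd

# Sample data reconstructed from the question (only the columns shown there)
df1 = pd.DataFrame({
    'ITEM': [48684, 54519, 14582, 45685, 23661, 23226, 54252],
    'CATEGORY': ['CAR', 'BIKE', 'CAR', 'JEEP', 'BIKE', 'BIKE', 'BIKE'],
    'COLOR': ['RED', 'BLACK', 'BLACK', 'WHITE', 'BLUE', 'BLUE', 'BLACK'],
})
df2 = pd.DataFrame({
    'USERID': [1541, 3351, 2639],
    'WEBBROWSE': ['CHROME', 'EXPLORER', 'MOBILE APP'],
    'ITEM': [54252, 54519, 23661],
    'PURCHASE': ['YES', 'YES', 'YES'],
}, index=[1, 2, 3])

# For every row, the first ITEM seen in its (CATEGORY, COLOR) group
canonical = df1.groupby(['CATEGORY', 'COLOR'])['ITEM'].transform('first')

# Only items that differ from their group's first ITEM need an entry
changed = df1['ITEM'] != canonical
mapping = dict(zip(df1.loc[changed, 'ITEM'].tolist(), canonical[changed].tolist()))
print(mapping)  # {23226: 23661, 54252: 54519}

df2['ITEM'] = df2['ITEM'].replace(mapping)
print(df2)
```

Dropping the identity pairs (such as 54519 mapping to itself) keeps the dictionary small, which may matter when df1 has many duplicated groups.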