I have two data frames, df1 and df2, with exactly the same columns, and most of the time the values for each key are identical.
df1:

Country    A      B     C        D   E    F            G    H    Key
Argentina  xylo   262   4632     0   0    26.12        2    0    Argentinaxylo
Argentina  phone  6860  155811   48  0    4375.87      202  0    Argentinaphone
Argentina  land   507   1803728  2   117  7165.810566  3    154  Argentinaland
Australia  xylo   7650  139472   69  0    16858.42     184  0    Australiaxylo
Australia  mink   1284  2342788  1   0    39287.71     53   0    Australiamink

df2:

Country    A      B     C        D   E    F            G    H    Key
Argentina  xylo   262   4632     0   0    26.12        2    0    Argentinaxylo
Argentina  phone  6860  155811   48  0    4375.87      202  0    Argentinaphone
Argentina  land   507   1803728  2   117  7165.810566  3    154  Argentinaland
Australia  xylo   7650  139472   69  0    16858.42     184  0    Australiaxylo
Australia  mink   1284  2342788  1   0    39287.71     53   0    Australiamink
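I would like a snippet that compares the keys of the two data frames against each other (key = Country column + column A) and computes the percentage difference for each of columns B through H, if there is one. If there is no difference, it should output nothing.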
Answer 0 (score: 0)
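Hopefully the code below helps you solve the problem. I compared the two datasets on the Key column and, for each of them, computed the difference of columns B and H, that is (B - H) / 100. Note that this reads "B-H" as column B minus column H, not as the range of columns B through H. After that, since a percentage difference is wanted, I merged the two datasets on the Key column only, compared the two results, and stored the final output in the df3diff column of the df3 dataset.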
import pandas as pd

columns = ['Country', 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'Key']

# df1: the sample data from the question
df1 = pd.DataFrame([
    ['Argentina', 'xylo',   262,    4632,  0,   0,    26.12,        2,   0, 'Argentinaxylo'],
    ['Argentina', 'phone', 6860,  155811, 48,   0,  4375.87,      202,   0, 'Argentinaphone'],
    ['Argentina', 'land',   507, 1803728,  2, 117,  7165.810566,    3, 154, 'Argentinaland'],
    ['Australia', 'xylo',  7650,  139472, 69,   0, 16858.42,      184,   0, 'Australiaxylo'],
    ['Australia', 'mink',  1284, 2342788,  1,   0, 39287.71,       53,   0, 'Australiamink'],
], columns=columns)
# (B - H) / 100 for every row of df1
df1['df1BH'] = (df1['B'] - df1['H']) / 100.00
print(df1)

# df2: same as df1 except B and F in the Australia/xylo row, which differ here
df2 = pd.DataFrame([
    ['Argentina', 'xylo',    262,    4632,  0,   0,    26.12,        2,   0, 'Argentinaxylo'],
    ['Argentina', 'phone',  6860,  155811, 48,   0,  4375.87,      202,   0, 'Argentinaphone'],
    ['Argentina', 'land',    507, 1803728,  2, 117,  7165.810566,    3, 154, 'Argentinaland'],
    ['Australia', 'xylo',  97650,  139472, 69,   0, 96858.42,      184,   0, 'Australiaxylo'],
    ['Australia', 'mink',   1284, 2342788,  1,   0, 39287.71,       53,   0, 'Australiamink'],
], columns=columns)
# (B - H) / 100 for every row of df2
df2['df2BH'] = (df2['B'] - df2['H']) / 100.00
print(df2)

# Outer merge on Key, so a key that exists in only one frame is kept (with NaN on the other side)
df3 = pd.merge(df1[['Key', 'df1BH']], df2[['Key', 'df2BH']], on=['Key'], how='outer')
df3['df3diff'] = df3['df1BH'] - df3['df2BH']
print(df3)

Output of print(df3) (the row order may differ between pandas versions):

              Key  df1BH   df2BH  df3diff
0   Argentinaxylo   2.62    2.62      0.0
1  Argentinaphone  68.60   68.60      0.0
2   Argentinaland   3.53    3.53      0.0
3   Australiaxylo  76.50  976.50   -900.0
4   Australiamink  12.84   12.84      0.0
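
If "B-H" in the question is read as the range of columns B through H, one possible approach is to reshape both frames to long format, join them on the key plus the column name, and keep only the cells that differ. The following is a minimal sketch under several assumptions: the helper name `percent_diff` and the output column names are made up for illustration, the percentage is taken relative to df1, keys that appear in only one frame are ignored, and each (Country, A) pair is assumed to occur at most once per frame. The two small frames are a subset of the sample data, with B and F changed in df2 as in the answer above.

```python
import pandas as pd


def percent_diff(df1, df2, key_cols=("Country", "A"), value_cols=tuple("BCDEFGH")):
    """Return one row per (key, column) cell whose value differs between df1 and df2."""
    keys, vals = list(key_cols), list(value_cols)
    # Long format: one row per (Country, A, column) combination.
    long1 = df1.melt(id_vars=keys, value_vars=vals, var_name="column", value_name="value_df1")
    long2 = df2.melt(id_vars=keys, value_vars=vals, var_name="column", value_name="value_df2")
    # Inner join: only keys present in both frames are compared (an assumption).
    merged = long1.merge(long2, on=keys + ["column"], how="inner")
    # Keep only the cells that actually differ.
    diff = merged[merged["value_df1"] != merged["value_df2"]].copy()
    # Percentage change relative to df1; a df1 value of 0 yields inf.
    diff["pct_diff"] = (diff["value_df2"] - diff["value_df1"]) / diff["value_df1"] * 100
    return diff.reset_index(drop=True)


cols = ["Country", "A", "B", "C", "D", "E", "F", "G", "H"]
df1 = pd.DataFrame([
    ["Argentina", "xylo", 262, 4632, 0, 0, 26.12, 2, 0],
    ["Australia", "xylo", 7650, 139472, 69, 0, 16858.42, 184, 0],
], columns=cols)
df2 = pd.DataFrame([
    ["Argentina", "xylo", 262, 4632, 0, 0, 26.12, 2, 0],
    ["Australia", "xylo", 97650, 139472, 69, 0, 96858.42, 184, 0],
], columns=cols)

result = percent_diff(df1, df2)
# Print only when something differs, as the question asks.
if not result.empty:
    print(result)
```

Because the key is passed as the two columns Country and A, the concatenated Key column is not needed here. Unchanged cells are filtered out before the division, so identical frames give an empty result and nothing is printed.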