Compare two DataFrames based on a key

Time: 2019-06-18 20:08:03

Tags: python python-3.x pandas dataframe analytics

I have two DataFrames, df1 and df2, with exactly the same columns, and most of the time the values for each key are the same.

Country    A      B     C        D   E    F            G    H    Key
Argentina  xylo   262   4632     0   0    26.12        2    0    Argentinaxylo
Argentina  phone  6860  155811   48  0    4375.87      202  0    Argentinaphone
Argentina  land   507   1803728  2   117  7165.810566  3    154  Argentinaland
Australia  xylo   7650  139472   69  0    16858.42     184  0    Australiaxylo
Australia  mink   1284  2342788  1   0    39287.71     53   0    Australiamink


Country    A      B     C        D   E    F            G    H    Key
Argentina  xylo   262   4632     0   0    26.12        2    0    Argentinaxylo
Argentina  phone  6860  155811   48  0    4375.87      202  0    Argentinaphone
Argentina  land   507   1803728  2   117  7165.810566  3    154  Argentinaland
Australia  xylo   7650  139472   69  0    16858.42     184  0    Australiaxylo
Australia  mink   1284  2342788  1   0    39287.71     53   0    Australiamink

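I would like a snippet that compares the keys of the two DataFrames against each other (key = Country column + column A) and calculates the percentage difference for each of the columns B-H, if there is one. If there is no difference, it should output nothing.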
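For reference, a minimal sketch of how a Key column like the one described above could be derived, assuming both source columns hold strings. The two-row frame is illustrative only and reuses values from the sample data.

```python
import pandas as pd

# Two rows taken from the sample data (only the columns needed here).
df = pd.DataFrame({
    "Country": ["Argentina", "Australia"],
    "A": ["xylo", "mink"],
})

# Key = Country column + column A, concatenated as strings.
df["Key"] = df["Country"] + df["A"]
print(df)
```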

1 answer:

Answer 0: (score: 0)

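I hope the code below helps you solve the problem. I compared the two datasets on the "Key" column and, for each one, computed the difference of columns B and H, that is (B - H) / 100. Then, since a percentage difference is wanted, I merged the two datasets on the Key column only, compared the two values, and put the final result in the df3diff column of the df3 dataset.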

import pandas as pd

columns = ['Country', 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'Key']

# df1: the sample data from the question.
df1 = pd.DataFrame([
    ['Argentina', 'xylo',  262,  4632,    0,  0,   26.12,       2,   0,   'Argentinaxylo'],
    ['Argentina', 'phone', 6860, 155811,  48, 0,   4375.87,     202, 0,   'Argentinaphone'],
    ['Argentina', 'land',  507,  1803728, 2,  117, 7165.810566, 3,   154, 'Argentinaland'],
    ['Australia', 'xylo',  7650, 139472,  69, 0,   16858.42,    184, 0,   'Australiaxylo'],
    ['Australia', 'mink',  1284, 2342788, 1,  0,   39287.71,    53,  0,   'Australiamink'],
], columns=columns)

# (B - H) / 100 for every row of df1.
df1['df1BH'] = (df1['B'] - df1['H']) / 100.00
print(df1)

# df2: same data, except that B and F of 'Australiaxylo' are changed
# so that there is a difference to detect.
df2 = pd.DataFrame([
    ['Argentina', 'xylo',  262,   4632,    0,  0,   26.12,       2,   0,   'Argentinaxylo'],
    ['Argentina', 'phone', 6860,  155811,  48, 0,   4375.87,     202, 0,   'Argentinaphone'],
    ['Argentina', 'land',  507,   1803728, 2,  117, 7165.810566, 3,   154, 'Argentinaland'],
    ['Australia', 'xylo',  97650, 139472,  69, 0,   96858.42,    184, 0,   'Australiaxylo'],
    ['Australia', 'mink',  1284,  2342788, 1,  0,   39287.71,    53,  0,   'Australiamink'],
], columns=columns)

# (B - H) / 100 for every row of df2.
df2['df2BH'] = (df2['B'] - df2['H']) / 100.00
print(df2)

# Merge the two computed columns on Key and subtract them.
df3 = pd.merge(df1[['Key', 'df1BH']], df2[['Key', 'df2BH']], on=['Key'], how='outer')
df3['df3diff'] = df3['df1BH'] - df3['df2BH']
print(df3)

Output (newer pandas versions may order the rows of an outer merge by Key):

              Key  df1BH   df2BH  df3diff
0   Argentinaxylo   2.62    2.62      0.0
1  Argentinaphone  68.60   68.60      0.0
2   Argentinaland   3.53    3.53      0.0
3   Australiaxylo  76.50  976.50   -900.0
4   Australiamink  12.84   12.84      0.0
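
If "B-H" in the question is read as the range of columns B through H rather than B minus H, a per-column percentage difference may be closer to what was asked. The following is only a sketch under some assumptions: Key is unique within each frame, the difference is expressed relative to df1, and the helper name pct_diff_by_key is hypothetical, not part of pandas. A zero baseline in df1 produces inf rather than a finite percentage.

```python
import pandas as pd

VALUE_COLS = list("BCDEFGH")  # columns B through H


def pct_diff_by_key(df1, df2, key="Key", cols=None):
    """Percentage difference of df2 relative to df1, per key and column.

    Hypothetical helper: keeps only the rows and columns in which
    at least one value differs; unchanged cells are NaN.
    """
    cols = VALUE_COLS if cols is None else cols
    left = df1.set_index(key)[cols].astype(float)
    right = df2.set_index(key)[cols].astype(float)
    # Compare only the keys present in both frames.
    left, right = left.align(right, join="inner", axis=0)
    changed = left.ne(right)
    # Percentage change relative to df1, masked where nothing changed.
    pct = ((right - left) / left * 100).where(changed)
    return pct.loc[changed.any(axis=1), changed.any(axis=0)]


# Two rows from the sample data; df2 differs in B for 'Australiaxylo'.
df1 = pd.DataFrame({
    "Key": ["Argentinaxylo", "Australiaxylo"],
    "B": [262, 7650], "C": [4632, 139472], "D": [0, 69], "E": [0, 0],
    "F": [26.12, 16858.42], "G": [2, 184], "H": [0, 0],
})
df2 = df1.copy()
df2.loc[1, "B"] = 97650

result = pct_diff_by_key(df1, df2)
if not result.empty:  # print nothing when the frames agree
    print(result)
```

Keys that exist in only one of the frames are ignored here because of the inner alignment; whether they should be reported instead is a design choice the question leaves open.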