Question

As part of a data comparison in Python, I have the DataFrame output below. As you can see, I am comparing the PROD_ and PROJ_ data.

Example:

print (df)
          PROD_Label         PROJ_Label  Diff_Label  PROD_OAD  PROJ_OAD  \
0             Energy             Energy        True      1.94      1.94   
1  Food and Beverage  Food and Beverage        True      1.97      1.97   
2         Healthcare         Healthcare        True      8.23      8.23   
3  Consumer Products  Consumer Products        True      3.67       NaN   
4          Retailers          Retailers        True      5.88       NaN   

   Diff_OAD  PROD_OAD_Tin  PROJ_OAD_Tin  Diff_OAD_Tin  
0      True          0.02          0.02          True  
1      True          0.54          0.01         False  
2      True          0.05          0.05          True  
3     False          0.02          0.02          True  
4     False          0.06          0.06          True

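String columns such as PROD_Label and PROJ_Label are "non-null object". The comparison for these yields True/False, as expected.

Numeric columns such as PROD_OAD, PROJ_OAD, PROD_OAD_Tin, and PROJ_OAD_Tin are "non-null float64". Currently my output shows the comparison as True and False (as above). I would like the Diff_ columns to hold the actual difference instead, but only for the numeric columns.

Is it possible to specify particular column names and write the resulting differences to the Diff_ columns?

Note that I do not want to compare all PROD_ and PROJ_ columns. The string differences are already correct as True/False. I am only looking at some specific columns in numeric format.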

Answer 1

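I think that if only the numeric columns share this naming structure, you can select just the numeric columns, extract the unique suffixes, and then use sub in a for loop: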

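import numpy as np

# Select the numeric columns, drop the PROD_/PROJ_ prefix, keep the unique suffixes
a = df.select_dtypes([np.number]).columns.str.split('_', n=1).str[1].unique()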
print (a)
Index(['OAD', 'OAD_Tin'], dtype='object')
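
To see why the suffix extraction returns 'OAD_Tin' rather than just 'OAD', here is a minimal sketch on a standalone Index. The names cols and suffixes are illustrative and not part of the original answer.

```python
import pandas as pd

# Illustrative column names matching the question's naming pattern
cols = pd.Index(['PROD_OAD', 'PROJ_OAD', 'PROD_OAD_Tin', 'PROJ_OAD_Tin'])

# n=1 splits only at the first underscore, so 'PROD_OAD_Tin' -> ['PROD', 'OAD_Tin'];
# .str[1] then takes the part after the prefix
suffixes = cols.str.split('_', n=1).str[1].unique()
print(list(suffixes))
```

Without n=1, every underscore would be a split point and .str[1] would give 'OAD' for both column families.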

# PROJ minus PROD for each suffix; fill_value=0 treats a value missing on one side as 0
for x in a:
    df['Diff_' + x] = df['PROJ_' + x].sub(df['PROD_' + x], fill_value=0)
print (df)
          PROD_Label         PROJ_Label  Diff_Label  PROD_OAD  PROJ_OAD  \
0             Energy             Energy        True      1.94      1.94   
1  Food and Beverage  Food and Beverage        True      1.97      1.97   
2         Healthcare         Healthcare        True      8.23      8.23   
3  Consumer Products  Consumer Products        True      3.67       NaN   
4          Retailers          Retailers        True      5.88       NaN   

   Diff_OAD  PROD_OAD_Tin  PROJ_OAD_Tin  Diff_OAD_Tin  
0      0.00          0.02          0.02          0.00  
1      0.00          0.54          0.01         -0.53  
2      0.00          0.05          0.05          0.00  
3     -3.67          0.02          0.02          0.00  
4     -5.88          0.06          0.06          0.00
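
If you would rather name the columns yourself than detect them by dtype, the same loop works over an explicit list. The following is a minimal sketch under that assumption: numeric_suffixes is a hypothetical hand-written list, and the small DataFrame only mimics part of the data in the question.

```python
import numpy as np
import pandas as pd

# Small sample shaped like the question's data (two rows only)
df = pd.DataFrame({
    'PROD_Label': ['Energy', 'Retailers'],
    'PROJ_Label': ['Energy', 'Retailers'],
    'PROD_OAD': [1.94, 5.88],
    'PROJ_OAD': [1.94, np.nan],
    'PROD_OAD_Tin': [0.54, 0.06],
    'PROJ_OAD_Tin': [0.01, 0.06],
})

# Hypothetical explicit list of suffixes to compare numerically
numeric_suffixes = ['OAD', 'OAD_Tin']

for suffix in numeric_suffixes:
    # PROJ minus PROD; a value missing on one side is treated as 0
    df['Diff_' + suffix] = df['PROJ_' + suffix].sub(df['PROD_' + suffix], fill_value=0)

# String columns keep the plain True/False comparison
df['Diff_Label'] = df['PROD_Label'] == df['PROJ_Label']

print(df[['Diff_Label', 'Diff_OAD', 'Diff_OAD_Tin']].round(2))
```

If you drop fill_value=0, a row with a value missing on either side gets NaN in its Diff_ column instead, which may be preferable if a missing value should not be read as zero.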
