How to compare columns in different dataframes

Time: 2020-10-21 16:55:39

Tags: python pandas

I have these two dataframes:

{'Category': {0: 'BASE2_TREE_FILTER vs RETAIL 100', 1: 'LR_TREE_FILTER vs RETAIL 100'}, 'Mean': {0: 4.859101849501094, 1: 3.349513603073975}, 'Absolute Mean': {0: 6.917727336706257, 1: 5.352618468237218}, 'Increase': {0: 13, 1: 13}, '%change(Increase)': {0: 9.059099374005655, 1: 6.693947747162456}, 'Decrease': {0: 7, 1: 7}, '%change(Decrease)': {0: -2.940893553150234, 1: -2.861578378804634}, 'unchanged': {0: 0, 1: 0}}

The second one:

{'Category': {0: 'BASE2_TREE_FILTER vs RETAIL 100', 1: 'LR_TREE_FILTER vs RETAIL 100'}, 'Mean': {0: 4.947988913441173, 1: 4.494044038470856}, 'Absolute Mean': {0: 6.972378375288884, 1: 6.366948207708872}, 'Increase': {0: 26, 1: 26}, '%change(Increase)': {0: 8.252561969120809, 1: 7.519148478124428}, 'Decrease': {0: 9, 1: 9}, '%change(Decrease)': {0: -4.04877892369542, 1: -3.745808338476033}, 'unchanged': {0: 1, 1: 1}}

I need to compare the Absolute Mean of the two and return whichever dataframe has the lower Absolute Mean. How can I do that?

Edit: The number of rows may vary in the future, so I am looking for a generic solution.

2 answers:

Answer 0: (score: 1)

You can use np.where with a condition that checks which dataframe has the lower Absolute Mean.

For example, a solution could be:

  1. Create the two DataFrames:

    import numpy as np
    import pandas as pd

    data1 = {'Category': {0: 'BASE2_TREE_FILTER vs RETAIL 100', 1: 'LR_TREE_FILTER vs RETAIL 100'}, 'Mean': {0: 4.859101849501094, 1: 3.349513603073975}, 'Absolute Mean': {0: 6.917727336706257, 1: 5.352618468237218}, 'Increase': {0: 13, 1: 13}, '%change(Increase)': {0: 9.059099374005655, 1: 6.693947747162456}, 'Decrease': {0: 7, 1: 7}, '%change(Decrease)': {0: -2.940893553150234, 1: -2.861578378804634}, 'unchanged': {0: 0, 1: 0}}

    df1 = pd.DataFrame(data1)

                              Category      Mean  Absolute Mean  Increase  %change(Increase)  Decrease  %change(Decrease)  unchanged
    0  BASE2_TREE_FILTER vs RETAIL 100  4.859102       6.917727        13           9.059099         7          -2.940894          0
    1     LR_TREE_FILTER vs RETAIL 100  3.349514       5.352618        13           6.693948         7          -2.861578          0

    data2 = {'Category': {0: 'BASE2_TREE_FILTER vs RETAIL 100', 1: 'LR_TREE_FILTER vs RETAIL 100'}, 'Mean': {0: 4.947988913441173, 1: 4.494044038470856}, 'Absolute Mean': {0: 6.972378375288884, 1: 6.366948207708872}, 'Increase': {0: 26, 1: 26}, '%change(Increase)': {0: 8.252561969120809, 1: 7.519148478124428}, 'Decrease': {0: 9, 1: 9}, '%change(Decrease)': {0: -4.04877892369542, 1: -3.745808338476033}, 'unchanged': {0: 1, 1: 1}}

    df2 = pd.DataFrame(data2)

                              Category      Mean  Absolute Mean  Increase  %change(Increase)  Decrease  %change(Decrease)  unchanged
    0  BASE2_TREE_FILTER vs RETAIL 100  4.947989       6.972378        26           8.252562         9          -4.048779          1
    1     LR_TREE_FILTER vs RETAIL 100  4.494044       6.366948        26           7.519148         9          -3.745808          1

  2. Create another DataFrame to hold the result:

    result = pd.DataFrame()
    result['Category'] = df1['Category']

  3. Use np.where to find, row by row, which DataFrame has the lower Absolute Mean:

    # Element-wise comparison; this assumes df1 and df2 share the same index.
    result['Data from'] = np.where(df1['Absolute Mean'] < df2['Absolute Mean'], 'df1', 'df2')
    result['Min Absolute Mean'] = np.where(df1['Absolute Mean'] < df2['Absolute Mean'], df1['Absolute Mean'], df2['Absolute Mean'])

  4. Output:

                               Category   Data from    Min Absolute Mean
    0   BASE2_TREE_FILTER vs RETAIL 100   df1          6.917727
    1   LR_TREE_FILTER vs RETAIL 100      df1          5.352618
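
The comparison above matches rows by index position, so it only works while both frames list the same categories in the same order. Since the question says the number of rows may change, one possible variation is to align the frames on Category first and then compare. The following is a minimal sketch under some assumptions: it keeps only the two relevant columns from the question's data, the rows of df2 are deliberately swapped to show the alignment, and the `_df1` / `_df2` suffixes are names chosen here for illustration.

```python
import numpy as np
import pandas as pd

# Trimmed copies of the question's data (only the columns needed here).
df1 = pd.DataFrame({
    'Category': ['BASE2_TREE_FILTER vs RETAIL 100', 'LR_TREE_FILTER vs RETAIL 100'],
    'Absolute Mean': [6.917727336706257, 5.352618468237218],
})
# Same values as the question's second frame, with the row order swapped on purpose.
df2 = pd.DataFrame({
    'Category': ['LR_TREE_FILTER vs RETAIL 100', 'BASE2_TREE_FILTER vs RETAIL 100'],
    'Absolute Mean': [6.366948207708872, 6.972378375288884],
})

# Align rows by Category instead of by position; an inner join keeps
# only the categories that appear in both frames.
merged = df1.merge(df2, on='Category', how='inner', suffixes=('_df1', '_df2'))

# Label each row with the frame that has the lower Absolute Mean.
merged['Data from'] = np.where(
    merged['Absolute Mean_df1'] < merged['Absolute Mean_df2'], 'df1', 'df2'
)
# Row-wise minimum of the two Absolute Mean columns.
merged['Min Absolute Mean'] = merged[['Absolute Mean_df1', 'Absolute Mean_df2']].min(axis=1)

result = merged[['Category', 'Data from', 'Min Absolute Mean']]
print(result)
```

With the question's values, both categories are labelled df1, matching the output above. Categories present in only one frame are dropped by the inner join; `how='outer'` would keep them, at the cost of having to decide how to treat the resulting NaN values.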

Answer 1: (score: 0)

Inspired by Lorena Gil's answer:

    import pandas as pd
            
    df1 = pd.DataFrame({'Category': {0: 'BASE2_TREE_FILTER vs RETAIL 100', 1: 'LR_TREE_FILTER vs RETAIL 100'}, 
                        'Mean': {0: 4.859101849501094, 1: 3.349513603073975}, 
                        'Absolute Mean': {0: 6.917727336706257, 1: 5.352618468237218}, 
                        'Increase': {0: 13, 1: 13}, 
                        '%change(Increase)': {0: 9.059099374005655, 1: 6.693947747162456}, 
                        'Decrease': {0: 7, 1: 7}, 
                        '%change(Decrease)': {0: -2.940893553150234, 1: -2.861578378804634}, 
                        'unchanged': {0: 0, 1: 0}})
            
            
    df2 = pd.DataFrame({'Category': {0: 'BASE2_TREE_FILTER vs RETAIL 100', 1: 'LR_TREE_FILTER vs RETAIL 100'}, 
                        'Mean': {0: 4.947988913441173, 1: 4.494044038470856}, 
                        'Absolute Mean': {0: 6.972378375288884, 1: 6.366948207708872}, 
                        'Increase': {0: 26, 1: 26}, 
                        '%change(Increase)': {0: 8.252561969120809, 1: 7.519148478124428}, 
                        'Decrease': {0: 9, 1: 9}, 
                        '%change(Decrease)': {0: -4.04877892369542, 1: -3.745808338476033}, 
                        'unchanged': {0: 1, 1: 1}})
            
            
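    # Keep df1's row where its Absolute Mean is lower; otherwise take the whole row from df2.
    # Passing df2['Absolute Mean'] as the replacement would instead fill every column
    # of such a row with that single number.
    df3 = df1.where(df1['Absolute Mean'] < df2['Absolute Mean'], df2)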
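
The `where` call above compares the two frames by index label, so it expects both to have the same rows. For the case mentioned in the question, where the number of rows may differ, one alternative is to stack both frames and keep, per Category, the row with the smallest Absolute Mean. This is only a sketch under assumptions: the data is a trimmed copy of the question's frames, and the `source` column is a hypothetical helper added here to record where each row came from.

```python
import pandas as pd

# Trimmed copies of the question's data (three columns are enough to illustrate).
df1 = pd.DataFrame({
    'Category': ['BASE2_TREE_FILTER vs RETAIL 100', 'LR_TREE_FILTER vs RETAIL 100'],
    'Mean': [4.859101849501094, 3.349513603073975],
    'Absolute Mean': [6.917727336706257, 5.352618468237218],
})
df2 = pd.DataFrame({
    'Category': ['BASE2_TREE_FILTER vs RETAIL 100', 'LR_TREE_FILTER vs RETAIL 100'],
    'Mean': [4.947988913441173, 4.494044038470856],
    'Absolute Mean': [6.972378375288884, 6.366948207708872],
})

# Stack both frames and tag each row with its origin ('source' is a hypothetical name).
combined = pd.concat(
    [df1.assign(source='df1'), df2.assign(source='df2')],
    ignore_index=True,
)

# For each Category, find the index label of the row with the lowest Absolute Mean,
# then select those complete rows.
best_idx = combined.groupby('Category')['Absolute Mean'].idxmin()
best = combined.loc[best_idx].reset_index(drop=True)

print(best)
```

Because the rows are grouped by Category rather than matched by position, the two frames may have different lengths or orders; a category that appears in only one frame is simply returned from that frame.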