I have these two dataframes. The first:
{'Category': {0: 'BASE2_TREE_FILTER vs RETAIL 100', 1: 'LR_TREE_FILTER vs RETAIL 100'}, 'Mean': {0: 4.859101849501094, 1: 3.349513603073975}, 'Absolute Mean': {0: 6.917727336706257, 1: 5.352618468237218}, 'Increase': {0: 13, 1: 13}, '%change(Increase)': {0: 9.059099374005655, 1: 6.693947747162456}, 'Decrease': {0: 7, 1: 7}, '%change(Decrease)': {0: -2.940893553150234, 1: -2.861578378804634}, 'unchanged': {0: 0, 1: 0}}
The second:
{'Category': {0: 'BASE2_TREE_FILTER vs RETAIL 100', 1: 'LR_TREE_FILTER vs RETAIL 100'}, 'Mean': {0: 4.947988913441173, 1: 4.494044038470856}, 'Absolute Mean': {0: 6.972378375288884, 1: 6.366948207708872}, 'Increase': {0: 26, 1: 26}, '%change(Increase)': {0: 8.252561969120809, 1: 7.519148478124428}, 'Decrease': {0: 9, 1: 9}, '%change(Decrease)': {0: -4.04877892369542, 1: -3.745808338476033}, 'unchanged': {0: 1, 1: 1}}
I need to compare the Absolute Mean of the two and return whichever dataframe has the lower Absolute Mean. How can I do this?
Edit: The number of rows may change in the future, so I'm looking for a general solution.
Answer 0 (score: 1)
You can use np.where, with a condition that tells you which dataframe has the smaller Absolute Mean in each row.
For example, a solution could be:
import numpy as np
import pandas as pd
data1 = {'Category': {0: 'BASE2_TREE_FILTER vs RETAIL 100', 1: 'LR_TREE_FILTER vs RETAIL 100'}, 'Mean': {0: 4.859101849501094, 1: 3.349513603073975}, 'Absolute Mean': {0: 6.917727336706257, 1: 5.352618468237218}, 'Increase': {0: 13, 1: 13}, '%change(Increase)': {0: 9.059099374005655, 1: 6.693947747162456}, 'Decrease': {0: 7, 1: 7}, '%change(Decrease)': {0: -2.940893553150234, 1: -2.861578378804634}, 'unchanged': {0: 0, 1: 0}}
df1 = pd.DataFrame(data1)
                          Category      Mean  Absolute Mean  Increase  %change(Increase)  Decrease  %change(Decrease)  unchanged
0  BASE2_TREE_FILTER vs RETAIL 100  4.859102       6.917727        13           9.059099         7          -2.940894          0
1     LR_TREE_FILTER vs RETAIL 100  3.349514       5.352618        13           6.693948         7          -2.861578          0
data2 = {'Category': {0: 'BASE2_TREE_FILTER vs RETAIL 100', 1: 'LR_TREE_FILTER vs RETAIL 100'}, 'Mean': {0: 4.947988913441173, 1: 4.494044038470856}, 'Absolute Mean': {0: 6.972378375288884, 1: 6.366948207708872}, 'Increase': {0: 26, 1: 26}, '%change(Increase)': {0: 8.252561969120809, 1: 7.519148478124428}, 'Decrease': {0: 9, 1: 9}, '%change(Decrease)': {0: -4.04877892369542, 1: -3.745808338476033}, 'unchanged': {0: 1, 1: 1}}
df2 = pd.DataFrame(data2)
                          Category      Mean  Absolute Mean  Increase  %change(Increase)  Decrease  %change(Decrease)  unchanged
0  BASE2_TREE_FILTER vs RETAIL 100  4.947989       6.972378        26           8.252562         9          -4.048779          1
1     LR_TREE_FILTER vs RETAIL 100  4.494044       6.366948        26           7.519148         9          -3.745808          1
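# Build the result row by row; this assumes both frames list the same categories in the same order
result = pd.DataFrame()
result['Category'] = df1['Category']
# Label which frame has the lower 'Absolute Mean' in each row (ties go to df2)
result['Data from'] = np.where(df1['Absolute Mean'] < df2['Absolute Mean'], 'df1', 'df2')
# Keep the smaller of the two values
result['Min Absolute Mean'] = np.where(df1['Absolute Mean'] < df2['Absolute Mean'], df1['Absolute Mean'], df2['Absolute Mean'])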
                          Category  Data from  Min Absolute Mean
0  BASE2_TREE_FILTER vs RETAIL 100        df1           6.917727
1     LR_TREE_FILTER vs RETAIL 100        df1           5.352618
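The code above works row by row and relies on both frames having the same rows in the same order. If the goal is literally to return the whole dataframe with the lower Absolute Mean, for any number of rows as the question's edit asks, a minimal sketch, assuming "lower" means the lower average of the 'Absolute Mean' column, could look like this. The helper name `lower_abs_mean_frame` is hypothetical, the frames are trimmed to the two relevant columns, and the one-row df2 is illustrative only.

```python
import pandas as pd


def lower_abs_mean_frame(df_a: pd.DataFrame, df_b: pd.DataFrame) -> pd.DataFrame:
    """Return whichever frame has the lower average 'Absolute Mean' (hypothetical helper).

    Averaging the column means the frames may have different numbers of rows.
    On a tie, df_a is returned.
    """
    if df_a['Absolute Mean'].mean() <= df_b['Absolute Mean'].mean():
        return df_a
    return df_b


# Illustrative frames: values from the question, trimmed to two columns;
# df2 is cut to one row only to show that the row counts may differ
df1 = pd.DataFrame({'Category': ['BASE2_TREE_FILTER vs RETAIL 100', 'LR_TREE_FILTER vs RETAIL 100'],
                    'Absolute Mean': [6.917727336706257, 5.352618468237218]})
df2 = pd.DataFrame({'Category': ['BASE2_TREE_FILTER vs RETAIL 100'],
                    'Absolute Mean': [6.972378375288884]})

best = lower_abs_mean_frame(df1, df2)
print(best)  # df1 is returned: its average (about 6.135) is below 6.972
```

Using the mean is one possible reading of "lower"; if the sum or the minimum is what matters, only the aggregation call in the comparison needs to change.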
Answer 1 (score: 0)
Inspired by Lorena Gil's answer:
import pandas as pd
df1 = pd.DataFrame({'Category': {0: 'BASE2_TREE_FILTER vs RETAIL 100', 1: 'LR_TREE_FILTER vs RETAIL 100'},
'Mean': {0: 4.859101849501094, 1: 3.349513603073975},
'Absolute Mean': {0: 6.917727336706257, 1: 5.352618468237218},
'Increase': {0: 13, 1: 13},
'%change(Increase)': {0: 9.059099374005655, 1: 6.693947747162456},
'Decrease': {0: 7, 1: 7},
'%change(Decrease)': {0: -2.940893553150234, 1: -2.861578378804634},
'unchanged': {0: 0, 1: 0}})
df2 = pd.DataFrame({'Category': {0: 'BASE2_TREE_FILTER vs RETAIL 100', 1: 'LR_TREE_FILTER vs RETAIL 100'},
'Mean': {0: 4.947988913441173, 1: 4.494044038470856},
'Absolute Mean': {0: 6.972378375288884, 1: 6.366948207708872},
'Increase': {0: 26, 1: 26},
'%change(Increase)': {0: 8.252561969120809, 1: 7.519148478124428},
'Decrease': {0: 9, 1: 9},
'%change(Decrease)': {0: -4.04877892369542, 1: -3.745808338476033},
'unchanged': {0: 1, 1: 1}})
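# Keep each row of df1 where its 'Absolute Mean' is lower, otherwise take the whole row from df2.
# Rows are matched by index, so both frames must list the categories in the same order.
# (Passing df2['Absolute Mean'] as `other` would overwrite every column, including Category, with that number.)
df3 = df1.where(df1['Absolute Mean'] < df2['Absolute Mean'], df2)
print(df3)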
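Both answers match rows by position, so they break if the frames have different row counts or a different row order. A row-wise comparison that does not depend on that can be sketched by stacking the frames and keeping, for each Category, the row with the smallest Absolute Mean. This is an illustration rather than part of either answer: the `source` column, the reordered df2, and the extra category `NEW_FILTER vs RETAIL 100` with its value of 4.0 are hypothetical.

```python
import pandas as pd

# Two frames whose rows need not line up; df2 is reordered and has a hypothetical extra category
df1 = pd.DataFrame({'Category': ['BASE2_TREE_FILTER vs RETAIL 100', 'LR_TREE_FILTER vs RETAIL 100'],
                    'Absolute Mean': [6.917727336706257, 5.352618468237218]})
df2 = pd.DataFrame({'Category': ['LR_TREE_FILTER vs RETAIL 100', 'BASE2_TREE_FILTER vs RETAIL 100',
                                 'NEW_FILTER vs RETAIL 100'],
                    'Absolute Mean': [6.366948207708872, 6.972378375288884, 4.0]})

# Stack both frames, remembering which frame each row came from
combined = pd.concat([df1.assign(source='df1'), df2.assign(source='df2')], ignore_index=True)

# For each Category, keep the row with the smallest 'Absolute Mean'
best_rows = combined.loc[combined.groupby('Category')['Absolute Mean'].idxmin()].reset_index(drop=True)
print(best_rows)
```

Categories present in only one frame are kept as they are. On a tie, `idxmin` returns the first occurrence, so df1 wins because it is concatenated first.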