Question

I created a pivot table with the following code:

import pandas as pd

confMatrix_structure = pd.DataFrame({
    'confMatrix_column': ['selected', 'notSelected'],
    'confMatrix_index': ['relevant', 'irrelevant'],
    'confMatrix_dummy_value': [5, 10]
})

confusion_matrix = pd.pivot_table(
    confMatrix_structure,
    values='confMatrix_dummy_value',
    index=['confMatrix_index'],
    columns=['confMatrix_column'],
    margins=True,
    margins_name='total')

The resulting pivot_table looks like this: [image not included]

Now, if I update the table with the following code:

confusion_matrix.loc['irrelevant','selected']=25

After running the code above, the table looks like this: [image not included]

As the picture shows, the cell has been updated, but the margins/totals have not been updated to reflect the change.

I wrote the following function and call it after every cell update to refresh the totals:

def updateConfMatrix():
    # Each sum still includes the stale total, so subtract it back out.
    confusion_matrix.loc['irrelevant', 'total'] = confusion_matrix.loc['irrelevant'].sum() - confusion_matrix.loc['irrelevant', 'total']
    confusion_matrix.loc['relevant', 'total'] = confusion_matrix.loc['relevant'].sum() - confusion_matrix.loc['relevant', 'total']
    confusion_matrix.loc['total', 'notSelected'] = confusion_matrix.loc[:, 'notSelected'].sum() - confusion_matrix.loc['total', 'notSelected']
    confusion_matrix.loc['total', 'selected'] = confusion_matrix.loc[:, 'selected'].sum() - confusion_matrix.loc['total', 'selected']
    confusion_matrix.loc['total', 'total'] = confusion_matrix.loc['total'].sum() - confusion_matrix.loc['total', 'total']

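This works fine for small data, but I have thousands of rows and columns. Is there a simpler, faster way to perform this update? Or a way to automate it, so that the pivot_table recalculates its totals/margins whenever a cell is updated?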

Answer 1

Here is a general function that works for any number of rows and columns.

def updateConfMatrix():
    cols = list(confusion_matrix)
    cols.remove('total')  # exclude the total column
    for col in cols:
        # The column sum still includes the stale 'total' row, so subtract it.
        confusion_matrix.loc['total', col] = confusion_matrix[col].sum() - confusion_matrix.loc['total', col]
    # Row totals, including the grand total in the 'total' row.
    confusion_matrix['total'] = confusion_matrix[cols].sum(axis=1)
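
As a possible loop-free variant of the same idea, the margins could be recomputed from the body cells in one pass. This is only a sketch: `recompute_margins` is a hypothetical helper name, the small hand-built frame stands in for the pivot table from the question, and it assumes the margins are plain sums (note that `pd.pivot_table` aggregates with the mean by default, so `aggfunc='sum'` would be needed for the initial margins to match).

```python
import pandas as pd


def recompute_margins(table, margins_name="total"):
    """Recompute the margin row and column of `table` in place (hypothetical helper)."""
    # Body = every cell except the margin row and the margin column.
    body = table.drop(index=margins_name, columns=margins_name)
    # Row totals go into the margin column, column totals into the margin row.
    table.loc[body.index, margins_name] = body.sum(axis=1).to_numpy()
    table.loc[margins_name, body.columns] = body.sum(axis=0).to_numpy()
    # Grand total of all body cells (NaN cells are skipped by sum()).
    table.loc[margins_name, margins_name] = body.sum().sum()
    return table


# Stand-in for the question's pivot table, with sum-based margins (assumed data).
confusion_matrix = pd.DataFrame(
    [[10.0, 0.0, 10.0], [0.0, 5.0, 5.0], [10.0, 5.0, 15.0]],
    index=["irrelevant", "relevant", "total"],
    columns=["notSelected", "selected", "total"],
)

confusion_matrix.loc["irrelevant", "selected"] = 25
recompute_margins(confusion_matrix)
print(confusion_matrix)
```

pandas does not keep margins in sync by itself, since the pivot table is an ordinary DataFrame computed once. The helper still has to be called after each update or, more cheaply, once after a batch of updates.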

Pandas: automatically updating pivot table margins
