Question

I am aggregating my df:

dfAvg_Volume_RFQ = dfSpecific_Client_Avg_Volume_RFQ.groupby(['Client', 'currency', 'sales_person_name2']).agg({'state': 'size', 'Quantity': 'mean', 'Quantity_CAD': 'mean'})

print(dfAvg_Volume_RFQ.info())

MultiIndex: 1127 entries, (A......) to (Z.....)
    Data columns (total 3 columns):
    state           1127 non-null int64
    Quantity        1127 non-null float64
    Quantity_CAD    1127 non-null float64

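When I look at the head of the df, it shows all six columns. When I run other aggregations on dfSpecific_Client_Avg_Volume_RFQ, it only uses the three columns listed in info, which is a problem.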

print(dfAvg_Volume_RFQ.head(5))

                                       state      Quantity  \
Client  currency sales_person_name2                           
A       USD      OSCAR                  2         2050000.0000   
AA      USD      NAZ                    10        11500000.0000   
AAR     USD      JOSHUA                 1         15000.0000   
ABC     USD      ANGELA                 1         5000000.0000   
                 HANS                   1         10000000.0000   

                                                  Quantity_CAD  
Client  currency sales_person_name2                           
A       USD      OSCAR                  2         2050000.0000   
AA      USD      NAZ                    10        11500000.0000   
AAR     USD      JOSHUA                 1         15000.0000   
ABC     USD      ANGELA                 1         5000000.0000   
                 HANS                   1         10000000.0000   

print(dfAvg_Volume_RFQ.columns)

   state       Quantity   Quantity_CAD
0      1  50000000.0000  47523999.6198
1      4 300000000.0000 399625821.9816
2     18 274241666.6667 365848851.3870
3      1 300000000.0000 409165302.7823
4     32 138905156.2500 138905156.2500

print (dfAvg_Volume_RFQ.index.names)

Index(['state', 'Quantity', 'Quantity_CAD'], dtype='object')
['Client', 'currency', 'sales_person_name2']

When grouping and summing over several columns, is it impossible to produce a df without a MultiIndex?

Answer 1

I think this is expected, because there are not 6 columns here but a 3-level MultiIndex and 3 columns.

You can check this with:

print (dfAvg_Volume_RFQ.columns)

print (dfAvg_Volume_RFQ.index.names)
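To make the check concrete, here is a minimal sketch on made-up data. The frame `df` and its values are hypothetical stand-ins for dfSpecific_Client_Avg_Volume_RFQ; only the column names and the aggregation come from the question.

```python
import pandas as pd

# Hypothetical toy data; only the column names match the question.
df = pd.DataFrame({
    "Client": ["A", "A", "AA", "ABC", "ABC"],
    "currency": ["USD"] * 5,
    "sales_person_name2": ["OSCAR", "OSCAR", "NAZ", "ANGELA", "HANS"],
    "state": ["done", "done", "done", "passed", "done"],
    "Quantity": [1_000_000.0, 3_100_000.0, 11_500_000.0, 5_000_000.0, 10_000_000.0],
    "Quantity_CAD": [1_300_000.0, 4_000_000.0, 15_000_000.0, 6_500_000.0, 13_000_000.0],
})

grouped = df.groupby(["Client", "currency", "sales_person_name2"]).agg(
    {"state": "size", "Quantity": "mean", "Quantity_CAD": "mean"}
)

# The grouping keys live in the index, not in the columns.
print(list(grouped.columns))      # the three aggregated data columns
print(list(grouped.index.names))  # the three grouping keys
print(grouped.shape[1], grouped.index.nlevels)
```

head() prints the index levels next to the data, which is why it looks like six columns, while info() counts only the data columns.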

If you need to convert the MultiIndex into columns, use DataFrame.reset_index or the as_index=False parameter of groupby:

dfAvg_Volume_RFQ = dfAvg_Volume_RFQ.reset_index()

Or:

# Parentheses allow the method chain to span several lines
dfAvg_Volume_RFQ = (dfSpecific_Client_Avg_Volume_RFQ
                    .groupby(['Client', 'currency', 'sales_person_name2'], as_index=False)
                    .agg({'state': 'size', 'Quantity': 'mean', 'Quantity_CAD': 'mean'}))
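As a rough comparison of the two routes, the sketch below runs both on a small hypothetical frame (`df` and its values are invented for illustration; the names `flat_a` and `flat_b` are also not from the question). Both should end up with the grouping keys as ordinary columns.

```python
import pandas as pd

# Hypothetical toy data; only the column names match the question.
df = pd.DataFrame({
    "Client": ["A", "A", "AA", "ABC", "ABC"],
    "currency": ["USD"] * 5,
    "sales_person_name2": ["OSCAR", "OSCAR", "NAZ", "ANGELA", "HANS"],
    "state": ["done", "done", "done", "passed", "done"],
    "Quantity": [1_000_000.0, 3_100_000.0, 11_500_000.0, 5_000_000.0, 10_000_000.0],
    "Quantity_CAD": [1_300_000.0, 4_000_000.0, 15_000_000.0, 6_500_000.0, 13_000_000.0],
})

keys = ["Client", "currency", "sales_person_name2"]
spec = {"state": "size", "Quantity": "mean", "Quantity_CAD": "mean"}

# Route 1: aggregate first, then move the index levels into columns.
flat_a = df.groupby(keys).agg(spec).reset_index()

# Route 2: ask groupby not to build an index from the keys at all.
flat_b = df.groupby(keys, as_index=False).agg(spec)

print(list(flat_a.columns))  # keys followed by the aggregated columns
print(list(flat_b.columns))
print(type(flat_a.index).__name__)
```

Either result can be passed on to further aggregations that need Client, currency, or sales_person_name2 as regular columns.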

Answer 2

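When you aggregate on a groupby, the columns used in the groupby become the index of the resulting DataFrame. So in your example, Client, currency, and sales_person_name2 form the index, and you have only 3 true columns: state, Quantity, and Quantity_CAD.

If you want to get rid of the index and have 6 columns, just use reset_index:

dfAvg_Volume_RFQ = dfAvg_Volume_RFQ.reset_index()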

You will get a DataFrame with a simple RangeIndex and 6 data columns.

Pandas aggregated df shows a different number of columns between head() and .info() because of MultiIndexing
