Question

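I need to group a DataFrame by all of its non-numeric columns (the numeric columns are float and int) and print the resulting DataFrame aggregated by mean. After the groupby operation, the output should be the first five rows of the result.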

Input: a CSV file

Output:

                                                                        Sentiment_Polarity  \
App                    Translated_Review                      Sentiment                       
10 Best Foods for You 10 best foods 4u Excellent chose foods Positive                 1.00   
                      A big thanks ds I got bst gd health    Positive                 0.10   
                      Absolutely Fabulous Phenomenal         Positive                 0.45   
                      Amazing                                Positive                 0.60   
                      An excellent A useful                  Positive                 0.65

                                                                     Sentiment_Subjectivity  
App                   Translated_Review                      Sentiment                          
10 Best Foods for You 10 best foods 4u Excellent chose foods Positive                     0.65  
                     A big thanks ds I got bst gd health    Positive                     0.15  
                     Absolutely Fabulous Phenomenal         Positive                     0.75  
                     Amazing                                Positive                     0.90  
                     An excellent A useful                  Positive                     0.50

Answer 1

You can do this with pandas.DataFrame.select_dtypes by excluding all numeric columns, which leaves the non-numeric columns (typically string or object type):

groupcols = df.select_dtypes(exclude="number").columns.tolist()
group_df = df.groupby(groupcols).mean() #.reset_index()

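If you want the group keys back as ordinary columns rather than as the index, reset the index (the commented-out .reset_index() above).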
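As a minimal sketch of the steps above, the following uses a small in-memory DataFrame instead of the CSV file. The column names follow the question's output, but the values are made up for illustration, and `flat_df` is a name introduced here.

```python
import pandas as pd

# Toy data with hypothetical values; column names follow the question's output.
df = pd.DataFrame({
    "App": ["A", "A", "A", "B"],
    "Translated_Review": ["Good", "Good", "Bad", "Good"],
    "Sentiment": ["Positive", "Positive", "Negative", "Positive"],
    "Sentiment_Polarity": [1.0, 0.5, -0.5, 0.25],
    "Sentiment_Subjectivity": [0.5, 1.0, 0.25, 0.75],
})

# Every column that is not numeric becomes a grouping key.
groupcols = df.select_dtypes(exclude="number").columns.tolist()

# Mean of the remaining (numeric) columns per group.
group_df = df.groupby(groupcols).mean()

# First five rows of the grouped result, as the question asks.
print(group_df.head())

# Optional: turn the group keys back into ordinary columns.
flat_df = group_df.reset_index()
```

Note that groupby drops rows whose grouping keys contain missing values by default (`dropna=True`), which may matter for a real CSV with empty review fields.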

You can also select only the categorical columns with:

groupcols = df.select_dtypes(include="category").columns.tolist()
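A short sketch of the categorical variant, assuming a toy DataFrame in which only `Sentiment` has the category dtype. The data is hypothetical. Passing `observed=True` and `numeric_only=True` is a choice made for this example, not part of the original answer: the first limits the result to categories that actually occur, and the second skips the leftover string column.

```python
import pandas as pd

df = pd.DataFrame({
    "Sentiment": pd.Categorical(["Positive", "Negative", "Positive"]),
    "App": ["A", "A", "B"],  # plain string column, not categorical
    "Sentiment_Polarity": [0.5, -0.5, 1.0],
})

# Only columns with the category dtype are selected.
groupcols = df.select_dtypes(include="category").columns.tolist()
print(groupcols)  # ['Sentiment']

# numeric_only=True leaves the string column "App" out of the mean.
group_df = df.groupby(groupcols, observed=True).mean(numeric_only=True)
print(group_df)
```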

See the select_dtypes documentation for how to include or exclude the dtypes you need.

Edit:

If the original DataFrame has a MultiIndex, you first need to convert the index levels to columns:

# MultiIndex to columns
df = df.reset_index()
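To illustrate why the reset is needed, here is a small sketch with a hypothetical MultiIndex DataFrame. Before `reset_index()`, the index levels are not columns, so `select_dtypes` cannot see them. Afterwards they can be used as grouping keys.

```python
import pandas as pd

# Hypothetical data: the non-numeric values live in the index levels.
df = pd.DataFrame(
    {"Sentiment_Polarity": [1.0, 0.5, 0.25]},
    index=pd.MultiIndex.from_tuples(
        [("A", "Positive"), ("A", "Positive"), ("B", "Negative")],
        names=["App", "Sentiment"],
    ),
)

# Index levels become ordinary columns.
df = df.reset_index()

groupcols = df.select_dtypes(exclude="number").columns.tolist()
group_df = df.groupby(groupcols).mean()
print(group_df.head())
```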
