Question

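I have a pandas DataFrame that looks like this:

You will notice that many rows share the same combination of code_module, code_presentation, and id_student. What I want to do is merge all of these duplicate rows and sum sum_click within each group.

For example, the top few rows would be merged into the following single row: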

         code_module code_presentation  id_student  sum_click
0                AAA             2013J       28400          18

In SQL terms, the primary key would be the combination of code_module, code_presentation, and id_student.

To do this, I tried using groupby as follows:

groupby(['id_student','code_presentation','code_module']).aggregate({'sum_click': 'sum',})

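But this did not work: the result appeared to contain student IDs that are not even in my dataset, and I don't know why.

Besides, groupby does not seem to be what I want, because its result has a different structure from a standard pandas DataFrame, and a standard DataFrame is what I am after.

The problem can be seen in the following output: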

                                        sum_click
id_student code_presentation code_module           
6516       2014J             AAA               2791
8462       2013J             DDD                646
           2014J             DDD                 10
11391      2013J             AAA                934

Rows 1 and 2 (0-indexed) should be separate, complete rows, not displayed the way they are now.

Answer 1

Try this:

df.groupby(['code_module', 'code_presentation', 'id_student']).agg(sum_clicks=('sum_click', 'sum')).reset_index()
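To see why the groupby output in the question looks different, here is a minimal sketch on made-up data. Only the column names come from the question; the values are hypothetical, chosen so that the AAA/2013J/28400 rows add up to 18. A plain groupby puts the grouping keys into a MultiIndex, and pandas prints repeated index values only once, which is probably why id_student looks blank on the second 8462 row. Passing as_index=False (or calling reset_index() as above) keeps the keys as ordinary columns.

```python
import pandas as pd

# Hypothetical sample data; only the column names are taken from the question.
df = pd.DataFrame({
    "code_module": ["AAA", "AAA", "AAA", "DDD", "DDD"],
    "code_presentation": ["2013J", "2013J", "2013J", "2013J", "2014J"],
    "id_student": [28400, 28400, 28400, 8462, 8462],
    "sum_click": [4, 1, 13, 646, 10],
})

keys = ["code_module", "code_presentation", "id_student"]

# Default groupby: the keys become a MultiIndex on the result.
indexed = df.groupby(keys)["sum_click"].sum()

# as_index=False: the keys stay as regular columns, one row per group.
merged = df.groupby(keys, as_index=False)["sum_click"].sum()

print(indexed)
print(merged)
```

Here merged is a regular DataFrame with the four original columns, and the three AAA rows collapse into one row with sum_click equal to 4 + 1 + 13 = 18.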
