Question

I have a (simplified) DataFrame like this:

+--------+-----------+-------+
| type   | estimated | value |
+--------+-----------+-------+
| type_a | TRUE      | 1     |
| type_a | TRUE      | 2     |
| type_a |           | 3     |
| type_b |           | 4     |
| type_b |           | 5     |
| type_b |           | 6     |
+--------+-----------+-------+

I want to group it by type and collapse it into two rows:

+--------+-----------+-------+
|  type  | estimated | value |
+--------+-----------+-------+
| type_a | TRUE      |     6 |
| type_b |           |    15 |
+--------+-----------+-------+

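However, if any of the rows that make up a grouped row were estimated, I want the grouped row's "estimated" column to be TRUE. If I include the "estimated" column in the groupby keys, those rows are not combined.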

My idea was to iterate over each group, for example (pseudocode):

grouped = df.groupby('type')
for group in grouped:
    group['flag'] = 0
    for row in group:
        if row['estimated'] == True:
            group['flag'] = 1

Then, after grouping, I could set estimated = True on every row with a non-zero 'flag'.

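I'm having some trouble figuring out how to iterate over the rows of each group, and the solution looks pretty ugly. Besides, you shouldn't edit something you are iterating over. Is there a solution, or a better way?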

Answer 1

You want groupby with agg:

df.groupby('type').agg(dict(estimated='any', value='sum')).reset_index()

     type  value estimated
0  type_a      6      True
1  type_b     15     False
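For anyone who wants to try this end to end, here is a minimal self-contained sketch. It rebuilds the sample frame from the question and assumes the blank "estimated" cells are NaN (the question does not say how they are stored). It also uses named aggregation instead of a dict, which is a stylistic swap rather than the approach shown above; the variable names df and result are only illustrative.

```python
import numpy as np
import pandas as pd

# Rebuild the sample frame; blank "estimated" cells are ASSUMED to be NaN.
df = pd.DataFrame({
    "type": ["type_a"] * 3 + ["type_b"] * 3,
    "estimated": [True, True, np.nan, np.nan, np.nan, np.nan],
    "value": [1, 2, 3, 4, 5, 6],
})

# Normalise the flag to a real boolean column: NaN == True is False,
# so missing entries become False.
df["estimated"] = df["estimated"].eq(True)

# One row per type: "any" is True if at least one row in the group was
# estimated, and "sum" adds up the values.
result = (
    df.groupby("type", as_index=False)
      .agg(estimated=("estimated", "any"), value=("value", "sum"))
)
print(result)
```

Under those assumptions, type_a comes out with estimated True and value 6, and type_b with estimated False and value 15. No explicit loop over groups is needed, and nothing is modified while it is being iterated.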
