Question

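I am trying to group rows using a list, as one way of grouping in Pandas.

Goal:

I want to group every N rows of a DataFrame, so I took the approach of passing a list to groupby, which groups the rows in the order given by that list. Before getting to the problem, here is the code I use to group the rows.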

import math

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randint(0, 100, (100, 5)))

# Number of rows in each group
n_elems = 20

# Total rows in the dataset
n_rows = df.shape[0]

# Groups to be created (ceil so that a leftover partial group is still counted)
n_groups = math.ceil(n_rows / n_elems)

groups = []
for idx in range(n_groups):
    grp = [idx] * n_elems
    groups.extend(grp)
    
# Making the same length - as groupby requires
groups = groups[:n_rows]

# Using list ↓ to group by
df.groupby(groups).agg(['mean', 'count'])

Problem:

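Now, when I set the number of rows per group to anything from 1 to 19, this works fine: 100 groups when n_elems is 1, 50 groups when n_elems is 2, 20 groups when n_elems is 5, and so on up to 19.

But the problem appears at 20. I don't know why it is 20 specifically; it may be a different number for a different number of rows. Here, with n_elems set to 20, it should return 5 groups of 20 rows each. Instead it returns a strange DataFrame with 100 rows and 0 columns!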

I tried searching but did not find anything useful. Any help would give me a better understanding of groupby.

Thanks in advance.

Answer 1

Try creating the groups by floor-dividing the index:

n_elems = 2
new_df = df.groupby(df.index // n_elems).agg(['mean', 'sum'])

      0          1          2     
   mean  sum  mean  sum  mean  sum
0  57.5  115  75.5  151  34.5   69
1  71.0  142  17.0   34  53.0  106
2  21.0   42  48.5   97  78.5  157
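
Note that floor-dividing df.index only gives consecutive chunks when the index is the default 0, 1, 2, ... integer index. If the index is something else (after filtering or sorting, for example), a minimal sketch that groups by row position instead might look like the following. The sample data and the helper name group_every_n are made up for illustration.

```python
import numpy as np
import pandas as pd


def group_every_n(df, n_elems):
    """Group consecutive rows in chunks of n_elems by position, not by index label."""
    # np.arange gives 0..len(df)-1 regardless of what df.index holds;
    # floor division maps positions 0..n-1 to group 0, n..2n-1 to group 1, etc.
    positions = np.arange(len(df))
    return df.groupby(positions // n_elems)


# Hypothetical data with a non-default (string) index
df = pd.DataFrame(
    {"a": [1, 2, 3, 4, 5], "b": [10, 20, 30, 40, 50]},
    index=list("vwxyz"),
)
result = group_every_n(df, 2).agg(["mean", "count"])
print(result)
```

The last group simply ends up smaller when the row count is not a multiple of n_elems, so no padding or trimming of a key list is needed.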

Sample DataFrame used:

import numpy as np
import pandas as pd

np.random.seed(5)
df = pd.DataFrame(np.random.randint(0, 100, (6, 3)))

df:

    0   1   2
0  99  78  61
1  16  73   8
2  62  27  30
3  80   7  76
4  15  53  80
5  27  44  77
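
As for why it breaks at 20 in particular: a plain Python list passed to groupby is ambiguous, because it can also mean "a list of column labels". My understanding, which you should treat as an assumption about pandas internals and not as documented behavior, is that when the list has as many entries as the frame has rows and every entry is also a column label, pandas reads it as column labels. With 20 rows per group the group ids are 0 to 4, which are exactly the column labels of the 5-column frame in the question, so every column becomes a grouping key and nothing is left to aggregate. With 19 rows per group the ids run up to 5, which is not a column, so the list is taken as one key per row. A sketch that sidesteps the ambiguity by passing a NumPy array instead of a list (the seed is arbitrary):

```python
import numpy as np
import pandas as pd

np.random.seed(0)  # arbitrary seed, only for reproducibility
df = pd.DataFrame(np.random.randint(0, 100, (100, 5)))

n_elems = 20
# Ceiling division: number of groups needed to cover every row.
n_groups = -(-len(df) // n_elems)
# One group id per row: 0 repeated n_elems times, then 1, and so on.
# The slice trims any overshoot when the last group is only partly filled.
groups = np.repeat(np.arange(n_groups), n_elems)[: len(df)]

# The array is used as one key per row; it is not looked up among the column labels.
result = df.groupby(groups).agg(["mean", "count"])
print(result.shape)
```

Wrapping the original list as np.array(groups) or pd.Series(groups, index=df.index) should have the same effect, and df.index // n_elems above avoids the issue for the same reason: it is an array-like, not a list.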

Grouping with a list doesn't seem to work in df.groupby
