Question

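I have a DataFrame in which many subjects each completed multiple trials (1:800), and I want to add a "block" column, with 80 trials per block. I thought rolling_apply might be the solution, but I can't seem to get it to work.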

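I could do something where every value of "trial" between two increments is assigned to a given block, but it seems like rolling_apply should be able to do this. What I would really like to write is something like the following, but it isn't possible: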

grouped = df.groupby('sid')
df['block'] = pd.rolling_cumcount(grouped, 80)

There are 10 blocks and 800 trials...

blocks = range(10)
increments = range(800)[0::80]
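
The increments idea can be sketched with np.digitize, which maps each trial number to the bin it falls into. This is only an illustration under assumptions: the frame below is made up (two subjects, trials numbered 0 to 799), and the column names sid and trial are guesses at the real ones.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 2 subjects, each with trials 0..799
df = pd.DataFrame({
    "sid": np.repeat([1, 2], 800),
    "trial": np.tile(np.arange(800), 2),
})

# Block boundaries: 0, 80, 160, ..., 720
increments = np.arange(0, 800, 80)

# np.digitize returns i such that increments[i-1] <= x < increments[i]
# (or len(increments) for values past the last boundary),
# so subtracting 1 gives a 0-based block number
df["block"] = np.digitize(df["trial"].to_numpy(), increments) - 1
```

This depends only on the trial number, not on row order, so it should also hold if the rows are shuffled.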

I have this:

What I ultimately want is this (except that I have 10 blocks of 80 trials each):

SID  Trial  Block
1    0      0
1    1      0
1    2      0
1    3      1
1    4      1
1    5      1
2    0      0
2    1      0
2    2      0
2    3      1
2    4      1
2    5      1

Thanks

I ended up just doing this, which may not be the best solution, but it works fine:

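import numpy as np

# add a block to each subject
# (assumes rows are sorted by sid, then trial, with exactly 800 trials per subject)
block = np.arange(1, 11)                 # block labels 1..10
block_array = np.repeat(block, 80)       # 80 trials per block -> 800 labels
blocks_all = np.tile(block_array, df['sid'].nunique())  # one copy per subject
df['block'] = blocks_all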

Answer 1

I would use groupby's cumcount, integer-divided by the block size (3 in the small example from the question, 80 for the real data):

In [11]: g = df.groupby('SID')

In [12]: g.cumcount() // 3
Out[12]: 
0     0
1     0
2     0
3     1
4     1
5     1
6     0
7     0
8     0
9     1
10    1
11    1
dtype: int64

Then set it as a column:

In [13]: df['Block'] = df.groupby('SID').cumcount() // 3
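
A minimal sketch of cumcount-based blocking at the real block size of 80: number the rows within each subject, then integer-divide by 80. The frame is made up for illustration (three subjects with 800 rows each), and it assumes the rows of each subject are already in trial order.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 3 subjects, each with 800 rows in trial order
df = pd.DataFrame({
    "SID": np.repeat([1, 2, 3], 800),
    "Trial": np.tile(np.arange(800), 3),
})

block_size = 80  # trials per block

# cumcount numbers the rows within each subject 0, 1, 2, ...;
# integer division turns that running count into a block number 0..9
df["Block"] = df.groupby("SID").cumcount() // block_size

# Sanity check: number of trials in each (SID, Block) pair
sizes = df.groupby(["SID", "Block"]).size()
```

If Trial already runs from 0 to 799 for every subject, df['Trial'] // 80 should give the same column without a groupby. The cumcount version relies on row order within each SID rather than on the Trial values.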
