Pandas vectorized way to create groups of size n?

Time: 2019-06-05 13:32:28

Tags: python pandas numpy vectorization

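Say I have a large tensor of shape (samples, timesteps, features), and I want to flatten it so that I can run a groupby operation in Pandas. How can I label each block of elements n:n + size accordingly in a vectorized fashion? My slow solution: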

import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.normal(0, 1, 500))
df["sample"] = np.nan

n_timesteps = 50
n_samples = len(df) // n_timesteps

size = n_timesteps
for i in range(n_samples):
    id0 = i * n_timesteps
    id1 = id0 + n_timesteps
    # .loc slices include the end label, so stop at id1 - 1
    df.loc[id0:id1 - 1, "sample"] = i

1 Answer:

Answer 0: (score: 2)

Assign the new column by integer division of the index:

# assumes the default RangeIndex (0, 1, 2, ...)
df['sample'] = df.index // n_timesteps
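
To see what the integer division produces, here is a minimal sketch on a tiny frame. The column name "value" and the group size of 2 are made up for illustration and are not from the question:

```python
import pandas as pd

n_timesteps = 2  # small group size chosen only for illustration
df = pd.DataFrame({"value": [10, 11, 12, 13, 14, 15]})  # default RangeIndex 0..5

# Integer division maps positions 0-1 to 0, 2-3 to 1, and 4-5 to 2.
df["sample"] = df.index // n_timesteps

labels = df["sample"].tolist()
print(labels)  # [0, 0, 1, 1, 2, 2]
```

This only gives contiguous blocks when the index is the default 0, 1, 2, ... sequence; with any other index the labels follow the index values rather than the row positions.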

Or with a 1-D NumPy array created by arange, which works regardless of the index:

df['sample'] = np.arange(len(df)) // n_timesteps
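
Putting it together for the tensor case from the question, a rough sketch might look like this. The sizes, the seed, and the names tensor, flat and group_means are assumptions for illustration, and it relies on the tensor being laid out so that reshape keeps each sample's timesteps contiguous:

```python
import numpy as np
import pandas as pd

n_samples, n_timesteps, n_features = 4, 50, 3  # illustrative sizes
rng = np.random.default_rng(0)
tensor = rng.normal(0, 1, (n_samples, n_timesteps, n_features))

# Flatten the first two axes: one row per (sample, timestep) pair.
flat = tensor.reshape(-1, n_features)
df = pd.DataFrame(flat)

# Position-based labels, independent of what the index looks like.
df["sample"] = np.arange(len(df)) // n_timesteps

# Per-sample mean of every feature via groupby.
group_means = df.groupby("sample").mean()

# Cross-check against reducing the original tensor over the timestep axis.
matches = np.allclose(group_means.to_numpy(), tensor.mean(axis=1))
print(matches)
```

If the timestep within each sample is also needed, np.arange(len(df)) % n_timesteps gives it in the same vectorized way.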