Question

How can pandas rolling + apply be applied only to selected rows?

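With pandas, we can calculate the rolling values for all rows first and then select the rows we need:

import pandas as pd

df = pd.DataFrame({'A': range(10)})

# We want the rolling mean values at rows [4, 8]
rows_to_select = [4, 8]

# We can calculate rolling values of all rows first, then do the selection
roll_mean = df.A.rolling(3).mean()
result = roll_mean[rows_to_select]

But this is not an option when dealing with a very large dataset where only a subset of the rolling values is needed. Is it possible to perform some sort of rolling + selection + apply?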

Answer 1

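Using sliding-window views

We can create sliding windows as views into the input series, which gives us a 2D array with one window per row. We then index that array with the selected rows and compute the mean along its second axis. That is the desired output, and it is all done in a vectorized manner.

To get these sliding windows, skimage has a simple built-in function, view_as_windows. We will make use of it.

The implementation would be: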

import numpy as np
import pandas as pd
from skimage.util.shape import view_as_windows

W = 3 # window length

# Get sliding windows
w = view_as_windows(df['A'].to_numpy(copy=False), W)

# Get the selected rows of the sliding windows and take the mean of each.
# Window i ends at row i+W-1, so row r maps to window r-W+1 (requires r >= W-1).
out_ar = w[np.asarray(rows_to_select)-W+1].mean(1)

# Output as a Series, if that format is needed
out_s = pd.Series(out_ar, index=df.index[rows_to_select])

To stay within NumPy, an alternative to view_as_windows is the strided_app helper (not part of NumPy itself; it has to be defined separately):

w = strided_app(df['A'].to_numpy(copy=False),L=W,S=1)
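Since strided_app is not defined anywhere in this thread, here is a minimal sketch of what such a helper could look like, assuming it takes a 1D array, a window length L and a step S, and returns a 2D strided view. The body is an assumption built on np.lib.stride_tricks.as_strided, not necessarily the answerer's exact code. For the S=1 case, NumPy 1.20 and later also ship sliding_window_view, which yields the same windows with bounds checking built in.

```python
import numpy as np
import pandas as pd


def strided_app(a, L, S):
    """Assumed helper: 2D view of 1D array `a`, window length L, step S."""
    nrows = (a.size - L) // S + 1   # number of complete windows
    n = a.strides[0]                # byte step between consecutive elements
    # writeable=False guards against accidental writes through the
    # overlapping view, which would corrupt the underlying data.
    return np.lib.stride_tricks.as_strided(
        a, shape=(nrows, L), strides=(S * n, n), writeable=False
    )


df = pd.DataFrame({'A': range(10)})
rows_to_select = [4, 8]
W = 3

# Build the windows and reduce only the selected ones
w = strided_app(df['A'].to_numpy(copy=False), L=W, S=1)
out_ar = w[np.asarray(rows_to_select) - W + 1].mean(axis=1)

# Built-in equivalent for S=1 (NumPy >= 1.20)
w2 = np.lib.stride_tricks.sliding_window_view(df['A'].to_numpy(copy=False), W)
```

Neither version copies the data; both only reinterpret the existing buffer, so memory use stays flat even for long series.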

Extending to all reduction operations

Any NumPy function that supports reduction along an axis can be used with this method, like so:

def rolling_selected_rows(s, rows, W, func):
    # Get sliding windows
    w = view_as_windows(s.to_numpy(copy=False), W)

    # Get the selected rows of the sliding windows and reduce each with func
    out_ar = func(w[np.asarray(rows)-W+1], axis=1)

    # Output as a Series, if that format is needed
    out_s = pd.Series(out_ar, index=s.index[rows])
    return out_s

Hence, to get the rolling min values at the selected rows for the given sample, it would be:

In [91]: rolling_selected_rows(df['A'], rows_to_select, W=3, func=np.min)
Out[91]: 
4    2
8    6
dtype: int64
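One thing to watch with rolling_selected_rows: for a selected row r smaller than W-1, the index r-W+1 is negative, so NumPy silently wraps around to a window at the end of the array, whereas Series.rolling returns NaN there. The sketch below is a hypothetical variant (the name rolling_selected_rows_checked is made up for illustration) that fills such rows with NaN. It uses NumPy's sliding_window_view in place of view_as_windows so that it runs without scikit-image. It also shows that np.std needs ddof=1 to match the pandas rolling std.

```python
from functools import partial

import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view  # NumPy >= 1.20


def rolling_selected_rows_checked(s, rows, W, func):
    """Hypothetical variant of rolling_selected_rows with a bounds check."""
    rows = np.asarray(rows)
    w = sliding_window_view(s.to_numpy(copy=False), W)

    # Window i ends at row i + W - 1, so row r uses window r - W + 1.
    start = rows - W + 1
    valid = start >= 0  # rows that have a complete window behind them

    # Rows without a complete window get NaN, as Series.rolling does.
    out = np.full(len(rows), np.nan)
    out[valid] = func(w[start[valid]], axis=1)
    return pd.Series(out, index=s.index[rows])


df = pd.DataFrame({'A': range(10)})

# Row 1 has no complete window of length 3, so its result is NaN
res_min = rolling_selected_rows_checked(df['A'], [1, 4, 8], W=3, func=np.min)

# np.std defaults to ddof=0, while the pandas rolling std uses ddof=1
res_std = rolling_selected_rows_checked(
    df['A'], [4, 8], W=3, func=partial(np.std, ddof=1)
)
```

The output is always float here because of the NaN fill, which is also what Series.rolling produces.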

Answer 2

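I think you could use a for loop. As you mentioned, when the DataFrame is large and only a few values are needed, there is no benefit in running the operation over the whole DataFrame, especially once you take the memory cost of rolling into account.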

n = 3
# .loc slicing is label-based and inclusive, so x-n+1:x covers n rows
l = [float(df.loc[x-n+1:x, 'A'].mean()) for x in rows_to_select]
l
[3.0, 7.0]
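The loop above slices by label, so it relies on the DataFrame having the default 0..N-1 integer index. As a sketch under the assumption that rows_to_select holds row positions rather than labels, a positional version with iloc works for any index. The string index below is invented purely to illustrate that case.

```python
import pandas as pd

# Same data as the question, but with a non-integer index (illustrative only)
df = pd.DataFrame({'A': range(10)}, index=list('abcdefghij'))
rows_to_select = [4, 8]   # treated as row positions here
n = 3

# iloc slicing is positional and end-exclusive, hence the x + 1
means = [float(df['A'].iloc[x - n + 1 : x + 1].mean()) for x in rows_to_select]
```

As with the label-based loop, positions smaller than n-1 have no complete window and would need separate handling.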
