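How to access multiple columns in a rolling operator?

Time: 2017-04-26 14:21:15

Tags: python pandas numpy vectorization

I want to do a rolling-window calculation in pandas that needs to work on two columns at the same time. Here is a simple example that shows the problem: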

import pandas as pd

df = pd.DataFrame({
    'x': [1, 2, 3, 2, 1, 5, 4, 6, 7, 9],
    'y': [4, 3, 4, 6, 5, 9, 1, 3, 1, 2]
})

windowSize = 4
result = []

for i in range(1, len(df)+1):
    if i < windowSize:
        result.append(None)
    else:
        x = df.x.iloc[i-windowSize:i]
        y = df.y.iloc[i-windowSize:i]
        m = y.mean()
        r = sum(x[y > m]) / sum(x[y <= m])
        result.append(r)

print(result)

Is there no way to do this within pandas itself? Any help is appreciated.

2 answers:

Answer 0: (score: 2)

You can use the rolling window trick for numpy arrays and apply it to the array underlying the DataFrame.

import pandas as pd
import numpy as np

def rolling_window(a, window):
    shape = a.shape[:-1] + (a.shape[-1] - window + 1, window)
    strides = a.strides + (a.strides[-1],)
    return np.lib.stride_tricks.as_strided(a, shape=shape, strides=strides)

df = pd.DataFrame({
    'x': [1, 2, 3, 2, 1, 5, 4, 6, 7, 9],
    'y': [4, 3, 4, 6, 5, 9, 1, 3, 1, 2]
})

windowSize = 4    

rw = rolling_window(df.values.T, windowSize)
m = np.mean(rw[1], axis=-1, keepdims=True)
a = np.sum(rw[0] * (rw[1] > m), axis=-1)
b = np.sum(rw[0] * (rw[1] <= m), axis=-1)
result = a / b

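The result lacks the leading None values, but they should be easy to add (either as np.nan, or after converting the result to a list).

This may not be what you were looking for, since it does not use pandas itself, but it gets the job done without a loop.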
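The padding step mentioned above can be sketched as follows. This is only an illustration under a few assumptions: the two columns are rebuilt as plain NumPy arrays instead of being read from the DataFrame, and the name `padded` is introduced here and is not part of the original answer.

```python
import numpy as np

def rolling_window(a, window):
    # Same stride trick as in the answer: a read-only style view with one
    # extra trailing axis that holds each window.
    shape = a.shape[:-1] + (a.shape[-1] - window + 1, window)
    strides = a.strides + (a.strides[-1],)
    return np.lib.stride_tricks.as_strided(a, shape=shape, strides=strides)

x = np.array([1, 2, 3, 2, 1, 5, 4, 6, 7, 9])
y = np.array([4, 3, 4, 6, 5, 9, 1, 3, 1, 2])
windowSize = 4

# Row 0 holds the windows over x, row 1 the windows over y.
rw = rolling_window(np.vstack([x, y]), windowSize)
m = np.mean(rw[1], axis=-1, keepdims=True)
a = np.sum(rw[0] * (rw[1] > m), axis=-1)
b = np.sum(rw[0] * (rw[1] <= m), axis=-1)
result = a / b

# Prepend windowSize - 1 NaNs so the output has one entry per input row
# ("padded" is a hypothetical name used only in this sketch).
padded = np.concatenate([np.full(windowSize - 1, np.nan), result])
```

Because `padded` then has the same length as the input, it could presumably be assigned back as a new column of the DataFrame if that is what you need.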

Answer 1: (score: 1)

Here is a vectorized approach using NumPy tools -

windowSize = 4
a = df.values
X = strided_app(a[:,0],windowSize,1)
Y = strided_app(a[:,1],windowSize,1)
M = Y.mean(1)
mask = Y>M[:,None]
sums = np.einsum('ij,ij->i',X,mask)
rest_sums = X.sum(1) - sums
out = sums/rest_sums

strided_app is taken from here.

Runtime test -

Approaches -

# @kazemakase's solution
def rolling_window_sum(df, windowSize=4):
    rw = rolling_window(df.values.T, windowSize)
    m = np.mean(rw[1], axis=-1, keepdims=True)
    a = np.sum(rw[0] * (rw[1] > m), axis=-1)
    b = np.sum(rw[0] * (rw[1] <= m), axis=-1)
    result = a / b
    return result

# Proposed in this post
def strided_einsum(df, windowSize=4):
    a = df.values
    X = strided_app(a[:,0],windowSize,1)
    Y = strided_app(a[:,1],windowSize,1)
    M = Y.mean(1)
    mask = Y>M[:,None]
    sums = np.einsum('ij,ij->i',X,mask)
    rest_sums = X.sum(1) - sums
    out = sums/rest_sums
    return out

Timings -

In [46]: df = pd.DataFrame(np.random.randint(0,9,(1000000,2)))

In [47]: %timeit rolling_window_sum(df)
10 loops, best of 3: 90.4 ms per loop

In [48]: %timeit strided_einsum(df)
10 loops, best of 3: 62.2 ms per loop

To squeeze out more performance, we can speed up the Y.mean(1) part, which is basically a windowed average and can be computed with Scipy's 1D uniform filter. Thus, for windowSize=4, M can alternatively be computed as -

# uniform_filter1d is importable from scipy.ndimage directly; the older
# scipy.ndimage.filters namespace is deprecated in recent SciPy releases.
from scipy.ndimage import uniform_filter1d as unif1d

M = unif1d(a[:,1].astype(float),windowSize)[2:-1]

The performance gain from this change is significant.
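The snippets in this answer call strided_app, whose definition is linked rather than reproduced here. Below is a minimal sketch of what such a helper might look like, assuming the signature strided_app(a, L, S) with L as the window length and S as the step between windows. The body is a guess built on np.lib.stride_tricks.as_strided, not necessarily the linked implementation, and the sample data is the question's frame rebuilt as a plain array.

```python
import numpy as np

def strided_app(a, L, S):
    # Assumed helper: 2D sliding-window view over the 1D array `a`,
    # with window length L and step S between consecutive windows.
    nrows = (a.size - L) // S + 1
    n = a.strides[0]
    return np.lib.stride_tricks.as_strided(a, shape=(nrows, L), strides=(S * n, n))

# Same data as the question's DataFrame, stored as an (n, 2) array
# so that a[:,0] and a[:,1] play the role of df.values columns.
a = np.column_stack([
    [1, 2, 3, 2, 1, 5, 4, 6, 7, 9],
    [4, 3, 4, 6, 5, 9, 1, 3, 1, 2],
])
windowSize = 4

X = strided_app(a[:, 0], windowSize, 1)
Y = strided_app(a[:, 1], windowSize, 1)
M = Y.mean(1)
mask = Y > M[:, None]
# Per-window sum of x where y is above the window mean.
sums = np.einsum('ij,ij->i', X, mask)
# The remaining x values are those where y <= mean.
rest_sums = X.sum(1) - sums
out = sums / rest_sums
```

The views returned by as_strided share memory with the input, so they are best treated as read-only; writing into them can silently corrupt the original array.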