I want to do a rolling-window calculation in pandas that needs two columns at once. Here is a simple example that states the problem clearly:
import pandas as pd
df = pd.DataFrame({
    'x': [1, 2, 3, 2, 1, 5, 4, 6, 7, 9],
    'y': [4, 3, 4, 6, 5, 9, 1, 3, 1, 2]
})
windowSize = 4
result = []
for i in range(1, len(df)+1):
    if i < windowSize:
        result.append(None)
    else:
        x = df.x.iloc[i-windowSize:i]
        y = df.y.iloc[i-windowSize:i]
        m = y.mean()
        r = sum(x[y > m]) / sum(x[y <= m])
        result.append(r)
print(result)
Is there a way to do this in pandas without the explicit loop? Any help is appreciated.
Answer 0 (score: 2)
You can use the rolling window trick for numpy arrays and apply it to the array underlying the DataFrame.
import pandas as pd
import numpy as np

def rolling_window(a, window):
    # View of `a` with a new last axis holding every length-`window` window
    # along the original last axis; no data is copied.
    shape = a.shape[:-1] + (a.shape[-1] - window + 1, window)
    strides = a.strides + (a.strides[-1],)
    return np.lib.stride_tricks.as_strided(a, shape=shape, strides=strides)

df = pd.DataFrame({
    'x': [1, 2, 3, 2, 1, 5, 4, 6, 7, 9],
    'y': [4, 3, 4, 6, 5, 9, 1, 3, 1, 2]
})
windowSize = 4
rw = rolling_window(df.values.T, windowSize)  # shape (2, n - windowSize + 1, windowSize)
m = np.mean(rw[1], axis=-1, keepdims=True)    # mean of y in each window
a = np.sum(rw[0] * (rw[1] > m), axis=-1)      # sum of x where y > window mean
b = np.sum(rw[0] * (rw[1] <= m), axis=-1)     # sum of x where y <= window mean
result = a / b
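NumPy 1.20 added `numpy.lib.stride_tricks.sliding_window_view`. It builds the same windows as the `as_strided` trick, but it works out the shape and strides itself. It also returns a read-only view, so a stride mistake cannot silently read memory outside the array.

Below is a minimal sketch of the same calculation under the assumption that NumPy >= 1.20 is available. It swaps the boolean multiplication for `np.where`, which is purely a style choice:

```python
import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view  # requires NumPy >= 1.20

df = pd.DataFrame({
    'x': [1, 2, 3, 2, 1, 5, 4, 6, 7, 9],
    'y': [4, 3, 4, 6, 5, 9, 1, 3, 1, 2]
})
windowSize = 4

# Each row is one window: shape (n - windowSize + 1, windowSize)
X = sliding_window_view(df['x'].to_numpy(), windowSize)
Y = sliding_window_view(df['y'].to_numpy(), windowSize)

above = Y > Y.mean(axis=1, keepdims=True)  # y above its window mean
a = np.where(above, X, 0).sum(axis=1)      # sum of x where y > mean
b = np.where(above, 0, X).sum(axis=1)      # sum of x where y <= mean
result = a / b
print(result)
```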
The result is missing the leading None values, but they are easy to prepend (as np.nan, or after converting the result to a list).
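Prepending the missing values takes one line. The sketch below reuses the answer's `rolling_window` and data and shows two options:

* `np.full` supplies `windowSize - 1` NaNs, so the output lines up row for row with `df`. Here it is stored in a new column; the name `ratio` is only illustrative.
* A list with `None` placeholders reproduces the question's output format exactly.

```python
import numpy as np
import pandas as pd

def rolling_window(a, window):
    shape = a.shape[:-1] + (a.shape[-1] - window + 1, window)
    strides = a.strides + (a.strides[-1],)
    return np.lib.stride_tricks.as_strided(a, shape=shape, strides=strides)

df = pd.DataFrame({
    'x': [1, 2, 3, 2, 1, 5, 4, 6, 7, 9],
    'y': [4, 3, 4, 6, 5, 9, 1, 3, 1, 2]
})
windowSize = 4
rw = rolling_window(df.values.T, windowSize)
m = np.mean(rw[1], axis=-1, keepdims=True)
result = np.sum(rw[0] * (rw[1] > m), axis=-1) / np.sum(rw[0] * (rw[1] <= m), axis=-1)

# Option 1: pad with NaN so the result fits as a DataFrame column.
padded = np.concatenate([np.full(windowSize - 1, np.nan), result])
df['ratio'] = padded  # illustrative column name

# Option 2: pad with None to match the list printed by the question's loop.
as_list = [None] * (windowSize - 1) + result.tolist()
```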
This may not be the pandas solution you were looking for, but it gets the job done without a loop.
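If staying inside pandas matters more than speed, there is one workaround that neither answer uses. Roll over one column, then use the index of each window to look up the matching rows of the other column. With `raw=False`, `Rolling.apply` passes each window as a Series that keeps its index labels.

The sketch below assumes `df` has a unique index. Because it calls a Python function once per window, expect it to be much slower than the NumPy versions:

```python
import pandas as pd

df = pd.DataFrame({
    'x': [1, 2, 3, 2, 1, 5, 4, 6, 7, 9],
    'y': [4, 3, 4, 6, 5, 9, 1, 3, 1, 2]
})
windowSize = 4

def ratio(x_win):
    # x_win holds one window of x; its index selects the matching y values.
    y_win = df.loc[x_win.index, 'y']
    above = y_win > y_win.mean()
    return x_win[above].sum() / x_win[~above].sum()

# Rows without a full window (the first windowSize - 1) come back as NaN.
result = df['x'].rolling(windowSize).apply(ratio, raw=False)
print(result)
```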
Answer 1 (score: 1)
Here is a vectorized approach using NumPy tools -
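import numpy as np

def strided_app(a, L, S):
    # strided_app (credited below): (nrows, L) strided view of 1-D array a,
    # windows of length L taken with step S. Reconstructed here, not copied.
    nrows = ((a.size - L) // S) + 1
    n = a.strides[0]
    return np.lib.stride_tricks.as_strided(a, shape=(nrows, L), strides=(S * n, n))

windowSize = 4
a = df.values
X = strided_app(a[:,0],windowSize,1)  # windows of x
Y = strided_app(a[:,1],windowSize,1)  # windows of y
M = Y.mean(1)                         # mean of each y window
mask = Y>M[:,None]                    # y above its window mean
sums = np.einsum('ij,ij->i',X,mask)   # row-wise sum of x where mask is True
rest_sums = X.sum(1) - sums           # sum of x where y <= window mean
out = sums/rest_sums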
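The `einsum` subscripts `'ij,ij->i'` multiply `X` and `mask` element-wise and then sum along each row. In effect this is a row-wise dot product with the boolean mask, which gives the sum of x over the positions where y is above its window mean. Subtracting that from `X.sum(1)` leaves the sum over the remaining positions, so the mask only has to be built once.

The check below verifies the identity on the first two windows of the question's data:

```python
import numpy as np

X = np.array([[1, 2, 3, 2],
              [2, 3, 2, 1]])
Y = np.array([[4, 3, 4, 6],
              [3, 4, 6, 5]])
mask = Y > Y.mean(1)[:, None]

sums = np.einsum('ij,ij->i', X, mask)  # row-wise sum of X where mask is True
rest_sums = X.sum(1) - sums            # row-wise sum of X where mask is False

# Same results with explicit masking:
assert np.array_equal(sums, (X * mask).sum(1))
assert np.array_equal(rest_sums, (X * ~mask).sum(1))
print(sums, rest_sums)
```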
strided_app is taken from here.
Runtime test -
Approaches -
# @kazemakase's solution
def rolling_window_sum(df, windowSize=4):
    rw = rolling_window(df.values.T, windowSize)
    m = np.mean(rw[1], axis=-1, keepdims=True)
    a = np.sum(rw[0] * (rw[1] > m), axis=-1)
    b = np.sum(rw[0] * (rw[1] <= m), axis=-1)
    result = a / b
    return result

# Proposed in this post
def strided_einsum(df, windowSize=4):
    a = df.values
    X = strided_app(a[:,0],windowSize,1)
    Y = strided_app(a[:,1],windowSize,1)
    M = Y.mean(1)
    mask = Y>M[:,None]
    sums = np.einsum('ij,ij->i',X,mask)
    rest_sums = X.sum(1) - sums
    out = sums/rest_sums
    return out
Timings -
In [46]: df = pd.DataFrame(np.random.randint(0,9,(1000000,2)))
In [47]: %timeit rolling_window_sum(df)
10 loops, best of 3: 90.4 ms per loop
In [48]: %timeit strided_einsum(df)
10 loops, best of 3: 62.2 ms per loop
To boost performance further, we can compute the Y.mean(1) part, which is basically a windowed average, with SciPy's 1D uniform filter. Thus, for windowSize=4, M could alternatively be computed as -
from scipy.ndimage import uniform_filter1d as unif1d
M = unif1d(a[:,1].astype(float),windowSize)[2:-1]
The performance boost is significant.
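The `[2:-1]` slice only works for `windowSize=4`. With the default `origin=0`, `uniform_filter1d` places each length-w window so that it starts `w // 2` positions before the output index. As a result:

* the first full window lands at index `w // 2`;
* the last `w - 1 - w // 2` outputs cover incomplete, edge-padded windows.

The sketch below wraps that offset in a helper that works for any window size. The name `window_means` is hypothetical, and the helper is checked against a plain sliding-window mean:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view
from scipy.ndimage import uniform_filter1d

def window_means(y, window):
    """Mean of every full length-`window` window of the 1-D array y."""
    # Cast to float: uniform_filter1d returns the input dtype by default,
    # so integer input would truncate the means.
    filtered = uniform_filter1d(np.asarray(y, dtype=float), window)
    left = window // 2           # index of the first full window
    right = window - 1 - left    # trailing outputs with incomplete windows
    return filtered[left:len(filtered) - right]

y = np.array([4, 3, 4, 6, 5, 9, 1, 3, 1, 2])
for w in range(1, 7):
    expected = sliding_window_view(y, w).mean(axis=1)
    assert np.allclose(window_means(y, w), expected)
```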