Question

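I am currently trying to use CUDA to evaluate a difference equation representing an exponential moving average filter. The filter is described by the following difference equation: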

y[n] = y[n-1] * beta + alpha * x[n]

where alpha and beta are constants defined as

alpha = (2.0 / (1 + Period))
beta = 1 - alpha

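How can I manipulate the above difference equation to obtain the system response of this filter? What would be an efficient way to implement this filter on a GPU?

I am working on a GTX 570.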

Answer 1

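I would suggest manipulating the above difference equation as shown below and then using CUDA Thrust primitives.

DIFFERENCE EQUATION MANIPULATION - EXPLICIT FORM OF THE DIFFERENCE EQUATION

By simple algebra, you can find the following: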

y[1] = beta * y[0] + alpha * x[1]

y[2] = beta^2 * y[0] + alpha * beta * x[1] + alpha * x[2]

y[3] = beta^3 * y[0] + alpha * beta^2 * x[1] + alpha * beta * x[2] + alpha * x[3]

Accordingly, the explicit form is the following:

y[n] / beta^n = y[0] + alpha * x[1] / beta + alpha * x[2] / beta^2 + ...
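For reference, the pattern in the expansions above can be written in summation form. This is a sketch that assumes beta is nonzero and that the recursion starts from a given y[0]. The impulse response and transfer function lines assume a zero initial state instead, which is the usual convention when speaking of the system response.

```latex
% Unrolling y[n] = \beta\, y[n-1] + \alpha\, x[n] for n \ge 1
y[n] = \beta^{n}\, y[0] + \alpha \sum_{k=1}^{n} \beta^{\,n-k}\, x[k]

% Dividing both sides by \beta^{n} (requires \beta \neq 0)
\frac{y[n]}{\beta^{n}} = y[0] + \alpha \sum_{k=1}^{n} \frac{x[k]}{\beta^{k}}

% Impulse response and transfer function, assuming zero initial state
h[n] = \alpha\, \beta^{n}, \quad n \ge 0
\qquad
H(z) = \frac{\alpha}{1 - \beta z^{-1}}, \quad |z| > |\beta|
```

The right-hand side of the second line is a running sum over k, which is why a prefix sum (scan) can evaluate every y[n] at once.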

CUDA THRUST IMPLEMENTATION

You can implement the above explicit form with the following steps:

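1. Initialize an input sequence d_input to alpha, except for d_input[0] = 1. (this corresponds to a unit input x[n] = 1 and y[0] = 1; in general, d_input[n] = alpha * x[n] for n >= 1 and d_input[0] = y[0]);
2. Define a vector d_1_over_beta_to_the_n equal to 1, 1/beta, 1/beta^2, 1/beta^3, ...;
3. Multiply d_input elementwise by d_1_over_beta_to_the_n;
4. Perform an inclusive_scan to obtain the sequence y[n] / beta^n;
5. Divide the above sequence by 1, 1/beta, 1/beta^2, 1/beta^3, ....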
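The five steps can be sketched on the host in plain C++ before moving them to Thrust. This is only an illustration: the function names `ema_by_scan` and `ema_by_recursion` are hypothetical, `std::partial_sum` stands in for `thrust::inclusive_scan`, and the code assumes beta is nonzero and that `x[0]` is unused because y[0] is supplied directly.

```cpp
#include <cstddef>
#include <numeric>
#include <vector>

// Scan-based evaluation of y[n] = beta * y[n-1] + alpha * x[n] for n >= 1,
// with y[0] = y0 given. x[0] is ignored. Assumes beta != 0.
std::vector<double> ema_by_scan(const std::vector<double>& x, double y0,
                                double alpha, double beta) {
    const std::size_t N = x.size();
    std::vector<double> w(N), y(N);
    if (N == 0) return y;

    // Step 2: w[n] = 1 / beta^n
    w[0] = 1.0;
    for (std::size_t n = 1; n < N; ++n) w[n] = w[n - 1] / beta;

    // Steps 1 and 3: scaled terms, y0 first, then alpha * x[n] / beta^n
    y[0] = y0;
    for (std::size_t n = 1; n < N; ++n) y[n] = alpha * x[n] * w[n];

    // Step 4: inclusive scan (prefix sum) gives y[n] / beta^n
    std::partial_sum(y.begin(), y.end(), y.begin());

    // Step 5: divide by 1 / beta^n to recover y[n]
    for (std::size_t n = 0; n < N; ++n) y[n] /= w[n];
    return y;
}

// Sequential reference: the recursion evaluated directly.
std::vector<double> ema_by_recursion(const std::vector<double>& x, double y0,
                                     double alpha, double beta) {
    std::vector<double> y(x.size());
    if (y.empty()) return y;
    y[0] = y0;
    for (std::size_t n = 1; n < y.size(); ++n)
        y[n] = beta * y[n - 1] + alpha * x[n];
    return y;
}
```

One caveat on the design: for a typical moving average with 0 < beta < 1, the factors 1/beta^n grow geometrically, so on long sequences the scaled terms can lose precision and eventually overflow. Processing the signal in blocks, or comparing against the sequential reference on a sample of the data, is a reasonable precaution.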

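EDIT

The above approach can be recommended for Linear Time-Varying (LTV) systems. For Linear Time-Invariant (LTI) systems, the FFT approach mentioned by Paul can be recommended. I provide an example of that approach, using CUDA Thrust and cuFFT, in my answer to "FIR filter in CUDA".

FULL CODE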

    #include <cstdio>

    #include <thrust/device_vector.h>
    #include <thrust/functional.h>
    #include <thrust/host_vector.h>
    #include <thrust/scan.h>
    #include <thrust/transform.h>

    int main(void)
    {
        int N = 20;

        // --- Filter parameters
        double alpha = 2.7;
        double beta  = -0.3;

        // --- Defining and initializing the input vector on the device:
        //     d_input[n] = alpha * x[n] with x[n] = 1, and d_input[0] = y[0] = 1
        thrust::device_vector<double> d_input(N, alpha * 1.);
        d_input[0] = d_input[0] / alpha;

        // --- Defining the output vector on the device
        thrust::device_vector<double> d_output(d_input);

        // --- Defining the {1/beta^n} sequence. The multiplicative scan yields
        //     1/beta, 1/beta^2, ...; the extra 1/beta factor cancels in the final division.
        thrust::device_vector<double> d_1_over_beta(N, 1. / beta);
        thrust::device_vector<double> d_1_over_beta_to_the_n(N, 1. / beta);
        thrust::inclusive_scan(d_1_over_beta.begin(), d_1_over_beta.end(),
                               d_1_over_beta_to_the_n.begin(), thrust::multiplies<double>());

        // --- Scaling the input terms by 1/beta^n
        thrust::transform(d_1_over_beta_to_the_n.begin(), d_1_over_beta_to_the_n.end(),
                          d_input.begin(), d_input.begin(), thrust::multiplies<double>());

        // --- Prefix sum of the scaled terms
        thrust::inclusive_scan(d_input.begin(), d_input.end(), d_output.begin(),
                               thrust::plus<double>());

        // --- Undoing the scaling to recover y[n]
        thrust::transform(d_output.begin(), d_output.end(), d_1_over_beta_to_the_n.begin(),
                          d_output.begin(), thrust::divides<double>());

        for (int i = 0; i < N; i++) {
            double val = d_output[i];
            printf("Device vector element number %i equal to %f\n", i, val);
        }

        // --- Defining and initializing the input vector on the host
        thrust::host_vector<double> h_input(N, 1.);

        // --- Defining the output vector on the host
        thrust::host_vector<double> h_output(h_input);

        // --- Sequential reference: direct evaluation of the recursion
        h_output[0] = h_input[0];
        for (int i = 1; i < N; i++) {
            h_output[i] = h_input[i] * alpha + beta * h_output[i - 1];
        }

        for (int i = 0; i < N; i++) {
            double val = h_output[i];
            printf("Host vector element number %i equal to %f\n", i, val);
        }

        for (int i = 0; i < N; i++) {
            double val = h_output[i] - d_output[i];
            printf("Difference between host and device vector element number %i equal to %f\n", i, val);
        }

        getchar();
    }

Answer 2

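As another approach, you can truncate the exponential moving average window and then compute the filtered signal as a convolution between your signal and the windowed exponential. The convolution can be computed with the free CUDA FFT library (cuFFT) because, as you may know, a convolution can be expressed as the point-wise multiplication of the two signals in the Fourier domain (this is the aptly named Convolution Theorem, and it runs with a complexity of O(n log(n))). This approach minimizes the amount of CUDA kernel code you have to write and runs very quickly, even on a GeForce 570, particularly if you can do all your computations in single (float) precision.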
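To make the truncated-window idea concrete, here is a minimal host-side sketch in plain C++. It assumes a zero initial state, so the window is the impulse response alpha * beta^k cut off after K taps. The names `ema_kernel`, `convolve_truncated` and the truncation length K are hypothetical, and the direct O(N*K) loop is only a stand-in for the FFT-based product that the answer proposes to compute with cuFFT.

```cpp
#include <cstddef>
#include <vector>

// Truncated impulse response of the filter: h[k] = alpha * beta^k, k = 0..K-1.
std::vector<double> ema_kernel(double alpha, double beta, std::size_t K) {
    std::vector<double> h(K);
    double p = alpha;
    for (std::size_t k = 0; k < K; ++k) {
        h[k] = p;
        p *= beta;
    }
    return h;
}

// Direct causal convolution y[n] = sum_k h[k] * x[n-k], zero initial state.
std::vector<double> convolve_truncated(const std::vector<double>& x,
                                       const std::vector<double>& h) {
    std::vector<double> y(x.size(), 0.0);
    for (std::size_t n = 0; n < x.size(); ++n) {
        for (std::size_t k = 0; k < h.size() && k <= n; ++k) {
            y[n] += h[k] * x[n - k];
        }
    }
    return y;
}
```

The discarded tail of the window has total weight beta^K when 0 < beta < 1, so K can be chosen from the error you are willing to accept. With K at least as large as the signal length, the result matches the recursion started from a zero state.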

Implementing an exponential moving average filter described by a difference equation in CUDA
