Question

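I have a data frame with columns A and B, as shown below. I want to compute the mean of the values in column B over a sliding window. The window size is not constant; it is defined by column A: each window starts at a row and contains every row whose A value is at most 200 greater. The example below illustrates the windows: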

A:         10   150   200   220   300   350   400   410   500
B:          0     0     0     1     0     1     1     1     0     mean
           [0     0     0]                                        0
                 [0     0     1     0     1]                      0.4
                       [0     1     0     1     1]                0.6
                             [1     0     1     1     1]          0.8
                                   [0     1     1     1     0]    0.6
                                         [1     1     1     0]    0.75
                                               [1     1     0]    0.67
                                                     [1     0]    0.5
                                                           [0]    0

Output:     0   0.4   0.6   0.8   0.8   0.8   0.8   0.8  0.75

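Now, for each row/coordinate in column A, consider every window that contains that coordinate and keep the highest of their means. This gives the results shown in the Output row.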

I want the output shown above, that is:

  A   B   Output
 10   0   0
150   0   0.4
200   0   0.6
220   1   0.8
300   0   0.8
350   1   0.8
400   1   0.8
410   1   0.8
500   0   0.75
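The rule above can be written compactly. The notation here is introduced for illustration and is not from the question: W_i is the set of rows in the window that starts at row i, m_i is that window's mean, and Output_j is the largest m_i over all windows containing row j. The second line works through the last row (A = 500), which lies in the windows starting at A = 300, 350, 400, 410 and 500.

```latex
W_i = \{\, k : A_i \le A_k \le A_i + 200 \,\}, \qquad
m_i = \frac{1}{|W_i|} \sum_{k \in W_i} B_k, \qquad
\mathrm{Output}_j = \max_{i \,:\, j \in W_i} m_i

\mathrm{Output}_9 = \max(m_5, m_6, m_7, m_8, m_9)
                  = \max\!\left(0.6,\ 0.75,\ \tfrac{2}{3},\ 0.5,\ 0\right) = 0.75
```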

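There are similar questions, Sliding window in R and another one, where

library(zoo)
rollapply(B, 2*k-1, function(x) max(rollmean(x, k)), partial = TRUE)

gives a solution, with k being the window size. The difference here is that the window size is not constant.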

Can anyone provide a solution in R?

Answer 1

Reproducible data:

data <- data.frame(
  A = c(10, 150, 200, 220, 300, 350, 400, 410, 500) , 
  B = c(0, 0, 0, 1, 0, 1, 1, 1, 0)  
)

window_size <- 200

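Just loop over the values of A with vapply or sapply, and compute the mean of the matching subset of B (the rows whose A lies between x and x + window_size).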

data$Output <- with(
  data,
  vapply(
    A, 
    function(x) 
    {
      # rows whose A lies in the window [x, x + window_size]
      index <- x <= A & A <= x + window_size
      mean(B[index])
    },
    numeric(1)
  )
)
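The vapply call above gives each row the mean of the window that starts at that row (0, 0.4, 0.6, 0.8, 0.6, 0.75, 0.67 rounded, 0.5, 0), which is the "mean" column of the question's diagram rather than its Output row. A possible second step, sketched here and not part of the original answer, takes for each row the maximum over every window that contains it. The names window_mean and contains are introduced only for illustration.

```r
data <- data.frame(
  A = c(10, 150, 200, 220, 300, 350, 400, 410, 500),
  B = c(0, 0, 0, 1, 0, 1, 1, 1, 0)
)
window_size <- 200

# Step 1 (as in the answer): mean of the window that starts at each A value
window_mean <- with(data, vapply(
  A,
  function(x) mean(B[x <= A & A <= x + window_size]),
  numeric(1)
))

# Step 2: row y lies in the window starting at s when s <= y <= s + window_size.
# Every row lies in its own window, so 'contains' is never all FALSE.
data$Output <- with(data, vapply(
  A,
  function(y) {
    contains <- A <= y & y <= A + window_size
    max(window_mean[contains])
  },
  numeric(1)
))

print(data)
```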

Answer 2

Try this:

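a <- c(10, 150, 200, 250, 300, 350, 400)
b <- c(0, 0, 0, 1, 1, 1, 0)

# 'means' rather than 'mean', to avoid reusing the name of base::mean
means <- numeric(length(a))
window <- 200
for (i in seq_along(a)) {
    # rows whose a lies in the window [a[i], a[i] + window]
    vals <- which(a >= a[i] & a <= a[i] + window)
    means[i] <- sum(b[vals]) / length(vals)
}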
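The loop above rescans the whole of a for every row. A possible alternative, not part of the answer, computes the same window means without a loop by combining findInterval() with cumulative sums. It assumes a is sorted in increasing order with no duplicate values; last and cs are names introduced here for illustration.

```r
a <- c(10, 150, 200, 250, 300, 350, 400)
b <- c(0, 0, 0, 1, 1, 1, 0)
window <- 200

# Assumption: a is sorted ascending with distinct values.
# Index of the last row in each window [a[i], a[i] + window]
last <- findInterval(a + window, a)

# Prefix sums with a leading 0, so that sum(b[i:j]) == cs[j + 1] - cs[i]
cs <- c(0, cumsum(b))
first <- seq_along(a)

# Window sum divided by window length (number of rows in the window)
means <- (cs[last + 1] - cs[first]) / (last - first + 1)
print(means)
```

The result should match the loop's output, 0, 0.6, 0.6, 0.75, 2/3, 0.5 and 0, for this data.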

Answer 3

This seems to work:

#data
DF <- data.frame(A = c(10, 150, 200, 220, 300, 350, 400, 410, 500),
                 B = c(0, 0, 0, 1, 0, 1, 1, 1, 0))

#index of the last row in each window (largest row whose A is <= A + 200)
rolls <- findInterval(DF$A + 200, DF$A)

#find the mean for every interval
fun <- function(from, to) { mean(DF$B[from:to]) } 
means <- mapply(fun, 1:nrow(DF), rolls)

#TRUE if row x lies in the window running from row 'from' to row 'to'
fun2 <- function(x, from, to) { x %in% from:to } 

output <- rep(NA, nrow(DF))
for(i in 1:nrow(DF))
 {
  output[i] <- max(means[mapply(fun2, i, 1:nrow(DF), rolls)])
 }

DF$output <- output

> DF
    A B output
1  10 0   0.00
2 150 0   0.40
3 200 0   0.60
4 220 1   0.80
5 300 0   0.80
6 350 1   0.80
7 400 1   0.80
8 410 1   0.80
9 500 0   0.75
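The final loop above calls mapply() once per row. A vectorized variant, sketched here and not part of the answer, builds the whole window-membership matrix in one outer() call and then takes a column-wise maximum. It reuses the answer's rolls and means; the name inside is hypothetical.

```r
DF <- data.frame(A = c(10, 150, 200, 220, 300, 350, 400, 410, 500),
                 B = c(0, 0, 0, 1, 0, 1, 1, 1, 0))
n <- nrow(DF)

# As in the answer: window i covers rows i..rolls[i]
rolls <- findInterval(DF$A + 200, DF$A)
means <- mapply(function(from, to) mean(DF$B[from:to]), seq_len(n), rolls)

# inside[i, j] is TRUE when row j lies in window i, i.e. i <= j <= rolls[i]
inside <- outer(seq_len(n), seq_len(n), function(i, j) i <= j & j <= rolls[i])

# For each row (column j), keep the largest mean among the windows containing it
output <- apply(inside, 2, function(in_window) max(means[in_window]))
print(output)
```

The matrix has n × n entries, so this trades memory for removing the per-row loop; it suits moderately sized data.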

Sliding window in R with varying window sizes
