How to apply a function to each row of a data.table column, using other rows as input?

Date: 2019-01-18 03:10:50

Tags: r list data.table string-length rollapply

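For each row of the "Response" column, I want to check whether the 5 rows below it all have a "Response" value (i.e., no NA). If they do, I want to compute the mean and standard deviation of those 5 rows. If any of the 5 rows below is missing a "Response" value (i.e., is NA), the result should be NA, because I want the mean and standard deviation always to be computed over n = 5 values.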
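To pin down the intended rule before reaching for rollapply, it can be written as a plain loop. This is a minimal base-R sketch, not from the original post: the helper name `next5_stats` is hypothetical, the 16 values are the extended sample used as data in the answer, and returning NA when fewer than 5 rows remain below a row is an assumption.

```r
# Hypothetical helper (not from the original post): for each row i, look at
# the k rows strictly below it and return their non-NA count, sd and mean.
# sd and mean are NA if any of those rows is NA; rows with fewer than k rows
# below them are left entirely NA (an assumption about the end behaviour).
next5_stats <- function(x, k = 5L) {
  n <- length(x)
  out <- data.frame(count = rep(NA_integer_, n),
                    stdev = rep(NA_real_, n),
                    mean  = rep(NA_real_, n))
  for (i in seq_len(n)) {
    if (i + k > n) next              # fewer than k rows below: leave NA
    w <- x[(i + 1L):(i + k)]         # the k rows below row i
    out$count[i] <- sum(!is.na(w))
    out$stdev[i] <- sd(w)            # NA if w contains an NA
    out$mean[i]  <- mean(w)          # NA if w contains an NA
  }
  out
}

response <- c(NA, 1, 2, 3, NA, 1, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
res <- next5_stats(response)
res
```

Because sd() and mean() already return NA when their input contains an NA, the "all 5 values must be present" condition needs no separate check.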

Here is a sample of Input.data:

 Response     
        NA               
         1                 
         2                 
         3                
        NA        
         1         
         1         
         2         
         3         
         4         
         5    

Here is the code I tried, which does not give the correct result:

Input.data$count.lag <- rollapplyr(Input.data[,c("Response")],list(-(4:0)),length, fill=NA)

Input.data$stdev <- ifelse(Input.data$count.lag <5, "NA", 
                            rollapplyr(Input.data[,c("Response")],list(-(4:0)),sd,fill=NA))
Input.data$mean <- ifelse(Input.data$count.lag <5, "NA", 
                           rollapplyr(Input.data[,c("Response")],list(-(4:0)),mean,fill=NA))

It produces the following, which is not what I want:

 Response count.lag     stdev mean
       NA        NA        NA   NA
        1        NA        NA   NA
        2        NA        NA   NA
        3        NA        NA   NA
       NA         5        NA   NA
        1         5        NA   NA
        1         5        NA   NA
        2         5        NA   NA
        3         5        NA   NA
        4         5  1.303840  2.2
        5         5  1.581139  3.0

The output should be:

Response count.lag      stdev  mean
     NA         4        NA    NA
      1         4        NA    NA
      2         4        NA    NA
      3         4        NA    NA
     NA         5   1.303840   2.2
      1         5   1.581139   3.0
      1         5   1.581139   4.0
      2         5   1.581139   5.0
      3         5   1.581139   6.0
      4         5   1.581139   7.0
      5         5   1.581139   8.0

Can someone point out where the error is and/or suggest a working alternative? Thanks!
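Reading the attempt against the zoo documentation: when `width` is a list, its entries are offsets relative to the current row, so `list(-(4:0))` selects the current row and the four rows above it rather than the five rows below. `length` also counts NA entries, so it returns 5 for every full window. A minimal sketch of a forward-looking version, assuming the zoo package (which provides rollapply) and the 16 values used as data in the answer, simply switches the offsets to `1:5`:

```r
library(zoo)

response <- c(NA, 1, 2, 3, NA, 1, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10)

# list(1:5): offsets relative to the current row, i.e. rows i+1 ... i+5.
# Positions whose window runs past the end of the vector are padded with fill.
count.lead <- rollapply(response, list(1:5), function(w) sum(!is.na(w)), fill = NA)

# sd() and mean() return NA when any value in the window is NA,
# which gives the "NA unless all 5 rows are present" rule directly.
stdev <- rollapply(response, list(1:5), sd, fill = NA)
avg   <- rollapply(response, list(1:5), mean, fill = NA)

data.frame(Response = response, count.lead, stdev, mean = avg)
```

With this version the last five rows come out as NA, since they do not have five rows below them.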

1 Answer:

Answer 0 (score: 1)

One possible approach:

Input[, c("count.lag","stdev","mean") := 
    transpose(lapply(1L:.N, function(n) {
        x <- Response[(n+1L):min(n+5L, .N)]
        c(sum(!is.na(x)), sd(x), mean(x))
    }))]

Output:

    Response count.lag     stdev mean
 1:       NA         4        NA   NA
 2:        1         4        NA   NA
 3:        2         4        NA   NA
 4:        3         4        NA   NA
 5:       NA         5 1.3038405  2.2
 6:        1         5 1.5811388  3.0
 7:        1         5 1.5811388  4.0
 8:        2         5 1.5811388  5.0
 9:        3         5 1.5811388  6.0
10:        4         5 1.5811388  7.0
11:        5         5 1.5811388  8.0
12:        6         4 1.2909944  8.5
13:        7         3 1.0000000  9.0
14:        8         2 0.7071068  9.5
15:        9         1        NA 10.0
16:       10         1        NA   NA
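The transpose() call is what turns the per-row results into columns: lapply() returns one length-3 vector c(count, sd, mean) per row, and data.table's transpose() regroups that list into three vectors that := can assign to the three new columns. A tiny illustration with made-up numbers (not taken from the thread):

```r
library(data.table)

# Two "rows" of results, each c(count, sd, mean); illustrative values only
per_row <- list(c(4, NA, NA), c(5, 1.5, 3))

# transpose() regroups element-wise: all first elements, all second elements, ...
cols <- transpose(per_row)
str(cols)
```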

Data:

library(data.table)
Input <- fread("Response
NA               
1                 
2                 
3                
NA        
1         
1         
2         
3         
4         
5
6
7
8
9
10")
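One detail in the output above: the last row reports count.lag 1 and mean NA even though no rows lie below it. For n == .N, the expression (n+1L):min(n+5L, .N) becomes (.N+1L):.N, which R evaluates as a descending sequence of length 2, so Response[...] returns one out-of-range NA plus the last value. The sketch below only demonstrates the index arithmetic; the guarded form is a suggestion, not part of the answer, and an empty window would still have to be handled explicitly (for example by returning NA) before calling sd().

```r
N <- 16L
n <- N  # the last row

idx_original <- (n + 1L):min(n + 5L, N)      # 17:16, counts down
idx_guarded  <- n + seq_len(min(5L, N - n))  # empty when no rows remain below

idx_original
idx_guarded
```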

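Edit: alternatively, as MichaelChirico suggested, use shift. The values in the last rows differ between the two approaches; which is appropriate depends on how the OP wants those end rows handled.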

# negative shifts require data.table >= 1.12.0 (otherwise use type = "lead" with positive integers)
Input[, c("count.lag", "stdev", "mean") := 
    .SD[, shift(Response, -1L:-5L)][, 
        .(apply(.SD, 1L, function(x) sum(!is.na(x))), 
            apply(.SD, 1L, sd), 
            apply(.SD, 1L, mean))]
]
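For reference, shift() with a vector of offsets returns one shifted copy of the column per offset, which is what .SD[, shift(Response, -1L:-5L)] builds before apply() works row-wise across the five copies. A small sketch on a toy vector (made-up values), using type = "lead" with positive offsets, the form that also works on data.table versions older than 1.12.0:

```r
library(data.table)

x <- c(10, 20, 30, 40)

# One vector per offset; element i of the k-th vector holds x[i + k] (NA past the end)
leads <- shift(x, 1:2, type = "lead")
leads

# On data.table >= 1.12.0 the negative-offset form used in the answer matches
shift(x, -1:-2)
```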

Output:

    Response count.lag    stdev mean
 1:       NA         4       NA   NA
 2:        1         4       NA   NA
 3:        2         4       NA   NA
 4:        3         4       NA   NA
 5:       NA         5 1.303840  2.2
 6:        1         5 1.581139  3.0
 7:        1         5 1.581139  4.0
 8:        2         5 1.581139  5.0
 9:        3         5 1.581139  6.0
10:        4         5 1.581139  7.0
11:        5         5 1.581139  8.0
12:        6         4       NA   NA
13:        7         3       NA   NA
14:        8         2       NA   NA
15:        9         1       NA   NA
16:       10         0       NA   NA