Calculating row differences with reference to another column's values, but skipping NA

Time: 2014-10-31 14:52:36

Tags: r data.table

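I want to perform a diff calculation that depends on the values in another column. The title is a mouthful, but in this simplified case what I want to do should be simple: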

sample = rep(c("A", "B"), each=7)
distance = c(25, 75, 150, 200, 250, 350, 450, 25, 75, 150, 200, 250, 350, 450)
y = c(NA, NA, 2, 3, 4, NA, 3, NA, NA, 2, 3, 3, NA, NA)
library(data.table)
dt <- data.table(sample, distance, y)
dt[, sample := factor(sample)]

Now I want to calculate the differences between consecutive distances:

dt[, delta := ifelse(distance == min(distance),
                     min(distance),
                     c(0, diff(distance))
                     ),
   by = sample]

This works. However, I want to skip the rows where the 'y' column is NA. So I made this change:

dt[!(is.na(y)), 
   delta2 := ifelse(distance == min(distance),
                   min(distance),
                   c(0, diff(distance))
                   ),
   by = sample
   ]

I thought this was what I wanted, but...

dt[, list(distance, y, delta, delta2), by = sample ]

Result:

    sample distance  y delta delta2
 1:      A       25 NA    25     NA
 2:      A       75 NA    50     NA
 3:      A      150  2    75    150
 4:      A      200  3    50     50
 5:      A      250  4    50     50
 6:      A      350 NA   100     NA
 7:      A      450  3   100    200
 8:      B       25 NA    25     NA
 9:      B       75 NA    50     NA
10:      B      150  2    75    150
11:      B      200  3    50     50
12:      B      250  3    50     50
13:      B      350 NA   100     NA
14:      B      450 NA   100     NA
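The delta2 values for sample A can be reproduced by hand in base R, which may help show where the 150 and 200 come from. This is only a sketch of what the filtered call appears to do: the `i` expression `!(is.na(y))` subsets the rows first, so `min()` and `diff()` see only the non-NA rows. The names `kept` and `delta2_kept` are made up for this illustration.

```r
# Sample A only, copied from the data above
distance <- c(25, 75, 150, 200, 250, 350, 450)
y        <- c(NA, NA, 2, 3, 4, NA, 3)

# Rows that survive the !(is.na(y)) filter
kept <- distance[!is.na(y)]

# Same expression as in the question, evaluated on the kept rows only:
# the first kept row gets min(kept), the rest get successive differences
delta2_kept <- ifelse(kept == min(kept), min(kept), c(0, diff(kept)))
```

Here the minimum is taken over the kept rows (150), not over the whole group (25), and the last difference spans the skipped row at 350.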

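If `y` is NA at the maximum distance, `delta2` is also NA for that row. I would like that row to be calculated instead of giving NA, that is, measured from the previous non-NA row up to max(distance). How can I do this?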
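One possible reading of this request is sketched below. It keeps a row when `y` is not NA or when the row holds the group's maximum distance, and then takes differences over the kept rows only. It assumes the rows are already sorted by distance within each sample, as they are in the example, so the first kept row is also the smallest kept distance. The helper column `keep` is a name introduced here and is not part of the question.

```r
library(data.table)

dt <- data.table(
  sample   = rep(c("A", "B"), each = 7),
  distance = rep(c(25, 75, 150, 200, 250, 350, 450), times = 2),
  y        = c(NA, NA, 2, 3, 4, NA, 3, NA, NA, 2, 3, 3, NA, NA)
)

# Keep a row if y is present, or if it is the group's maximum distance
dt[, keep := !is.na(y) | distance == max(distance), by = sample]

# Over the kept rows of each group: the first kept row gets its own distance,
# the others get the difference from the previous kept row
dt[keep == TRUE, delta2 := c(distance[1], diff(distance)), by = sample]
```

With this, the row for sample B at distance 450 is measured from 250, the last row with a non-NA `y`, while the other NA rows stay NA. If a different value is wanted for that row, only the `keep` condition or the expression in `j` needs to change.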

0 answers:

No answers