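I want to do a difference calculation that references values in another column. That is a mouthful of a title, but in this simplified case what I want should be simple: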
sample = rep(c("A", "B"), each=7)
distance = c(25, 75, 150, 200, 250, 350, 450, 25, 75, 150, 200, 250, 350, 450)
y = c(NA, NA, 2, 3, 4, NA, 3, NA, NA, 2, 3, 3, NA, NA)
library(data.table)
dt <- data.table(sample, distance, y)
dt[, sample := factor(sample)]
Now, I want to calculate the differences between successive distances within each sample:
dt[, delta := ifelse(distance == min(distance),
                     min(distance),
                     c(0, diff(distance))
                     ),
   by = sample]
That is correct. However, I want to skip rows where the 'y' column is NA. So I made this change:
dt[!(is.na(y)),
   delta2 := ifelse(distance == min(distance),
                    min(distance),
                    c(0, diff(distance))
                    ),
   by = sample
   ]
I thought this was what I wanted, but...
dt[, list(distance, y, delta, delta2), by = sample ]
The result is:
    sample distance  y delta delta2
 1:      A       25 NA    25     NA
 2:      A       75 NA    50     NA
 3:      A      150  2    75    150
 4:      A      200  3    50     50
 5:      A      250  4    50     50
 6:      A      350 NA   100     NA
 7:      A      450  3   100    200
 8:      B       25 NA    25     NA
 9:      B       75 NA    50     NA
10:      B      150  2    75    150
11:      B      200  3    50     50
12:      B      250  3    50     50
13:      B      350 NA   100     NA
14:      B      450 NA   100     NA
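If `y` is NA at the maximum distance, then `delta2` is also NA for that row (row 14 above, where `distance` equals `max(distance)`). Instead of NA, I would like `delta2` there to be computed as the difference between `max(distance)` and the distance of the previous non-NA row. How should I go about this?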
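To make the desired result concrete, here is a minimal sketch of one possible approach, not necessarily the best one. It assumes `distance` is sorted in ascending order within each `sample`, as in the example data. The helper column `keep` and the result column `delta3` are names made up for this illustration.

```r
library(data.table)

# Same example data as above
dt <- data.table(
  sample   = factor(rep(c("A", "B"), each = 7)),
  distance = rep(c(25, 75, 150, 200, 250, 350, 450), times = 2),
  y        = c(NA, NA, 2, 3, 4, NA, 3, NA, NA, 2, 3, 3, NA, NA)
)

# Flag the rows that should take part in the calculation:
# rows with a non-NA y, plus the max-distance row of each sample
# (hypothetical helper column `keep`).
dt[, keep := !is.na(y) | distance == max(distance), by = sample]

# Among the flagged rows only, the first value is the distance itself
# and the rest are successive differences. This relies on the assumption
# that distance is ascending within each sample.
dt[keep == TRUE, delta3 := c(distance[1], diff(distance)), by = sample]

# Drop the helper column again
dt[, keep := NULL]

print(dt)
```

With the example data, `delta3` matches `delta2` on the rows where `y` is not NA, and row 14 gets 200 (450 - 250) instead of NA. The per-sample flag is computed in a separate step because an expression placed in `i` is evaluated on the whole table, not per group.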