我有一组变量,其中包含一个人曾是否有某些健康状况的数据。例如,“你曾经心脏病发作吗?”
如果他们在观察2中说“是”,那么在观察3和4时答案仍然是肯定的。但是,在观察1时不一定是。心脏病发作可能发生在观察1和2之间。
如果他们在观察2中说“不”,那么在观察1时答案是否定的。但是,在观察3或4时不一定没有。
这是一个可重复的例子:
df <- tibble(
id = rep(1:3, each = 4),
obs = rep(1:4, times = 3),
mi_ever = c(NA, 0, 1, NA, NA, 0, NA, NA, NA, 1, NA, NA)
)
df
id obs mi_ever
1 1 1 NA
2 1 2 0
3 1 3 1
4 1 4 NA
5 2 1 NA
6 2 2 0
7 2 3 NA
8 2 4 NA
9 3 1 NA
10 3 2 1
11 3 3 NA
12 3 4 NA
使用zoo :: na.locf向后移动或向前移动我的1(是的)前进是微不足道的。但是,我不确定如何向前移动和 1的0。理想情况下,我希望得到以下结果:
id obs mi_ever mi_ever_2
1 1 1 NA 0
2 1 2 0 0
3 1 3 1 1
4 1 4 NA 1
5 2 1 NA 0
6 2 2 0 0
7 2 3 NA NA
8 2 4 NA NA
9 3 1 NA NA
10 3 2 1 1
11 3 3 NA 1
12 3 4 NA 1
我已经查看了以下帖子,但似乎没有一个完全覆盖我在这里要求的内容。
Carry last Factor observation forward and backward in group of rows in R
Forward and backward fill data frame in R
making a "dropdown" function in R
感谢任何帮助。
答案 0 :(得分:2)
基本上我在第一个1成为1之后依次标记项目,而在最后一个0之前成为0。
ever <- function (x) min( which( x == 1))
NA_1 <- function(x) seq_along(x) > ever(x) #could have done in one function
# check to see if working
ave(df$mi_ever, df$id, FUN= function(x){ x[NA_1(x) ] <- 1; x})
[1] NA 0 1 1 NA 0 NA NA NA 1 1 1
NA_0 <- function(x) seq_along(x) < not_yet(x)
not_yet <- function(x){ max( which( x==0)) }
# make temporary version of 1-modified column
temp1 <- ave(df$mi_ever, df$id, FUN= function(x){ x[NA_1(x) ] <- 1; x})
df$ever2 <- ave(temp1, df$id, FUN= function(x){ x[NA_0(x) ] <- 0; x})
# then make final version; could have done it "in place" I suppose.
df
# A tibble: 12 x 4
id obs mi_ever ever2
<int> <int> <dbl> <dbl>
1 1 1 NA 0
2 1 2 0 0
3 1 3 1 1
4 1 4 NA 1
5 2 1 NA 0
6 2 2 0 0
7 2 3 NA NA
8 2 4 NA NA
9 3 1 NA NA
10 3 2 1 1
11 3 3 NA 1
12 3 4 NA 1
如果您需要取消可能出现的警告。
答案 1 :(得分:0)
我从上面的@ 42-中得到了答案(谢谢!),并稍微调整一下以进一步满足我的需求。具体来说,我:
check_logic
参数。如果为TRUE,如果在1之后出现0,则该函数将返回9。这表示数据错误或逻辑缺陷需要进一步调查。功能:
distribute_ever <- function(x, check_logic = TRUE, ...) {
if (check_logic) {
if (length(which(x == 1)) > 0 & length(which(x == 0)) > 0) {
if (min(which(x == 1)) < max(which(x == 0))) {
x <- 9 # Set x to 9 if zero comes after 1
}
}
}
ones <- which(x == 1) # Get indices for 1's
if (length(ones) > 0) { # Prevents warning
first_1_by_group <- min(which(x == 1)) # Index first 1 by group
x[seq_along(x) > first_1_by_group] <- 1 # Set x at subsequent indices to 1
}
zeros <- which(x == 0) # Get indices for 0's
if (length(zeros) > 0) { # Prevents warning
last_0_by_group <- max(which(x == 0)) # Index last 0 by group
x[seq_along(x) < last_0_by_group] <- 0 # Set x at previous indices to 0
}
x
}
一个新的可复制的例子,有多个&#34;永远&#34;变量和一些在1之后为0的情况:
dt <- data.table(
id = rep(1:3, each = 4),
obs = rep(1:4, times = 3),
mi_ever = c(NA, 0, 1, NA, NA, 0, NA, NA, NA, 1, NA, NA),
diab_ever = c(0, NA, NA, 1, 1, NA, NA, 0, 0, NA, NA, NA)
)
使用data.table(通过分组处理)快速迭代多个变量:
ever_vars <- c("mi_ever", "diab_ever")
dt[, paste0(ever_vars, "_2") := lapply(.SD, distribute_ever),
.SDcols = ever_vars,
by = id][]
结果:
id obs mi_ever diab_ever mi_ever_2 diab_ever_2
1: 1 1 NA 0 0 0
2: 1 2 0 NA 0 NA
3: 1 3 1 NA 1 NA
4: 1 4 NA 1 1 1
5: 2 1 NA 1 0 9
6: 2 2 0 NA 0 9
7: 2 3 NA NA NA 9
8: 2 4 NA 0 NA 9
9: 3 1 NA 0 NA 0
10: 3 2 1 NA 1 NA
11: 3 3 NA NA 1 NA
12: 3 4 NA NA 1 NA
对于每个输入&#34;永远&#34;变量,我们有:
check_logic
设置为FALSE,那么1会胜出并替换0&#39;