I have a dataset df:
library(tibble)
df <- tibble(
  id = sort(rep(letters[1:3], 3)),
  visit_id = rep(c(0, 5, 10), 3),
  true_visit = c(NA, 3, NA, 0, 5, 10, 1, 7, NA)
)
> df
# A tibble: 9 x 3
id visit_id true_visit
<chr> <dbl> <dbl>
1 a 0 NA
2 a 5 3
3 a 10 NA
4 b 0 0
5 b 5 5
6 b 10 10
7 c 0 1
8 c 5 7
9 c 10 NA
I am trying to create a new column closest_visit that holds, within each person (id), the true_visit value closest to each row's visit_id. The result would look like this:
# A tibble: 9 x 4
id visit_id true_visit closest_visit
<chr> <dbl> <dbl> <dbl>
1 a 0 NA 3
2 a 5 3 3
3 a 10 NA 3
4 b 0 0 0
5 b 5 5 5
6 b 10 10 10
7 c 0 1 1
8 c 5 7 7
9 c 10 NA 7
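For clarity, closest_visit for a is 3 because 3 is the only true_visit for a. closest_visit for row 7 is 1 because 0 (that row's visit_id) is closer to 1 than to 7 (the other true_visit for that participant), and so on.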
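The rule above can be checked by hand for participant c. A minimal base-R sketch, using a hypothetical helper `closest_true_visit()` (not part of any package) that drops the missing true_visit values and picks the one with the smallest absolute distance to a given visit_id:

```r
# Hypothetical helper: closest non-missing true_visit to a single visit_id
closest_true_visit <- function(visit, true_visit) {
  candidates <- true_visit[!is.na(true_visit)]
  # which.min() returns the first index on ties
  candidates[which.min(abs(visit - candidates))]
}

# Participant c: visit_id 0, 5, 10 and true_visit 1, 7, NA
c_closest <- sapply(c(0, 5, 10), closest_true_visit, true_visit = c(1, 7, NA))
c_closest
```

For visit_id 0 the distances are |0 - 1| = 1 and |0 - 7| = 7, so 1 wins; for visit_id 5 they are 4 and 2, so 7 wins; for visit_id 10 they are 9 and 3, so 7 wins again. On an exact tie, which.min() keeps the earlier candidate, a behaviour the question does not specify.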
Answer 0 (score: 2)
You could try:
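library(dplyr)

df %>%
  group_by(id) %>%
  mutate(
    closest_visit = case_when(
      # exact match: the visit is its own closest true_visit
      visit_id == true_visit ~ true_visit,
      # otherwise take the true_visit with the smallest absolute distance;
      # which.min() skips the NA distances
      TRUE ~ true_visit[sapply(visit_id,
                               function(x) which.min(abs(x - true_visit)))]
    )
  )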
Output:
# A tibble: 9 x 4
# Groups: id [3]
id visit_id true_visit closest_visit
<chr> <dbl> <dbl> <dbl>
1 a 0 NA 3
2 a 5 3 3
3 a 10 NA 3
4 b 0 0 0
5 b 5 5 5
6 b 10 10 10
7 c 0 1 1
8 c 5 7 7
9 c 10 NA 7
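In the answer above, the first case_when() branch only covers exact matches, which the second branch already returns (their distance is 0), so it can likely be dropped. A base-R sketch of the same per-id lookup without dplyr, assuming every id has at least one non-missing true_visit:

```r
df <- data.frame(
  id = rep(letters[1:3], each = 3),
  visit_id = rep(c(0, 5, 10), 3),
  true_visit = c(NA, 3, NA, 0, 5, 10, 1, 7, NA)
)

df$closest_visit <- NA_real_
# split() groups the row numbers by id
for (rows in split(seq_len(nrow(df)), df$id)) {
  tv <- df$true_visit[rows]
  # for each visit, the true_visit with the smallest distance; NA distances are skipped
  df$closest_visit[rows] <- sapply(df$visit_id[rows],
                                   function(x) tv[which.min(abs(x - tv))])
}
df$closest_visit
```

If an id had no non-missing true_visit at all, which.min() would return an empty index and this assignment would fail, which is why the assumption above matters.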
Answer 1 (score: 1)
One option is findInterval followed by fill:
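library(dplyr)
library(tidyr)

df %>%
  group_by(id) %>%
  # find each true_visit's interval among the visit_ids and use it to index
  # into the non-missing true_visit values of the same id
  mutate(closest_visit = na.omit(true_visit)[findInterval(true_visit, visit_id)]) %>%
  # fill the remaining NAs within each id, downward first, then upward
  fill(closest_visit, .direction = "updown")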
# A tibble: 9 x 4
# Groups: id [3]
# id visit_id true_visit closest_visit
# <chr> <dbl> <dbl> <dbl>
#1 a 0 NA 3
#2 a 5 3 3
#3 a 10 NA 3
#4 b 0 0 0
#5 b 5 5 5
#6 b 10 10 10
#7 c 0 1 1
#8 c 5 7 7
#9 c 10 NA 7
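findInterval(x, vec) returns, for each element of x, the index i with vec[i] <= x < vec[i + 1] (0 below vec[1], NA for a missing x); it does not compute distances. The sketch below traces the mutate() step for participant c, plus a hypothetical participant whose only true_visit (12) lies past the last visit_id, to show that the lookup seems to depend on the layout of the sample data:

```r
# Participant c from the question
visit_id   <- c(0, 5, 10)
true_visit <- c(1, 7, NA)

idx <- findInterval(true_visit, visit_id)        # interval index; NA stays NA
closest <- as.vector(na.omit(true_visit))[idx]   # fill() later replaces the NA with 7

# Hypothetical participant whose only true_visit is 12
other_tv <- c(NA, NA, 12)
other_idx <- findInterval(other_tv, visit_id)    # 12 >= 10, so index 3
# only one candidate exists, so index 3 is out of range and yields NA
other_closest <- as.vector(na.omit(other_tv))[other_idx]
```

For the hypothetical participant, fill() would leave all three rows missing, whereas the expected closest_visit is 12 for each, so the distance-based lookups in the other answers may be safer outside this example.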
Answer 2 (score: 1)
This isn't the prettiest approach, but it works for your example:
library(dplyr)

# initialise the new column so every row has a slot for a value
df$closest_visit <- NA_real_
for (pid in unique(df$id)) {
  # non-missing true_visit values for this participant
  available_visit <- na.omit(df[df$id == pid, 'true_visit']) %>% pull()
  unique_visit <- unique(df$visit_id[df$id == pid])
  for (v in unique_visit) {
    df[df$id == pid & df$visit_id == v, 'closest_visit'] <-
      available_visit[which.min(abs(available_visit - v))]
  }
}