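I am trying to get the minimum date from the last unbroken sequence of dates in each row, that is, the sequence that comes after every earlier NA and is followed either only by NAs or by nothing because it runs to the last column.
This is better explained with an example: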
sample <- data.frame(subject = c("A","B","C"),Date1 = c("1-2-19","1-2-19",NA),Date2 = c("1-3-19",NA,"1-3-19"),Date3 = c("1-4-19","1-4-19",NA)
,Date4 = c(NA,"1-5-19",NA),Date5 = c("1-6-19",NA,NA),Date6 = c("1-7-19",NA,"1-7-19"))
Output:
  subject  Date1  Date2  Date3  Date4  Date5  Date6
1       A 1-2-19 1-3-19 1-4-19   <NA> 1-6-19 1-7-19
2       B 1-2-19   <NA> 1-4-19 1-5-19   <NA>   <NA>
3       C   <NA> 1-3-19   <NA>   <NA>   <NA> 1-7-19
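The desired result is an additional column named Minimum_Date that holds the expected result for each row.
So subject A would return "1-6-19"
Subject B would return "1-4-19"
Subject C would return "1-7-19"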
Answer 0: (score: 3)
Here is an option with base R:
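sample$minDate <- apply(sample[-1], 1, function(x) {
  i1 <- which(!is.na(x))                  # positions of the non-NA dates
  mx <- cumsum(c(TRUE, diff(i1) != 1))    # run id, incremented at every gap
  x1 <- x[i1[mx == max(mx)]]              # keep only the last consecutive run
  x1[which.min(as.Date(x1, "%m-%d-%y"))]  # earliest date within that run
})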
sample$minDate
#[1] "1-6-19" "1-4-19" "1-7-19"
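To make the run-labelling step easier to follow, here is a small sketch that applies the same idea to a single row (subject B from the question). The names `gaps`, `run_id`, `last_run` and `min_date` are introduced here purely for illustration and do not appear in the answer above.

```r
# Row for subject B, copied from the question's sample data
x <- c(Date1 = "1-2-19", Date2 = NA, Date3 = "1-4-19",
       Date4 = "1-5-19", Date5 = NA, Date6 = NA)

i1 <- which(!is.na(x))            # positions of the non-NA entries: 1, 3, 4
gaps <- diff(i1) != 1             # TRUE wherever consecutive positions are not adjacent
run_id <- cumsum(c(TRUE, gaps))   # label each consecutive run: 1, 2, 2

# Keep the entries that belong to the last run, then pick the earliest date
last_run <- x[i1[run_id == max(run_id)]]
min_date <- last_run[which.min(as.Date(last_run, "%m-%d-%y"))]

print(unname(min_date))
```

Because `Date3` and `Date4` are adjacent and nothing but NA follows them, they form the last run, and the earlier of the two is returned.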
Answer 1: (score: 2)
Here is a tidyverse approach that reshapes the data to long format to get the desired output.
library(tidyverse)
sample <- data.frame(subject = c("A","B","C"),Date1 = c("1-2-19","1-2-19",NA),Date2 = c("1-3-19",NA,"1-3-19"),Date3 = c("1-4-19","1-4-19",NA)
,Date4 = c(NA,"1-5-19",NA),Date5 = c("1-6-19",NA,NA),Date6 = c("1-7-19",NA,"1-7-19"))
sample %>%
  gather(date_num, date, -subject) %>%          # reshape longer
  mutate(date = lubridate::mdy(date)) %>%       # convert to date so min works
  arrange(subject, desc(date_num)) %>%          # sort in reverse order
  group_by(subject) %>%
  mutate(after_na = cumsum(is.na(date))) %>%    # create indicator for how many NAs have appeared
  filter(!is.na(date)) %>%                      # deal with rows that end in NA
  filter(after_na == min(after_na)) %>%         # restrict to the last sequence
  summarise(Minimum_Date = min(date)) %>%       # get min of those dates in last sequence
  inner_join(sample)                            # join onto original table
#> # A tibble: 3 x 8
#>   subject Minimum_Date Date1  Date2  Date3  Date4  Date5  Date6
#>   <fct>   <date>       <fct>  <fct>  <fct>  <fct>  <fct>  <fct>
#> 1 A       2019-01-06   1-2-19 1-3-19 1-4-19 <NA>   1-6-19 1-7-19
#> 2 B       2019-01-04   1-2-19 <NA>   1-4-19 1-5-19 <NA>   <NA>
#> 3 C       2019-01-07   <NA>   1-3-19 <NA>   <NA>   <NA>   1-7-19
Created on 2019-06-18 by the reprex package (v0.3.0)
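As a possible variation, the same pipeline can be sketched with `pivot_longer()`, which has since superseded `gather()` in tidyr. This version is an assumption-laden sketch rather than part of the original answer: it assumes a recent tidyr (one that supports `names_transform`), strips the `Date` prefix so the columns are ordered numerically rather than as strings (which would matter if there were ten or more date columns), and uses base `as.Date()` in place of `lubridate::mdy()`.

```r
library(dplyr)
library(tidyr)

sample <- data.frame(
  subject = c("A", "B", "C"),
  Date1 = c("1-2-19", "1-2-19", NA),
  Date2 = c("1-3-19", NA, "1-3-19"),
  Date3 = c("1-4-19", "1-4-19", NA),
  Date4 = c(NA, "1-5-19", NA),
  Date5 = c("1-6-19", NA, NA),
  Date6 = c("1-7-19", NA, "1-7-19"),
  stringsAsFactors = FALSE
)

result <- sample %>%
  # reshape longer; "Date3" becomes the integer 3 so sorting is numeric
  pivot_longer(-subject,
               names_to = "date_num",
               names_prefix = "Date",
               names_transform = list(date_num = as.integer),
               values_to = "date") %>%
  mutate(date = as.Date(date, format = "%m-%d-%y")) %>%  # parse month-day-year strings
  arrange(subject, desc(date_num)) %>%                   # walk each row from the last column
  group_by(subject) %>%
  mutate(after_na = cumsum(is.na(date))) %>%             # count NAs seen so far from the right
  filter(!is.na(date)) %>%                               # drop the NA entries themselves
  filter(after_na == min(after_na)) %>%                  # keep only the last run of dates
  summarise(Minimum_Date = min(date)) %>%                # earliest date in that run
  inner_join(sample, by = "subject")                     # attach the original columns

print(result)
```

Naming the join key explicitly with `by = "subject"` avoids relying on dplyr's automatic column matching.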