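I have been struggling to conditionally calculate a time difference across multiple rows/columns. For each id, I want the time difference between the initial negative result and the subsequent positive result. I have been trying to do this in dplyr, but maybe I need another approach.
Code for the data: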
id <- c(1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3)
firstnegative <- c('T', 'F', 'F', 'F', 'F', 'F', 'F', 'F', 'T', 'F', 'F', 'F')
organism <- c('neg', 'COVID', 'COVID', 'neg', 'COVID', 'neg', 'neg', 'neg', 'neg', 'neg', 'COVID', 'COVID')
date <- seq(as.Date("2020/3/1"), as.Date("2020/3/12"), "days")
data <- data.frame(id, date, organism, firstnegative)
id date organism firstnegative
1 2020-03-01 neg T
1 2020-03-02 COVID F
1 2020-03-03 COVID F
1 2020-03-04 neg F
2 2020-03-05 COVID F
2 2020-03-06 neg F
2 2020-03-07 neg F
2 2020-03-08 neg F
3 2020-03-09 neg T
3 2020-03-10 neg F
3 2020-03-11 COVID F
3 2020-03-12 COVID F
Expected Result
id date organism firstnegative timediff
1 2020-03-01 neg T 1d
1 2020-03-02 COVID F
1 2020-03-03 COVID F
1 2020-03-04 neg F
2 2020-03-05 COVID F
2 2020-03-06 neg F
2 2020-03-07 neg F
2 2020-03-08 neg F
3 2020-03-09 neg T 2d
3 2020-03-10 neg F
3 2020-03-11 COVID F
3 2020-03-12 COVID F
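The number of rows between the first negative and the subsequent positive is unknown and varies, so I cannot assume a lead of 1.
Any ideas/approaches would be appreciated.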
Answer 0 (score: 1)
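For each id, you can find the first date of 'COVID' and subtract from it the date of the first occurrence of 'T' in firstnegative. Since we need the value only in the first row, we replace the rest of the values with NA.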
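The two base R helpers this relies on can be tried in isolation first. This is only a small illustration on made-up vectors (the names first_covid, no_hit, kept_first and gap are mine, not from the question):

```r
organism <- c("neg", "neg", "COVID", "COVID")
dates <- as.Date("2020-03-09") + 0:3

# match() returns the position of the first hit only
first_covid <- match("COVID", organism)

# ... and NA when the value never occurs
no_hit <- match("T", c("F", "F"))

# a negative index in replace() means "every position except this one",
# so everything but the first element becomes NA
kept_first <- replace(c(2, 2, 2, 2), -1L, NA)

# subtracting two Dates gives a difftime in days
gap <- as.numeric(dates[first_covid] - dates[1])
```

Here first_covid is 3, no_hit is NA, kept_first is 2 NA NA NA and gap is 2, which is why an id without a 'T' row ends up with NA everywhere.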
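library(dplyr)

data %>%
  group_by(id) %>%
  mutate(
    # first 'COVID' date minus first 'T' date; NA when the id has no 'T' row
    timediff = date[match('COVID', organism)] -
      date[match('T', firstnegative)],
    # keep the value on the first row of each id only, as a number of days
    timediff = as.numeric(replace(timediff, -1L, NA))
  )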
# id date organism firstnegative timediff
# <dbl> <date> <chr> <chr> <dbl>
# 1 1 2020-03-01 neg T 1
# 2 1 2020-03-02 COVID F NA
# 3 1 2020-03-03 COVID F NA
# 4 1 2020-03-04 neg F NA
# 5 2 2020-03-05 COVID F NA
# 6 2 2020-03-06 neg F NA
# 7 2 2020-03-07 neg F NA
# 8 2 2020-03-08 neg F NA
# 9 3 2020-03-09 neg T 2
#10 3 2020-03-10 neg F NA
#11 3 2020-03-11 COVID F NA
#12 3 2020-03-12 COVID F NA
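One caveat worth checking against your real data: the code above takes the first 'COVID' in each id whether it comes before or after the flagged negative, and it always writes the result to the first row of the group. If the 'T' row might not be the first row, or an id could have a positive before its first negative, something along these lines may be safer. It is only a sketch under those assumptions; neg_date and pos_date are helper columns I made up for illustration.

```r
library(dplyr)

# same sample data as in the question
data <- data.frame(
  id = rep(1:3, each = 4),
  date = seq(as.Date("2020-03-01"), as.Date("2020-03-12"), by = "days"),
  organism = c("neg", "COVID", "COVID", "neg", "COVID", "neg", "neg", "neg",
               "neg", "neg", "COVID", "COVID"),
  firstnegative = c("T", "F", "F", "F", "F", "F", "F", "F", "T", "F", "F", "F")
)

result <- data %>%
  group_by(id) %>%
  mutate(
    # date of the flagged first negative (NA if this id has none)
    neg_date = date[match("T", firstnegative)],
    # first COVID date strictly after that negative (NA if there is none)
    pos_date = date[match(TRUE, organism == "COVID" & date > neg_date)],
    # report the gap in days on the flagged row only
    timediff = if_else(firstnegative == "T",
                       as.numeric(pos_date - neg_date), NA_real_)
  ) %>%
  ungroup() %>%
  select(-neg_date, -pos_date)
```

On the sample data this gives the same timediff column as above (1 for id 1, 2 for id 3, NA elsewhere); the two versions only differ when the ordering assumptions do not hold.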