Conditionally calculate the difference between multiple rows

Date: 2020-10-17 09:59:31

Tags: r dplyr tidyr

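I have been struggling to conditionally calculate the time difference across multiple rows/columns. For each id, I want to calculate the time difference between the initial negative result and the subsequent positive result. I have been trying to do this in dplyr, but maybe I need a different approach.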

Code to create the data:

id <- c(1,1,1,1,2,2,2,2,3,3,3,3)
firstnegative <- c('T', 'F', 'F', 'F', 'F', 'F', 'F', 'F', 'T', 'F', 'F', 'F')
organism <- c('neg', 'COVID', 'COVID', 'neg', 'COVID', 'neg', 'neg', 'neg', 'neg', 'neg', 'COVID', 'COVID')
date <- seq(as.Date("2020/3/1"), as.Date("2020/3/12"), "days")
data <- data.frame(id, date, organism, firstnegative)

 id       date organism  firstnegative
  1 2020-03-01      neg             T
  1 2020-03-02    COVID             F
  1 2020-03-03    COVID             F
  1 2020-03-04      neg             F
  2 2020-03-05    COVID             F
  2 2020-03-06      neg             F
  2 2020-03-07      neg             F
  2 2020-03-08      neg             F
  3 2020-03-09      neg             T
  3 2020-03-10      neg             F
  3 2020-03-11    COVID             F
  3 2020-03-12    COVID             F

Expected Result

 id       date organism  firstnegative   timediff
  1 2020-03-01      neg             T      1d
  1 2020-03-02    COVID             F
  1 2020-03-03    COVID             F
  1 2020-03-04      neg             F
  2 2020-03-05    COVID             F
  2 2020-03-06      neg             F
  2 2020-03-07      neg             F
  2 2020-03-08      neg             F
  3 2020-03-09      neg             T      2d
  3 2020-03-10      neg             F
  3 2020-03-11    COVID             F
  3 2020-03-12    COVID             F

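The number of rows between the first negative and the subsequent positive is unknown and varies, so I can't assume a lead of 1.
Any ideas/approaches would be greatly appreciated.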
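To illustrate why a fixed offset of 1 is not enough, here is a minimal sketch (not part of the original post) that compares each first-negative row only with the next row of the same id using dplyr's lead(). The column names next_organism and next_gap are hypothetical and exist only for this illustration.

```r
library(dplyr)

id <- c(1,1,1,1,2,2,2,2,3,3,3,3)
firstnegative <- c('T','F','F','F','F','F','F','F','T','F','F','F')
organism <- c('neg','COVID','COVID','neg','COVID','neg','neg','neg','neg','neg','COVID','COVID')
date <- seq(as.Date("2020/3/1"), as.Date("2020/3/12"), "days")
# stringsAsFactors = FALSE matches the default since R 4.0
data <- data.frame(id, date, organism, firstnegative, stringsAsFactors = FALSE)

# Naive idea: look only one row ahead within each id
naive <- data %>%
  group_by(id) %>%
  mutate(next_organism = lead(organism),
         next_gap = as.numeric(lead(date) - date)) %>%
  ungroup() %>%
  filter(firstnegative == 'T')

# id 1: the next row is COVID, so a 1-day gap happens to be right.
# id 3: the next row is still 'neg', so a one-row lead misses the first
#       COVID result, which comes 2 days after the negative.
naive
```

Because the gap varies per id, the positive has to be searched for rather than reached with a fixed offset.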

1 answer:

Answer 0 (score: 1)

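For each id, you can find the first date with 'COVID' and subtract from it the date of the first occurrence of 'T' in firstnegative. Since we only need the value in the first row, we use replace to set the remaining values to NA, and as.numeric turns the date difference into a number of days.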
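To see what the two match() calls return, here is a small base-R sketch that works through the rows of id 3 by hand (the vectors are copied from the question's data). match() returns the position of the first matching element, or NA when there is none, which is why an id without a 'T' ends up with NA.

```r
# Rows belonging to id 3 in the question's data
date <- seq(as.Date("2020/3/9"), as.Date("2020/3/12"), "days")
organism <- c('neg', 'neg', 'COVID', 'COVID')
firstnegative <- c('T', 'F', 'F', 'F')

# match() gives the index of the FIRST matching element (or NA if none)
first_pos <- match('COVID', organism)     # 3
first_neg <- match('T', firstnegative)    # 1

gap <- date[first_pos] - date[first_neg]  # a difftime in days
as.numeric(gap)                           # 2

# id 2 has no 'T', so match() returns NA and the difference is NA too
match('T', c('F', 'F', 'F', 'F'))         # NA
```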

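library(dplyr)

data %>%
  group_by(id) %>%
  # first COVID date minus the date of the first 'T' in firstnegative
  mutate(timediff = date[match('COVID', organism)] - 
                    date[match('T', firstnegative)],
         # keep the value only in the first row of each id, NA elsewhere
         timediff = as.numeric(replace(timediff, -1L, NA)))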

#      id date       organism firstnegative timediff
#   <dbl> <date>     <chr>    <chr>            <dbl>
# 1     1 2020-03-01 neg      T                    1
# 2     1 2020-03-02 COVID    F                   NA
# 3     1 2020-03-03 COVID    F                   NA
# 4     1 2020-03-04 neg      F                   NA
# 5     2 2020-03-05 COVID    F                   NA
# 6     2 2020-03-06 neg      F                   NA
# 7     2 2020-03-07 neg      F                   NA
# 8     2 2020-03-08 neg      F                   NA
# 9     3 2020-03-09 neg      T                    2
#10     3 2020-03-10 neg      F                   NA
#11     3 2020-03-11 COVID    F                   NA
#12     3 2020-03-12 COVID    F                   NA
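The answer relies on two properties of the sample data: the 'T' row is the first row of each id (replace(timediff, -1L, NA) keeps row 1, not necessarily the 'T' row), and no 'COVID' result comes before the first negative (match('COVID', organism) returns the first COVID anywhere in the group, which would give a negative difference in that case). A possible variant, assuming firstnegative marks at most one row per id, puts the value on the 'T' row and only counts COVID results dated after the first negative. The helper columns neg_date and pos_date are hypothetical names used for this sketch and are dropped at the end.

```r
library(dplyr)

id <- c(1,1,1,1,2,2,2,2,3,3,3,3)
firstnegative <- c('T','F','F','F','F','F','F','F','T','F','F','F')
organism <- c('neg','COVID','COVID','neg','COVID','neg','neg','neg','neg','neg','COVID','COVID')
date <- seq(as.Date("2020/3/1"), as.Date("2020/3/12"), "days")
data <- data.frame(id, date, organism, firstnegative, stringsAsFactors = FALSE)

result <- data %>%
  group_by(id) %>%
  mutate(
    # date of the first 'T' in this id (NA if the id has none)
    neg_date = date[match('T', firstnegative)],
    # earliest COVID date strictly after that negative (NA if none qualifies)
    pos_date = {
      cand <- date[which(organism == 'COVID' & date > neg_date)]
      if (length(cand) == 0) as.Date(NA) else min(cand)
    },
    # keep the difference (in days) only on the row flagged as first negative
    timediff = if_else(firstnegative == 'T',
                       as.numeric(pos_date - neg_date), NA_real_)
  ) %>%
  ungroup() %>%
  select(-neg_date, -pos_date)

result
```

On the sample data this gives 1 for id 1, 2 for id 3 and NA everywhere else, matching the expected result; the difference from the answer only shows up when the 'T' row is not first or a COVID result precedes it.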