Question

假设我有以下示例：

我的原始数据集包含从VisitLink到Dis 3的变量。我想创建一个新的var new，以便在我按Patient对数据进行分组时，回顾一下患者访问前20天，检查Dis1是否为真当时那些访问。我想要的new将是：

我做了几次尝试，但他们都忽略了分组。

Patient DaysToEvent  Dis1  Dis2  Dis3   new
      1         130  TRUE FALSE FALSE  TRUE
      1         135 FALSE FALSE FALSE  TRUE
      2         456  TRUE  TRUE FALSE  TRUE
      2         500 FALSE FALSE FALSE  FALSE
      2         550  TRUE FALSE FALSE  TRUE
      2         560 FALSE  TRUE  TRUE  TRUE
      3         200 FALSE FALSE FALSE  FALSE
      3         400  TRUE  TRUE FALSE  TRUE
      3         410 FALSE  TRUE FALSE  TRUE
      3         510 FALSE FALSE FALSE  FALSE
      4           1  TRUE FALSE FALSE  TRUE
      4          20 FALSE  TRUE FALSE  TRUE
      4         110 FALSE FALSE FALSE  FALSE

谢谢！

Answer 1

1）创建一个函数gen_new，为每位患者填写m的缺失日期数字。然后，它使用带有rollapplyr的{{1}}来查找是否有任何尾随的20个或更少的元素为TRUE，然后使用any(..., na.rm = TRUE)将结果子集回原来的日期。要将其应用于所有患者，请使用window。 ave将强制ave生成的逻辑到0/1，因此将其输出与1进行比较以转换回逻辑。

gen_new

2）这个避免了（1）中的合并，因此可能更快。它定义了一个函数library(zoo) n <- nrow(DF) gen_new <- function(ix) with(DF[ix, ], { rng <- range(DaysToEvent) m <- merge(zoo(Dis1, DaysToEvent), zoo(, seq(rng[1], rng[2]))) window(rollapplyr(m, 20, any, na.rm = TRUE, partial = TRUE), DaysToEvent) }) DF <- transform(DF, new2 = ave(1:n, Patient, FUN = gen_new) == 1) # check that new and new2 are the same identical(DF$new, DF$new2) ## [1] TRUE，它接受逻辑zoo对象并确定在20的末尾是否有任何TRUE元素。然后它将Any定义为gen_new一个人。最后，它使用rollapplyr将其应用于每个人。

ave

注意：可重复形式的输入数据library(zoo) n <- nrow(DF) Any <- function(x) any(x[time(x) > end(x) - 20], na.rm = TRUE) gen_new <- function(ix) with(DF[ix, ], { z <- zoo(Dis1, DaysToEvent) rollapplyr(z, 20, Any, coredata = FALSE, partial = TRUE) }) DF <- transform(DF, new2 = ave(1:n, Patient, FUN = gen_new) == 1) # check that new and new2 are the same identical(DF$new, DF$new2) ## [1] TRUE为：

DF

Answer 2

我愿意接受更有效的建议，但这里只是使用dplyr的解决方案：

library(tidyr)
library(dplyr)

group_by(mydata,Patient) %>% 
    do(new = sapply(.$DaysToEvent,function(x)
        {
            any(.$Dis1*between(.$DaysToEvent,x-20,x))
        }
    ) %>% 
    unnest()

循环分组数据并执行

2 个答案: