Question

I would like to aggregate some data by combining two datasets, lsr and adherence (reproducible example):

adherence <- cbind.data.frame(c("1", "2", "3", "1", "2", "3"), c("2013.1", "2013.1", "2013.1", "2013.2", "2013.2", "2013.2"))
library(dplyr)
library(tidyr)
names(adherence)[1] <- "ID" 
names(adherence)[2] <- "year"
adherence$year <- as.numeric(as.character(adherence$year))

lsr <- cbind.data.frame(
        c("1", "1", "1", "2", "2", "2", "3", "3"), #ID
        c("2012.3", "2012.8", "2013.1","2012.8", "2013.3", "2013.9", "2011", "2013"), #eksd
        c("60", "90", "90", "60", "120", "60", "30", "90") # DDD
        )
names(lsr)[1] <- "ID"
names(lsr)[2] <- "eksd"
names(lsr)[3] <- "DDD"

lsr$eksd <- as.numeric(as.character(lsr$eksd))
lsr$DDD <- as.numeric(as.character(lsr$DDD))
lsr$ENDDATE <- lsr$eksd + lsr$DDD/365.25

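So, in the adherence dataset I want a new column containing information from the lsr dataset, depending on the ID and year variables of each row in adherence. I wrote this function, which I think does that: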

function.DDAV <- function() {
        Y <- lsr %>% #dummy variable
          filter(., .$ID == adherence$ID) %>% #filters lsr by ID from relevant row in adherence dataset
          filter(., .$eksd <= adherence$year & adherence$year <= .$ENDDATE) # filters further to include relevant time period
          Y$DIFF <- (Y$ENDDATE - adherence$year)*365.25 # calculates the drug dosage still available at time = adherence$year, i.e. takes the time from the adherence dataset and compares it with the time until the dose runs out
          z <- sum(Y$DIFF) #sum available drugs doses
          return(z) # returns sum to be inputted in one row in new column in adherence dataset
      }

I want to apply it once to each row of the adherence dataset, creating a new column from the relevant row values.

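I have tried the apply family of functions and aggregate. I have been struggling for a few days and have hit various errors, which I haven't shown because I suspect I may be approaching the problem entirely the wrong way. When searching, I could only find questions about applying a function over whole rows or columns within a single dataset.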

Edit: the ideal output is shown below:

adherence$adherence <- as.numeric(c("90", "0", "53.475", "16.95", "120", "0"))

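In essence, the calculation after filtering is simple. Take the first row, where adherence$ID = 1 and adherence$year = 2013.1: after filtering, only the last lsr row for ID = 1 remains, and the sum is (2013.346 (lsr$ENDDATE) - 2013.1 (adherence$year)) * 365.25 ≈ 90.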
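
The calculation for that first row can be checked by hand. This is only a minimal sketch in base R: it rebuilds lsr with data.frame() rather than the cbind.data.frame() setup above, hard-codes the first adherence row, and the names id, yr, rows and remaining are illustrative, not part of the original code.

```r
# Rebuild lsr compactly (same values as in the setup above)
lsr <- data.frame(
  ID   = c("1", "1", "1", "2", "2", "2", "3", "3"),
  eksd = c(2012.3, 2012.8, 2013.1, 2012.8, 2013.3, 2013.9, 2011, 2013),
  DDD  = c(60, 90, 90, 60, 120, 60, 30, 90),
  stringsAsFactors = FALSE
)
lsr$ENDDATE <- lsr$eksd + lsr$DDD / 365.25

# First adherence row: ID = 1, year = 2013.1
id <- "1"
yr <- 2013.1

# Keep the lsr rows for this ID whose supply period covers yr
rows <- lsr[lsr$ID == id & lsr$eksd <= yr & yr <= lsr$ENDDATE, ]

# Days of supply left at time yr, summed over the matching rows
remaining <- sum((rows$ENDDATE - yr) * 365.25)
```

Only one lsr row survives the filter here, and remaining comes out at roughly 90 days (up to floating-point error).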

Answer 1

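I'm not 100% sure this is what you're after, but hopefully it's a starting point.

My understanding is that, for each occurrence of an ID in adherence, you want to compare it against all rows with that ID in lsr, and wherever year is greater than or equal to eksd and less than or equal to ENDDATE, you want to do the calculation using the time remaining until ENDDATE.

The following should do that. The results in newVariable differ slightly from your ideal output: for ID = 2 and year = 2013.2, no lsr row with ID = 2 satisfies eksd <= year <= ENDDATE (the 2012.8 prescription has already run out by then and the next one starts at 2013.3), so instead of 120 I get 0.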

adherence <- cbind.data.frame(c("1", "2", "3", "1", "2", "3"), c("2013.1", "2013.1", "2013.1", "2013.2", "2013.2", "2013.2"))
library(dplyr)
library(tidyr)
names(adherence)[1] <- "ID" 
names(adherence)[2] <- "year"
adherence$year <- as.numeric(as.character(adherence$year))

lsr <- cbind.data.frame(
     c("1", "1", "1", "2", "2", "2", "3", "3"), #ID
     c("2012.3", "2012.8", "2013.1","2012.8", "2013.3", "2013.9", "2011", "2013"), #eksd
     c("60", "90", "90", "60", "120", "60", "30", "90") # DDD
)
names(lsr)[1] <- "ID"
names(lsr)[2] <- "eksd"
names(lsr)[3] <- "DDD"

lsr$eksd <- as.numeric(as.character(lsr$eksd))
lsr$DDD <- as.numeric(as.character(lsr$DDD))
lsr$ENDDATE <- lsr$eksd + lsr$DDD/365.25

adherence %>%
     full_join(lsr, by = 'ID') %>%
     mutate(newVariable = ifelse(eksd <= year & year <= ENDDATE, (ENDDATE - year) * 365.25, 0)) %>%
     group_by(ID, year) %>%
     summarize(newVariable = sum(newVariable))

      ID   year newVariable
  <fctr>  <dbl>       <dbl>
1      1 2013.1      90.000
2      1 2013.2      53.475
3      2 2013.1       0.000
4      2 2013.2       0.000
5      3 2013.1      53.475
6      3 2013.2      16.950
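
The summary above comes back sorted by ID rather than in the original row order of adherence. If the goal is a new column on adherence itself, the same join-then-sum idea can be sketched in base R as follows. This is not the dplyr code from this answer, just an illustration under assumptions: the data are rebuilt with data.frame(), and the helper names row, pairs, covered and totals are made up for the example.

```r
adherence <- data.frame(
  ID   = c("1", "2", "3", "1", "2", "3"),
  year = c(2013.1, 2013.1, 2013.1, 2013.2, 2013.2, 2013.2),
  stringsAsFactors = FALSE
)
lsr <- data.frame(
  ID   = c("1", "1", "1", "2", "2", "2", "3", "3"),
  eksd = c(2012.3, 2012.8, 2013.1, 2012.8, 2013.3, 2013.9, 2011, 2013),
  DDD  = c(60, 90, 90, 60, 120, 60, 30, 90),
  stringsAsFactors = FALSE
)
lsr$ENDDATE <- lsr$eksd + lsr$DDD / 365.25

# Number the adherence rows so the sums can be mapped back in the original order
adherence$row <- seq_len(nrow(adherence))

# All (adherence row, lsr row) pairs that share an ID
pairs <- merge(adherence, lsr, by = "ID")

# Remaining days where the supply period covers the year, otherwise 0
covered <- pairs$eksd <= pairs$year & pairs$year <= pairs$ENDDATE
pairs$days <- ifelse(covered, (pairs$ENDDATE - pairs$year) * 365.25, 0)

# Sum per adherence row; tapply() names the sums by row number
totals <- tapply(pairs$days, pairs$row, sum)
adherence$newVariable <- as.numeric(totals[as.character(adherence$row)])
adherence$row <- NULL
```

One caveat with this sketch: an ID that appears in adherence but not in lsr is dropped by merge(), so its newVariable would end up NA rather than 0.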

Answer 2

OK. The solution above used a lot of resources when run on a large dataset, so I ended up with this solution:

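function.AH <- function(x) {
  # apply() passes each row as a character vector, so convert year back to numeric
  yr <- as.numeric(x[2])
  # lsr rows for this ID whose supply period covers the adherence year
  A <- subset(lsr, ID == x[1] & eksd <= yr & ENDDATE > yr)
  if (nrow(A) == 0) {
    0
  } else {
    # remaining supply in days, summed over all matching rows
    sum(A$ENDDATE - yr) * 365.25
  }
}

apply(adherence, 1, FUN = function.AH)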
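
A side note on apply(): it first converts the data frame to a matrix, and because ID is not numeric, every value in the row (including year) reaches the function as a character string. One way around that is mapply(), which walks the two columns in parallel and leaves year numeric. The following is a sketch under assumptions, not the code I actually ran: the data are rebuilt with data.frame(), and remaining_days is a hypothetical helper name.

```r
adherence <- data.frame(
  ID   = c("1", "2", "3", "1", "2", "3"),
  year = c(2013.1, 2013.1, 2013.1, 2013.2, 2013.2, 2013.2),
  stringsAsFactors = FALSE
)
lsr <- data.frame(
  ID   = c("1", "1", "1", "2", "2", "2", "3", "3"),
  eksd = c(2012.3, 2012.8, 2013.1, 2012.8, 2013.3, 2013.9, 2011, 2013),
  DDD  = c(60, 90, 90, 60, 120, 60, 30, 90),
  stringsAsFactors = FALSE
)
lsr$ENDDATE <- lsr$eksd + lsr$DDD / 365.25

# Hypothetical helper: remaining days of supply for one (id, yr) pair
remaining_days <- function(id, yr) {
  A <- lsr[lsr$ID == id & lsr$eksd <= yr & lsr$ENDDATE > yr, ]
  # sum() over zero rows is 0, so no special case is needed for "no match"
  sum(A$ENDDATE - yr) * 365.25
}

# One call per adherence row; the result goes straight into a new column
adherence$adherence <- mapply(remaining_days, adherence$ID, adherence$year,
                              USE.NAMES = FALSE)
```

On the example data this gives roughly 90, 0, 53.475, 53.475, 0 and 16.95 for the six adherence rows, the same values as the join-based answer.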

Creating a new column by running a function on each row with input from multiple datasets
