Question

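I've just started learning R, which has been very useful, and I'm trying to use it to calculate the proportion of days covered (PDC). The metric measures a person's adherence to a medication. Basically, for a given time period, you find all of the fills of a drug, each of which records a fill date and a days supply, and use them to determine which days are covered. For example, if a person gets a fill on February 1, 2016 with a 35-day supply, they are covered from February 1, 2016 through March 6, 2016. Pretty easy.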
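To make the coverage window concrete, here is a minimal sketch of that single fill in R. The names `fill_date`, `days_supply`, and `covered` are only for illustration, and it assumes the fill date itself counts as the first covered day.

```r
# One fill: 35 days of supply starting on the fill date (illustrative values)
fill_date <- as.Date("2016-02-01")
days_supply <- 35

# The fill date counts as day 1, so the last covered day is fill_date + days_supply - 1
covered <- seq(fill_date, by = "day", length.out = days_supply)

range(covered)   # first and last covered day
length(covered)  # number of covered days
```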

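It gets tricky when they go back for a refill before the supply from the previous fill has run out: you don't count those days twice (e.g., if this person gets a second fill on March 1, 2016, the days from March 1 through March 6 are counted only once).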
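As a small sketch of the overlap rule, assuming two 35-day fills on February 1 and March 1, 2016 (the names `first_fill`, `second_fill`, and `overlap` are made up for this example), taking the unique covered dates is one way to keep the shared days from being counted twice:

```r
# Days covered by each of two overlapping fills (illustrative values)
first_fill  <- seq(as.Date("2016-02-01"), by = "day", length.out = 35)
second_fill <- seq(as.Date("2016-03-01"), by = "day", length.out = 35)

# Days that appear in both fills
overlap <- first_fill[first_fill %in% second_fill]

# Pool the days and drop duplicates so each calendar day counts once
all_days <- c(first_fill, second_fill)
unique_days <- unique(all_days)

length(all_days)     # raw total, overlap counted twice
length(overlap)      # days shared by both fills
length(unique_days)  # days actually covered
```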

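I've actually written some code that seems to work, but it uses FOR loops, which I've learned don't perform very well in R, and I'm worried about what will happen when I start throwing large amounts of data at it.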

Here is the first part of the code, which builds the test data and initializes some variables:

#Create test data vectors
Person <- c(rep("Person1", 12), rep("Person2", 9))
FillDate <- c("2016-1-1", "2016-2-1", "2016-3-1", "2016-4-1", "2016-5-1", "2016-6-1", "2016-7-1", "2016-8-1", "2016-9-1", "2016-10-1", "2016-11-1", "2016-12-1", "2016-2-1", "2016-3-1", "2016-4-20", "2016-5-1", "2016-6-1", "2016-7-1", "2016-8-1", "2016-9-1", "2016-10-1")
DaysSupply <- c(rep("35", 14), "20", "5", "20", rep("35", 4))

#Build into data.frame
PDCTestData <- cbind.data.frame(as.factor(Person), as.Date(FillDate, "%Y-%m-%d"), as.numeric(DaysSupply))
colnames(PDCTestData) <- c("Person", "FillDate", "DaysSupply")

#Create start and end dates for overall period
StartDate <- as.Date("2016-01-01")
EndDate <- as.Date("2016-12-31")

#Initialize DaysCoveredList, a vector to hold the list of dates that a person has drug coverage
DaysCoveredList <- NULL

#Initialize DaysCoveredTable, a matrix to count the total number of unique days in DaysCoveredList, by person
DaysCoveredTable <- NULL

And the second part, which does the actual work:

#Begin looping through individuals
for(p in levels(PDCTestData$Person)){

  #Begin looping through drug fills
  for(DrugSpan in 1:nrow(PDCTestData[PDCTestData$Person == p,])){

    #Create a sequence of the dates covered by that fill: it starts on the fill date and runs for the number of days in DaysSupply. Appending it builds a list of all days covered for that person
    DaysCoveredList <- c(DaysCoveredList,seq.Date(from = PDCTestData[PDCTestData$Person == p,][DrugSpan,]$FillDate, length.out = PDCTestData[PDCTestData$Person == p,][DrugSpan,]$DaysSupply, by = "day"))

  } #Exit drug fill loop

  #Count the number of unique days covered in DaysCoveredList, within the start and end of the overall period
  DaysCovered <- length(unique(DaysCoveredList[DaysCoveredList >= StartDate & DaysCoveredList <= EndDate]))

  #Adds the unique count from DaysCovered to the summary DaysCoveredTable
  DaysCoveredTable <- rbind(DaysCoveredTable,cbind(p,DaysCovered))

  #Clear DaysCovered and DaysCoveredList
  DaysCovered <- NULL
  DaysCoveredList <- NULL
} #Exit the individual loop
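
Since `cbind(p, DaysCovered)` mixes a name with a number, the resulting table is a character matrix, so turning the counts into the actual proportion takes one more step. A minimal sketch of that step follows. The counts 366 and 231 are what I would expect the loop to give for the test data above, so treat them as illustrative, and `pdc` is just a name I picked.

```r
# Overall period, as defined in the first part of the code
StartDate <- as.Date("2016-01-01")
EndDate <- as.Date("2016-12-31")

# Stand-in for the table the loop builds (illustrative counts, stored as text)
DaysCoveredTable <- rbind(
  cbind(p = "Person1", DaysCovered = 366),
  cbind(p = "Person2", DaysCovered = 231)
)

# Number of days in the period, counting both endpoints
days_in_period <- as.numeric(EndDate - StartDate) + 1

# Convert the text counts back to numbers and divide by the period length
pdc <- as.numeric(DaysCoveredTable[, "DaysCovered"]) / days_in_period
names(pdc) <- DaysCoveredTable[, "p"]
pdc
```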

Thanks for any help you can provide.

Answer 1

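library(lubridate)
ptd <- PDCTestData # I get bored writing long variable names

# Last covered day of each fill. interval() and %within% include both
# endpoints, so subtract 1 to cover exactly DaysSupply days
ptd$EndDate <- ptd$FillDate + ptd$DaysSupply - 1
ptd$DrugInterval <- interval(ptd$FillDate, ptd$EndDate)

# Every day in the overall period
all_days <- seq(StartDate, EndDate, by = "day")

# For each person, count the days that fall inside at least one fill interval
lapply(unique(ptd$Person), function (y) sum(sapply(all_days, function (x) any(x %within% ptd$DrugInterval[ptd$Person==y]))))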

No guarantees on speed, but it may be easier to read.
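
If speed does become a concern, one possible loop-free sketch in base R is to expand every fill into its covered dates in a single step and then count unique dates per person. This is a different technique from the interval approach above, I have not benchmarked it, and the names `fills`, `offsets`, `covered_day`, and `days_covered` are my own. The block rebuilds the test data from the question so that it runs on its own.

```r
# Test data equivalent to the question's, rebuilt so this block is standalone
fills <- data.frame(
  Person = c(rep("Person1", 12), rep("Person2", 9)),
  FillDate = c(
    seq(as.Date("2016-01-01"), by = "month", length.out = 12),
    as.Date(c("2016-02-01", "2016-03-01", "2016-04-20", "2016-05-01",
              "2016-06-01", "2016-07-01", "2016-08-01", "2016-09-01",
              "2016-10-01"))
  ),
  DaysSupply = c(rep(35, 14), 20, 5, 20, rep(35, 4))
)
StartDate <- as.Date("2016-01-01")
EndDate <- as.Date("2016-12-31")

# Day offsets 0..(DaysSupply - 1) for every fill, concatenated
offsets <- sequence(fills$DaysSupply) - 1

# Repeat each fill's person and date once per supplied day, then shift by the offset
covered_person <- rep(fills$Person, times = fills$DaysSupply)
covered_day <- rep(fills$FillDate, times = fills$DaysSupply) + offsets

# Keep only days inside the overall period
in_period <- covered_day >= StartDate & covered_day <= EndDate

# Count unique covered days per person
days_covered <- vapply(
  split(covered_day[in_period], covered_person[in_period]),
  function(d) length(unique(d)),
  integer(1)
)
days_covered
```

The trade-off is memory: the expanded vectors hold one element per supplied day, so this assumes the total days supply across all fills fits comfortably in RAM.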

PDC calculation in R - removing loops
