我有处方记录的数据集。每行是特定日期单一药物的处方。我将药物分成两组,部分重叠。我想确定两个药物组在相隔3天内处方的处方,但不包括从第1组和第2组发出相同药物的地方,以确定该药物的后一种药物的日期。
我的数据示例:
library(data.table)
set.seed(10)
DT <- data.table(day = sample(c(1:31), 30, replace = TRUE),
drug_group = sample(c(1, 2), 30, replace = TRUE),
drug_1 = sample(c("A", "B", "C"), 30, replace = TRUE),
drug_2 = sample(c("A", "D", "E"), 30, replace = TRUE))
DT[drug_group == 1, drug := drug_1]
DT[drug_group == 2, drug := drug_2]
DT[, c("drug_1", "drug_2") := NULL]
setkey(DT, day)
以下内容:
day drug_group drug
1: 2 1 B
2: 3 1 C
3: 4 1 B
4: 7 2 E
5: 8 1 A
6: 9 2 A
7: 9 2 D
8: 9 1 C
9: 10 1 A
10: 12 1 A
...
24: 22 2 D
25: 22 2 E
26: 24 1 A
27: 25 1 A
28: 26 2 D
29: 26 1 C
30: 27 1 C
我想得到这样的结果:
day interaction_present
1: 1 FALSE
2: 2 FALSE
3: 3 FALSE
4: 4 FALSE
5: 5 FALSE
6: 6 FALSE
...
26: 26 TRUE
29: 29 FALSE
30: 30 FALSE
我很确定我可以通过循环遍历每一行来做到这一点,但我已经被反复告诫使用循环而不是矢量化,我想知道这种类型的任务是否可行而没有循环?我已经看过使用data.table shift()函数来设置滞后但我担心创建太多新列,因为我的实际data.table超过一百万行。
很抱歉,如果这是一个微不足道的问题,或者之前有人问过,但是我整个下午都被困在了这一天,我就放弃了这一天!
答案 0 :(得分:-1)
如果我正确理解您的问题,以下内容应该有效。可以轻松地对函数内的逻辑决策以及时间滞后start
和end
变量进行任何更改。
timelagadj <- function(i){
## this should be changed depending on what you mean by "within"
## 3 days. This currently goes i-3,i+3
## but if i-3 or i+3 doesn't exist in dayDT$day
## then we pull the value one above/below i-3/i+3 respectively
start <- max(dayDT$day[i]-3,dayDT$day[1])
start <- ifelse(dayDT$day[findInterval(start,dayDT$day)] < start,
findInterval(start,dayDT$day)+1,
findInterval(start,dayDT$day))
end <- min(dayDT$day[i]+3,dayDT$day[nrow(dayDT)])
end <- findInterval(end,dayDT$day)
## now we pull the relevant group ID and drug ID
gIDs <- dayDT$groupID[start:end]
dIDs <- dayDT$drugID[start:end]
## here we unlist the paste made before
## to group by days
gIDs <- unlist(strsplit(gIDs,"_"))
dIDs <- unlist(strsplit(dIDs,"_"))
## now we can apply our logic rule based
## on the criteria you mentioned
if(length(unique(gIDs))>1){
tmp <- unique(data.frame(gIDs,dIDs))
if(length(unique(tmp$gIDs))!=length(unique(tmp$dIDs))) T else F
}else F
}
dayDT <- DT[,list("drugID"=paste(drug,collapse="_"),"groupID"=paste(drug_group,collapse="_")),by=day]
res <- sapply(1:nrow(dayDT),function(m) timelagadj(m))
res <- dayDT[,list(day,"interaction_present"=res)]