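I have prescription record data and want to know, for each person, how many prescriptions they received in each year from their first issue date to the end of their record. Sample data (first 5 rows of each ID):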
ID Issue_Date index.date other.drugs
1: 1 2000-02-08 2011-02-03 1
2: 1 2000-04-04 2011-02-03 0
3: 1 2000-05-30 2011-02-03 1
4: 1 2000-07-25 2011-02-03 1
5: 1 2000-08-22 2011-02-03 1
---
1: 2 2007-03-23 2009-04-03 1
2: 2 2007-04-04 2009-04-03 1
3: 2 2007-04-23 2009-04-03 1
4: 2 2007-04-23 2009-04-03 0
5: 2 2007-05-21 2009-04-03 1
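The other.drugs column is an indicator variable showing whether the prescription issued on that date is not the prescription of interest in the study. index.date is the date they entered the study. There are more than 1000 IDs; only 2 are shown here.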
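To make the example reproducible, here is a minimal sketch that rebuilds the sample rows above as a data.table. It assumes the data.table package and uses only the ten rows shown (the real data has more than 1000 IDs). The dates are stored as class Date so that adding 365 means "365 days later".

```r
library(data.table)

# The ten sample rows shown above (first 5 rows of ID 1 and ID 2)
dt <- data.table(
  ID          = rep(1:2, each = 5),
  Issue_Date  = as.Date(c("2000-02-08", "2000-04-04", "2000-05-30", "2000-07-25", "2000-08-22",
                          "2007-03-23", "2007-04-04", "2007-04-23", "2007-04-23", "2007-05-21")),
  index.date  = as.Date(rep(c("2011-02-03", "2009-04-03"), each = 5)),
  other.drugs = c(1L, 0L, 1L, 1L, 1L,
                  1L, 1L, 1L, 0L, 1L)
)

# Date columns make Issue_Date[1] + 365 mean "365 days later"
str(dt)
```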
I want the sum of other.drugs for each year after the first Issue_Date. I calculated the first year separately with the following code:
dt <- dt[, yearend.1 := Issue_Date[1]+365, by = ID]
dt <- dt[(Issue_Date<=yearend.1), comorbid.1 := sum(other.drugs), by = ID]
dt <- dt[, comorbid.1:= comorbid.1[!is.na(comorbid.1)][1], by = ID]
# the last line copies the value to each cell the ID occupies in the data.table for that column instead of having NA's
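The three lines above can probably be reduced to two by filtering inside j instead of in i. When the filter runs inside the grouped expression, every row of the group receives the total, so no NA rows are left to fill in afterwards. This is a sketch that assumes data shaped like the sample rows. It uses min(Issue_Date) instead of Issue_Date[1] so the result does not depend on the rows being sorted by date.

```r
library(data.table)

# Sample rows from the question (only the columns needed here)
dt <- data.table(
  ID          = rep(1:2, each = 5),
  Issue_Date  = as.Date(c("2000-02-08", "2000-04-04", "2000-05-30", "2000-07-25", "2000-08-22",
                          "2007-03-23", "2007-04-04", "2007-04-23", "2007-04-23", "2007-05-21")),
  other.drugs = c(1L, 0L, 1L, 1L, 1L, 1L, 1L, 1L, 0L, 1L)
)

# End of year 1: first Issue_Date of each ID plus 365 days
dt[, yearend.1 := min(Issue_Date) + 365, by = ID]

# Filter inside j: each group's total is written to all of its rows,
# so no separate NA-filling step is needed
dt[, comorbid.1 := sum(other.drugs[Issue_Date <= yearend.1]), by = ID]
dt
```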
This gives the following result:
ID Issue_Date index.date other.drugs yearend.1 comorbid.1
1: 1 2000-02-08 2011-02-03 1 2001-02-07 8
2: 1 2000-04-04 2011-02-03 1 2001-02-07 8
3: 1 2000-05-30 2011-02-03 1 2001-02-07 8
4: 1 2000-07-25 2011-02-03 1 2001-02-07 8
5: 1 2000-08-22 2011-02-03 1 2001-02-07 8
---
1: 2 2007-03-23 2009-04-03 1 2008-03-22 30
2: 2 2007-04-04 2009-04-03 1 2008-03-22 30
3: 2 2007-04-23 2009-04-03 1 2008-03-22 30
4: 2 2007-04-23 2009-04-03 1 2008-03-22 30
5: 2 2007-05-21 2009-04-03 1 2008-03-22 30
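Explanation: ID 1 was prescribed 8 other drugs within one year of their first Issue_Date, and ID 2 was prescribed 30.
For years 2 to 10 (there are at most 11 years of records), I wrote the following loop: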
years <- seq(730, 3650, 365)
# number of days in 2-10 years.
years2 <- seq(2,10,1)
# numbering the years for column names
colnames <- paste0("yearend.", years2)
colnames2 <- paste0("comorbid.", years2)
# names of columns to be used
for (i in 1:length(years)) {
dt <- dt[, colnames[i] := Issue_Date[1]+years[i], by = ID]
dt <- dt[(Issue_Date>=(as.Date(colnames[i], "%d-%m-%Y")) & Issue_Date<(as.Date(colnames[i+1], "%d-%m-%Y"))),
colnames2[i] := sum(other.drugs), by = ID]
dt <- dt[, colnames2[i]:= colnames2[i][!is.na(colnames2[i])][1], by = ID]
}
But the new columns that get created look like this:
ID Issue_Date index.date other.drugs yearend.1 comorbid.1 yearend.2 comorbid.2 yearend.3 comorbid.3
1: 1 2000-02-08 2011-02-03 1 2001-02-07 8 2002-02-07 comorbid.2 2003-02-07 comorbid.3
2: 1 2000-04-04 2011-02-03 1 2001-02-07 8 2002-02-07 comorbid.2 2003-02-07 comorbid.3
3: 1 2000-05-30 2011-02-03 1 2001-02-07 8 2002-02-07 comorbid.2 2003-02-07 comorbid.3
4: 1 2000-07-25 2011-02-03 1 2001-02-07 8 2002-02-07 comorbid.2 2003-02-07 comorbid.3
5: 1 2000-08-22 2011-02-03 1 2001-02-07 8 2002-02-07 comorbid.2 2003-02-07 comorbid.3
---
I'd like to know what is wrong with my loop. Any help is much appreciated.
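For comparison, here is a sketch of a loop-free alternative. It uses a different technique from the question's approach. Each prescription gets a year index relative to the person's first Issue_Date. other.drugs is then summed per ID and year in long format, and the result is reshaped to one column per year with data.table's dcast.

The year boundaries are an assumption. Year k covers days 365(k-1) < d <= 365k after the first date, and the first date itself falls in year 1, which matches the question's year-1 rule. The helper columns days and yr and the objects per_year and wide are hypothetical names.

```r
library(data.table)

# Sample rows from the question (only the columns needed here)
dt <- data.table(
  ID          = rep(1:2, each = 5),
  Issue_Date  = as.Date(c("2000-02-08", "2000-04-04", "2000-05-30", "2000-07-25", "2000-08-22",
                          "2007-03-23", "2007-04-04", "2007-04-23", "2007-04-23", "2007-05-21")),
  other.drugs = c(1L, 0L, 1L, 1L, 1L, 1L, 1L, 1L, 0L, 1L)
)

# Days since each person's first Issue_Date
dt[, days := as.integer(Issue_Date - min(Issue_Date)), by = ID]

# Year k covers days (365*(k-1), 365*k]; day 0 (the first date) goes to year 1
dt[, yr := pmax(1L, as.integer(ceiling(days / 365)))]

# Long format: one row per ID and year that has at least one prescription
per_year <- dt[, .(comorbid = sum(other.drugs)), by = .(ID, yr)]

# Wide format: one column per year; NA where an ID has no rows in that year
wide <- dcast(per_year, ID ~ yr, value.var = "comorbid")
setnames(wide, names(wide)[-1], paste0("comorbid.", names(wide)[-1]))
wide
```

With only the sample rows, every prescription falls in year 1, so wide has a single comorbid.1 column. On the full data, one column appears for every year that has at least one prescription.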
Answer (score: 1)
Whenever you need to use a column name in data.table that actually comes from an R variable, you need to use get. So you should rewrite your loop like this:
for (i in 1:length(years)) {
dt <- dt[, colnames[i] := Issue_Date[1]+years[i], by = ID]
dt <- dt[(Issue_Date>=(as.Date(get(colnames[i]), "%d-%m-%Y")) & Issue_Date<(as.Date(get(colnames[i+1]), "%d-%m-%Y"))),
colnames2[i] := sum(other.drugs), by = ID]
dt <- dt[, colnames2[i]:= get(colnames2[i])[!is.na(get(colnames2[i]))][1], by = ID]
}
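Here is a minimal illustration of the point about get, using a hypothetical three-row table. Without get, a variable such as nm inside j is simply the character string it holds. That is consistent with the literal comorbid.2 values in the question's output. get(nm) instead looks up the column with that name.

```r
library(data.table)

# Hypothetical table with a partly missing column
dt <- data.table(ID = c(1L, 1L, 2L), comorbid.2 = c(3L, NA, 5L))
nm <- "comorbid.2"   # column name held in a variable, like colnames2[i]

# Without get(): nm is just the string, so each group returns "comorbid.2"
literal <- dt[, nm[!is.na(nm)][1], by = ID]

# With get(): the column called "comorbid.2" is looked up and its values are used
values <- dt[, get(nm)[!is.na(get(nm))][1], by = ID]

literal$V1
values$V1
```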
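I couldn't actually test your code, because I ran into two problems:
1. Issue_Date>...
2. You use colnames[i+1], i.e. yearend.X, before actually creating it (maybe you had already run the code several times, which is why you didn't get an error?).
I did something similar to test it; of course, the values of comorbid.2 don't make sense: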
dt
ID Issue_Date index.date other.drugs yearend.1 comorbid.1
1: 1 00-02-08 2011-02-03 1 01-02-07 4
2: 1 00-04-04 2011-02-03 0 01-02-07 4
3: 1 00-05-30 2011-02-03 1 01-02-07 4
4: 1 00-07-25 2011-02-03 1 01-02-07 4
5: 1 00-08-22 2011-02-03 1 01-02-07 4
6: 2 07-03-23 2009-04-03 1 08-03-22 4
7: 2 07-04-04 2009-04-03 1 08-03-22 4
8: 2 07-04-23 2009-04-03 1 08-03-22 4
9: 2 07-04-23 2009-04-03 0 08-03-22 4
10: 2 07-05-21 2009-04-03 1 08-03-22 4
i <- 1
dt <- dt[, colnames[i] := Issue_Date[1]+years[i], by = ID]
dt <- dt[Issue_Date<get(colnames[i]),
colnames2[i] := sum(other.drugs), by = ID]
dt <- dt[, colnames2[i]:= get(colnames2[i])[!is.na(get(colnames2[i]))][1], by = ID]
dt
ID Issue_Date index.date other.drugs yearend.1 comorbid.1 yearend.2 comorbid.2
1: 1 00-02-08 2011-02-03 1 01-02-07 4 02-02-07 4
2: 1 00-04-04 2011-02-03 0 01-02-07 4 02-02-07 4
3: 1 00-05-30 2011-02-03 1 01-02-07 4 02-02-07 4
4: 1 00-07-25 2011-02-03 1 01-02-07 4 02-02-07 4
5: 1 00-08-22 2011-02-03 1 01-02-07 4 02-02-07 4
6: 2 07-03-23 2009-04-03 1 08-03-22 4 09-03-22 4
7: 2 07-04-04 2009-04-03 1 08-03-22 4 09-03-22 4
8: 2 07-04-23 2009-04-03 1 08-03-22 4 09-03-22 4
9: 2 07-04-23 2009-04-03 0 08-03-22 4 09-03-22 4
10: 2 07-05-21 2009-04-03 1 08-03-22 4 09-03-22 4
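Here is a sketch of a loop that avoids the second problem. The end of year k is computed from k alone. The start of year k is recomputed as the end of year k-1, instead of being read from a yearend column that does not exist yet.

It assumes the same window as the question's year-1 code: year k runs from just after first Issue_Date + 365(k-1) up to and including first Issue_Date + 365k, and year 1 also includes the first date. New columns are named with the (name) := form, which data.table accepts for names stored in variables. n_years, yearend, comorbid and in_year are illustrative names. As in the question, 365-day years ignore leap days.

```r
library(data.table)

# Sample rows from the question (only the columns needed here)
dt <- data.table(
  ID          = rep(1:2, each = 5),
  Issue_Date  = as.Date(c("2000-02-08", "2000-04-04", "2000-05-30", "2000-07-25", "2000-08-22",
                          "2007-03-23", "2007-04-04", "2007-04-23", "2007-04-23", "2007-05-21")),
  other.drugs = c(1L, 0L, 1L, 1L, 1L, 1L, 1L, 1L, 0L, 1L)
)

n_years <- 11  # at most 11 years of records, per the question

for (k in seq_len(n_years)) {
  yearend  <- paste0("yearend.", k)
  comorbid <- paste0("comorbid.", k)

  # End of year k, computed from k alone
  dt[, (yearend) := min(Issue_Date) + 365 * k, by = ID]

  # Sum other.drugs inside the year-k window; the start is recomputed from k - 1
  # rather than read from another column, and year 1 also includes the first date
  dt[, (comorbid) := {
    start   <- min(Issue_Date) + 365 * (k - 1)
    in_year <- Issue_Date <= get(yearend) & (k == 1 | Issue_Date > start)
    sum(other.drugs[in_year])
  }, by = ID]
}

dt[, .(ID, comorbid.1, comorbid.2)]
```

With only the sample rows, comorbid.1 is 4 for both IDs and comorbid.2 to comorbid.11 are 0. Years after the end of a person's record also come out as 0 rather than NA.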
Hope it helps.