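In a prospective study, you want to summarize the age of your sample, the calendar year of observation, and the total time each participant was observed. Together these account for the age, period, and cohort time scales of the sample.
The simplest way to illustrate this is with simulated data.
Suppose these data summarize the baseline age and the observation start and stop dates for a panel of clinic patients: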
set.seed(123)
n <- 10000
Obs <- data.frame(
  'age' = sample(seq(40, 80, by=5), n, replace=TRUE),
  ## n0 (start day, in days since 1970-01-01) is assigned inline and reused for 'end'
  'start' = as.Date(n0 <- runif(n, 10000, 12000), origin="1970-01-01"),
  'end' = as.Date(n0 + runif(n, 0, 3652.5), origin="1970-01-01")
)
I would like a function, foo, that takes the vectors
AgeCut <- c(0, 65, Inf)
Yrcut <- c(0, 2000, Inf)
DurCut <- c(0, 5, Inf)
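and cross-tabulates the number of individuals who fell into each possible combination of these categories for at least one day. Or, more sophisticated still, the number of person-years each individual spent in a category. For example, a person who was 40 when they entered the sample in 1990 and stayed for 30 years would contribute 5 years to the yt65 / bf2000 / lt5year category. They would then move into yt65 / bf2000 / gt5year and stay there for 5 years, move into yt65 / af2000 / gt5year for another 15 years, and finally spend the last 5 years in ot65 / af2000 / gt5year.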
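The worked example above can be checked with a small sketch. The helper `split_person` and its arguments are hypothetical names introduced only for this illustration (they do not come from the post), and it assumes a single cut point per time scale with all times measured in years.

```r
# Hypothetical helper (not from the original post): split one person's
# follow-up at the age, calendar-year, and duration cut points.
split_person <- function(age0, year0, followup,
                         age_cut = 65, year_cut = 2000, dur_cut = 5) {
  # Time since entry at which each scale crosses its cut point
  crossings <- c(age_cut - age0, year_cut - year0, dur_cut)
  # Keep only crossings that fall strictly inside the follow-up window
  inside <- crossings[crossings > 0 & crossings < followup]
  breaks <- sort(unique(c(0, inside, followup)))
  from <- head(breaks, -1)  # segment start, in years since entry
  to   <- tail(breaks, -1)  # segment end, in years since entry
  data.frame(
    age  = ifelse(age0 + from < age_cut, "yt65", "ot65"),
    year = ifelse(year0 + from < year_cut, "bf2000", "af2000"),
    dur  = ifelse(from < dur_cut, "lt5year", "gt5year"),
    person_years = to - from
  )
}

# The 40-year-old who enters in 1990 and is followed for 30 years
seg <- split_person(age0 = 40, year0 = 1990, followup = 30)
print(seg)
```

Each row is one stretch of follow-up during which all three categories stay constant, so the `person_years` column should add up to the total follow-up time.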
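For some reason this is really doing a number on my brain, and I cannot work out the actual desired output, even with some inefficient for loops, but the format and structure would resemble: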
   AgeCut           YrCut           DurCut             NumObs
1  younger than 65  before 2000     less than 5 years    1000
2  65 and older     before 2000     less than 5 years    1000
3  younger than 65  2000 and later  less than 5 years    1000
4  65 and older     2000 and later  less than 5 years    1000
5  younger than 65  before 2000     5 or more years      1000
6  65 and older     before 2000     5 or more years      1000
7  younger than 65  2000 and later  5 or more years      1000
8  65 and older     2000 and later  5 or more years      1000
Answer 0 (score: 1)
Using some tidyverse functions, I think you want something like this:
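library(tidyverse)
library(lubridate)  # provides year(); not attached by older tidyverse versions
AgeCut <- c(0, 65, Inf)
Yrcut <- c(0, 2000, Inf)
DurCut <- c(0, 5, Inf)
Obs %>% transmute(
  # right=FALSE makes the intervals closed on the left: [0, 65), [65, Inf)
  ageCat = cut(age, AgeCut, c("younger than 65", "65 and older"), right=FALSE),
  startCat = cut(year(start), Yrcut, c("before 2000", "2000 and later"), right=FALSE),
  # duration is the difference in calendar years, not exact elapsed time
  DurCut = cut(year(end)-year(start), DurCut, c("less than 5 years", "5 or more years"), right=FALSE)
) %>% table() %>% as_tibble()  # as_data_frame() is deprecated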
which returns
   ageCat           startCat        DurCut                 n
   <chr>            <chr>           <chr>              <int>
1  younger than 65  before 2000     less than 5 years   1196
2  65 and older     before 2000     less than 5 years    968
3  younger than 65  2000 and later  less than 5 years   1312
4  65 and older     2000 and later  less than 5 years   1015
5  younger than 65  before 2000     5 or more years     1503
6  65 and older     before 2000     5 or more years     1185
7  younger than 65  2000 and later  5 or more years     1580
8  65 and older     2000 and later  5 or more years     1241
The cut() function is doing most of the work.
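To make the role of `cut()` concrete, here is a minimal base R sketch of how the `right` argument decides which side of a break point a boundary value lands on. The vector `ages` and the variable names are made up for illustration.

```r
# Illustrative ages, including the boundary value 65
ages <- c(40, 64, 65, 80)
labs <- c("younger than 65", "65 and older")

# right = FALSE: intervals are [0, 65) and [65, Inf), so 65 counts as "65 and older"
closed_left <- cut(ages, c(0, 65, Inf), labs, right = FALSE)

# right = TRUE (the default): intervals are (0, 65] and (65, Inf],
# so 65 would be labelled "younger than 65"
closed_right <- cut(ages, c(0, 65, Inf), labs, right = TRUE)

print(data.frame(ages, closed_left, closed_right))
```

With "65 and older" as the intended label, `right = FALSE` is the setting that matches the category names.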
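Answer 1 (score: 0)
OK, I have an implementation of this in base R. It recursively calculates the time spent in the current category until moving to the next one, adds that duration to the respective counter, subtracts it from the overall duration of participation, and then feeds the updated times and remaining durations back into the apc function.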
apc <- function(times, cuts, dur, strata=1) {
  ## current category of each person on each time scale
  class <- mapply(findInterval, times, cuts)
  tnext <- mapply( ## times until next category
    function(t, c, i) {c[i+1] - t},
    times, cuts, as.data.frame(class)
  )
  mnext <- apply(tnext, 1, min, na.rm=TRUE) ## minimum time to next category
  mnext <- pmin(mnext, dur) ## truncate if duration exceeded before next
  dur <- dur-mnext ## follow-up time still to be allocated
  times <- lapply(times, `+`, mnext) ## advance every time scale
  if (all(dur == 0))
    return(list(data.frame(class, 't'=mnext, strata)))
  return(c(list(data.frame(class, 't'=mnext, strata)), apc(times, cuts, dur, strata=strata)))
}
This gives the following estimated person-years in each category:
> val
  age start cohort strata         t
1   1     1      1      1  3175.986
2   2     1      1      1  2582.793
3   1     2      1      1 17714.503
4   2     2      1      1 13972.134
5   1     2      2      1  5658.430
6   2     2      2      1  6957.702
where the total (50,061.55) equals the sum of Obs$end - Obs$start, expressed in years.
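The answer does not show how apc was called or how val was assembled, so the following is only one possible reading, run on a two-person toy input. The input layout (named lists of times and cut points, everything in years, with `cohort` meaning time since entry) and the `aggregate()` step are assumptions, not the author's code.

```r
# apc() as posted in the answer, repeated so this sketch runs on its own
apc <- function(times, cuts, dur, strata = 1) {
  class <- mapply(findInterval, times, cuts)
  tnext <- mapply(function(t, c, i) c[i + 1] - t,
                  times, cuts, as.data.frame(class))
  mnext <- apply(tnext, 1, min, na.rm = TRUE)
  mnext <- pmin(mnext, dur)
  dur <- dur - mnext
  times <- lapply(times, `+`, mnext)
  if (all(dur == 0))
    return(list(data.frame(class, t = mnext, strata)))
  c(list(data.frame(class, t = mnext, strata)),
    apc(times, cuts, dur, strata = strata))
}

# Hypothetical toy input: two people, all time scales in years
times <- list(age = c(40, 70), start = c(1990, 2005), cohort = c(0, 0))
cuts  <- list(c(0, 65, Inf), c(0, 2000, Inf), c(0, 5, Inf))
dur   <- c(30, 10)

# Stack the per-step pieces, then sum person-years within each category
pieces  <- do.call(rbind, apc(times, cuts, dur))
val_toy <- aggregate(t ~ age + start + cohort + strata, data = pieces, FUN = sum)
print(val_toy)
sum(val_toy$t)  # total person-years; should equal sum(dur)
```

Checking that the total matches `sum(dur)` is the same sanity check the answer applies to the full data set. Note that the function as posted relies on `mapply` simplifying to a matrix, so this sketch uses at least two people.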