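I have a dataset containing various continuous and categorical variables. However, I do not have a follow-up time variable, which I need to create. My dataset currently has age recorded at 8 waves and the event (dementia) recorded at 8 waves. The dementia variable is binary, coded as 1 and 0. How can I use these two variables to generate a time-to-event variable?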
This is what the data looks like:
dementia_1
I would like a variable that represents the time at which each person developed dementia.
Answer 0: (score: 0)
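I don't know whether this will help you. It seems you do not have the age at each wave recorded for each person. That said, the following assumes that you only have an age where there was a diagnosis.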
library(tidyverse)

# Generate Sample Data
dat <- tibble(id = 1:50,
              age_1 = rnorm(50, 50, 2)) %>%
  mutate(
    age_2 = age_1 + 5,
    age_3 = age_2 + 5,
    age_4 = age_3 + 5,
    age_5 = age_4 + 5
  ) %>%
  add_column(dementia = rbinom(50, 1, .1))

# Now get data in long format to do the calculations
dat_2 <- dat %>%
  gather(wave, age, contains("age"))

dat_2 %>%
  group_by(id, dementia) %>%
  filter(dementia == 1) %>%          # Diagnoses
  filter(age == min(age)) %>%
  rename(age_at_diagnosis = age)     # Age first appeared
This will give you the following:
# A tibble: 5 x 4
# Groups:   id, dementia [5]
     id dementia wave  age_at_diagnosis
  <int>    <int> <chr>            <dbl>
1     7        1 age_1             52.3
2    13        1 age_1             50.6
3    24        1 age_1             50.8
4    34        1 age_1             52.5
5    35        1 age_1             50.3
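The question describes one dementia indicator per wave (for example dementia_1), whereas the sample data above uses a single dementia column, so the earliest age is always the one at wave 1. A minimal sketch of how paired per-wave columns might be reshaped is shown here. The column names age_1..age_3 and dementia_1..dementia_3 and all values are made up for illustration, and three waves stand in for the eight in the question.

```r
library(dplyr)
library(tidyr)

# Hypothetical wide data: one age_* and one dementia_* column per wave.
wide <- tibble(
  id         = 1:3,
  age_1      = c(50, 52, 55),
  age_2      = c(55, 57, 60),
  age_3      = c(60, 62, 65),
  dementia_1 = c(0, 0, 0),
  dementia_2 = c(0, 1, 0),
  dementia_3 = c(1, 1, 0)
)

# ".value" splits a name such as "age_2" into a value column ("age")
# and a wave identifier ("2"), so age and dementia stay paired per wave.
long <- wide %>%
  pivot_longer(
    cols      = -id,
    names_to  = c(".value", "wave"),
    names_sep = "_"
  )

# Age at the first wave where dementia == 1.
# People who are never diagnosed drop out of this table.
first_dx <- long %>%
  filter(dementia == 1) %>%
  group_by(id) %>%
  summarise(age_at_diagnosis = min(age), .groups = "drop")

first_dx
```

With the indicator varying by wave, min(age) among the diagnosed rows picks out the first wave at which dementia was recorded rather than the first wave overall.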
In theory, you could take this data frame and then merge it with the time of death or with the minimum age in the dataset.
first_diagnosis <- dat_2 %>%
  group_by(id, dementia) %>%
  filter(dementia == 1) %>%          # Diagnoses
  filter(age == min(age)) %>%
  ungroup() %>%
  rename(age_at_diagnosis = age)     # Age first appeared

age_first_age <- dat_2 %>%
  group_by(id, dementia) %>%
  filter(age == min(age))            # Age first appeared

age_first_age %>%
  left_join(first_diagnosis %>%
              select(id, age_at_diagnosis), by = "id") %>%
  mutate(time_to_event = age_at_diagnosis - age)
Which will give you something that looks like this:
# A tibble: 50 x 6
# Groups:   id, dementia [50]
      id dementia wave    age age_at_diagnosis time_to_event
   <int>    <int> <chr> <dbl>            <dbl>         <dbl>
 1     1        0 age_1  45.8               NA            NA
 2     2        0 age_1  46.7               NA            NA
 3     3        0 age_1  49.0               NA            NA
 4     4        0 age_1  53.4               NA            NA
 5     5        0 age_1  47.4               NA            NA
 6     6        0 age_1  49.1               NA            NA
 7     7        1 age_1  52.3             52.3             0
 8     8        0 age_1  49.6               NA            NA
 9     9        0 age_1  52.1               NA            NA
10    10        0 age_1  54.4               NA            NA
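In the output above, people without a diagnosis get NA for time_to_event. For a survival analysis it is common to treat them as censored at their last observed age instead. The sketch below illustrates that idea on hypothetical long-format data with one row per person per wave. The censoring rule, the column names and the values are all assumptions, not something stated in the question.

```r
library(dplyr)

# Hypothetical long-format data: one row per person per wave.
long <- tibble(
  id       = rep(1:3, each = 3),
  wave     = rep(1:3, times = 3),
  age      = c(50, 55, 60,  52, 57, 62,  55, 60, 65),
  dementia = c(0, 0, 1,     0, 1, 1,     0, 0, 0)
)

surv_dat <- long %>%
  group_by(id) %>%
  summarise(
    baseline_age = min(age),                        # age at first wave
    last_age     = max(age),                        # age at last observed wave
    status       = as.integer(any(dementia == 1)),  # 1 = event, 0 = censored
    # min() of an empty vector warns and returns Inf, so guard explicitly
    age_at_diagnosis = if (any(dementia == 1)) min(age[dementia == 1]) else NA_real_,
    .groups = "drop"
  ) %>%
  mutate(
    # Follow-up ends at diagnosis for events, at the last wave otherwise
    end_age       = if_else(status == 1L, age_at_diagnosis, last_age),
    time_to_event = end_age - baseline_age
  )

surv_dat
```

The resulting time_to_event and status columns have the shape expected by functions such as survival::Surv(time_to_event, status), should you go on to fit a survival model.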