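I am working with a dataset in which each participant (ID) was evaluated 1, 2, or 3 times. It is a longitudinal study. Unfortunately, when the first analyst coded the dataset, she/he did not record which evaluation (first, second, or third) each row corresponds to.
Since every participant has an age (in months), it is easy to work out which row is the first evaluation, which is the second, and so on: participants are younger at the first evaluation than at the second, and younger at the second than at the third.
I used tidyverse tools to solve this, and everything works. However, I strongly suspect (imagine...) that there are many other, more elegant solutions, and I came to this forum to ask about them. Could someone help me think about how to make this code shorter and clearer?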
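The ordering idea can be sketched in base R, without the tidyverse, by ranking each age within its own id. This is a minimal sketch, not the method used in the post; the toy data frame is hypothetical, and `ties.method = "first"` is an assumption about how equal ages should be broken (by row order).

```r
# Hypothetical toy data: ages (months) deliberately not in chronological order
ds <- data.frame(id     = c(1, 1, 1, 2, 2),
                 months = c(14, 10, 12, 9, 11))

# Evaluation number = position of each age among that child's ages.
# ave() applies rank() separately within each id; ties.method = "first"
# (an assumption) gives equal ages distinct numbers in row order.
ds$eval_no <- ave(ds$months, ds$id,
                  FUN = function(m) rank(m, ties.method = "first"))

print(ds)
```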
Here is fake data to reproduce the code:
library(dplyr)
# id = 1:6 is recycled to 18 rows, so every child is assessed three times
ds <- data.frame(id = 1:6,
                 months = round(rnorm(18, mean = 12, sd = 2), 0),
                 x1 = sample(0:2),
                 x2 = sample(0:2),
                 x3 = sample(0:2),
                 x4 = sample(0:2))
# add how many times each child was assessed
ds <- ds %>% group_by(id) %>% mutate(how_many = n())
# add the youngest (first), oldest (max) and median age of each child
ds %>% group_by(id) %>%
  mutate(first = min(months),
         max = max(months),
         med = median(months)) -> ds
# label the first and third evaluations (the second is left as NA)
ds %>%
  mutate(group = case_when((how_many == 3) & (months == first) ~ "First evaluation",
                           (how_many == 3) & (months == max) ~ "Third evaluation",
                           TRUE ~ NA_character_)) -> ds
# any row still unlabeled is the middle (second) evaluation
ds %>% mutate(group = if_else(is.na(group), "Second evaluation", group)) -> ds
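One caveat worth checking before shortening this code: because `months` is rounded, two visits of the same child can share an age, and matching on `min()`/`max()` then gives both rows the same label. The sketch below repeats the labeling steps above on a single hypothetical child evaluated at 11, 12 and 12 months; the data are made up for illustration.

```r
library(dplyr)

# Hypothetical child assessed three times; two visits round to the same age
tie <- data.frame(id = c(1, 1, 1), months = c(11, 12, 12))

tie_labeled <- tie %>%
  group_by(id) %>%
  # same min/max matching as the question's code
  mutate(group = case_when(months == min(months) ~ "First evaluation",
                           months == max(months) ~ "Third evaluation",
                           TRUE ~ NA_character_)) %>%
  # rows left unlabeled are taken to be the middle visit
  mutate(group = if_else(is.na(group), "Second evaluation", group)) %>%
  ungroup()

# Both 12-month rows match max(months), so no row ends up as "Second evaluation"
print(tie_labeled$group)
```

Ranking with an explicit tie-breaking rule avoids this, because each row then gets a distinct position.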
Here is my original code:
library(dplyr)
temp <- dataset %>% select(idind, arm, infant_sex, infant_age_months)
# add how many times each child was assessed
temp <- temp %>% group_by(idind) %>% mutate(how_many = n())
# add the youngest (first), oldest (max) and median age of each child
temp %>% group_by(idind) %>%
mutate(first = min(infant_age_months),
max = max(infant_age_months),
med = median(infant_age_months)) -> temp
#add label to the first evaluation
temp %>%
mutate(group = case_when(how_many == 1 ~ "First evaluation")) -> temp
#add label to the second evaluation (and keep all previous results)
temp %>%
mutate(group = case_when((how_many == 2) & (infant_age_months %in% first) ~ "First evaluation",
(how_many == 2) & (infant_age_months %in% max) ~ "Second evaluation",
TRUE ~ group)) -> temp
#add label to the third evaluation (the second will be missing)
temp %>%
mutate(group = case_when((how_many == 3) & (infant_age_months %in% first) ~ "First evaluation",
(how_many == 3) & (infant_age_months %in% max) ~ "Third evaluation",
TRUE ~ group)) -> temp
# any row still unlabeled is the middle (second) evaluation of a child assessed three times
temp %>% mutate(group = if_else(is.na(group), "Second evaluation", group)) -> temp
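Please note that I used the search box before asking, and I really imagine that others will run into the same question when writing their code. Thank you very much.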
Answer 0 (score: 3)
There you go. I used rank() to get the order of the evaluations within each id.
library(dplyr)
ds <- data.frame(id = 1:6,
                 months = round(rnorm(18, mean = 12, sd = 2), 0),
                 x1 = sample(0:2),
                 x2 = sample(0:2),
                 x3 = sample(0:2),
                 x4 = sample(0:2))
ds2 = ds %>% group_by(id) %>% mutate(rank = rank(months,ties.method="first"))
labels = c("First", "Second","Third")
ds2$labels = labels[ds2$rank]
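Why `ties.method = "first"` matters here: with R's default (`"average"`), tied ages get a fractional rank, which is not a clean index into `labels`. A short illustration with hypothetical ages for one child:

```r
# Hypothetical ages for one child: two visits rounded to the same month
ages <- c(14, 14, 11)

avg_rank   <- rank(ages)                          # default "average": 2.5 2.5 1.0
first_rank <- rank(ages, ties.method = "first")   # ties broken by position: 2 3 1

labels <- c("First", "Second", "Third")
eval_label <- labels[first_rank]                  # "Second" "Third" "First"

print(avg_rank)
print(first_rank)
print(eval_label)
```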
Answer 1 (score: 1)
Or just arrange by age and use 1:n() instead of n(); it creates a sequence:
ds <- ds %>% group_by(id) %>% arrange(months) %>% mutate(how_many = 1:n())
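For reference, dplyr's `row_number()` returns the same 1, 2, 3... sequence within each group as `1:n()`, and `arrange(.by_group = TRUE)` sorts by `id` first so the rows come out grouped (the pipeline above is already correct without it, because `1:n()` is computed per group after sorting by age). A sketch combining both with a label column, on hypothetical deterministic data since the post's `rnorm()` data change on every run; the column names `eval_no` and `label` are illustrative choices, not from the post.

```r
library(dplyr)

# Hypothetical data: child 1 seen three times, child 2 twice, child 3 once
ds <- data.frame(id     = c(1, 2, 1, 3, 1, 2),
                 months = c(14, 9, 10, 8, 12, 11))

labels <- c("First", "Second", "Third")

ds_labeled <- ds %>%
  group_by(id) %>%
  arrange(months, .by_group = TRUE) %>%            # sort by id, then by age within id
  mutate(eval_no = row_number(),                   # 1, 2, 3 in age order
         label   = factor(labels[eval_no], levels = labels)) %>%
  ungroup()

print(ds_labeled)
```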
ds %>% arrange(id, months)
# A tibble: 18 x 7
# Groups: id [6]
id months x1 x2 x3 x4 how_many
<int> <dbl> <int> <int> <int> <int> <int>
1 1 10 1 2 0 1 1
2 1 11 1 2 0 1 2
3 1 12 1 2 0 1 3
4 2 11 0 1 2 2 1
5 2 14 0 1 2 2 2
6 2 14 0 1 2 2 3
You can then attach labels with factor, if needed.
ds$label <- factor(ds$how_many, levels = 1:3, labels = c("First", "Second", "Third"))
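Passing `levels = 1:3` explicitly also keeps a category that has no rows, which helps when tabulating evaluations. A small base-R check with hypothetical evaluation numbers (no child here was seen a third time):

```r
# Hypothetical evaluation numbers: one child seen twice, another seen once
eval_no <- c(1L, 2L, 1L)

label <- factor(eval_no, levels = 1:3, labels = c("First", "Second", "Third"))

# "Third" is kept as a level with a count of 0 because levels = 1:3 was given
print(table(label))
```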
head(ds)
# A tibble: 18 x 8
# Groups: id [6]
id months x1 x2 x3 x4 how_many label
<int> <dbl> <int> <int> <int> <int> <int> <fct>
1 1 10 1 2 0 1 1 First
2 1 11 1 2 0 1 2 Second
3 1 12 1 2 0 1 3 Third
4 2 11 0 1 2 2 1 First
5 2 14 0 1 2 2 2 Second
6 2 14 0 1 2 2 3 Third