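I have a dataset of repeated measurements (hb) over time (Day) for different patients (record_id). I want to find the lowest hb value for each patient, and then use it to create a categorical variable that divides patients into "low nadirhb" (< 70), "mid nadirhb" (70-90) and "high nadirhb" (> 90). I would really appreciate your help, as I am completely stuck...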
record_id Day hb
1 0 122
1 1 90
1 2 71
1 3 71
2 0 139
2 1 130
2 2 119
2 3 106
3 0 89
3 1 126
3 2 127
3 3 110
4 0 90
4 1 86
4 2 82
4 3 78
5 0 118
5 1 108
5 2 95
5 3 94
I have tried the following code, but I cannot merge df and x1:
x1 <- aggregate(hb~record_id, data=df, FUN=function(df) c(min=min(df), count=length(df))) #this successfully finds the min hb for each patient
x1<- rename(x1, c("hb" = "nadirhb"))
x1 <- as.data.frame(x1)
m=merge(df,x1,by="record_id")
summary(df$nadirhb)
#create hb categorical variable
df$hbcat[df$nadirhb >=90] <- 2
df$hbcat[df$nadirhb >=70 & df$hb <90] <- 1
df$hbcat[df$nadirhb <70] <- 0
table(df$hbcat)
Answer 0 (score: 1)
Using dplyr makes this more intuitive.
library(dplyr)
# get min value for each record
df <- df %>% group_by(record_id) %>% mutate(min_hb = min(hb))
# create categorical variable dividing patients into segments
df <- df %>%
  mutate(hb_segment = ifelse(min_hb < 70, "low",
                             ifelse(min_hb < 90, "middle", "high")))
Then select the columns and reduce the data to a single row per patient:
# filter to single row per patient
df_patient <- df %>%
  select(record_id, min_hb, hb_segment) %>%
  distinct()
Result:
record_id min_hb hb_segment
(int) (int) (chr)
1 1 71 middle
2 2 106 high
3 3 89 middle
4 4 78 middle
5 5 94 high
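As a side note, the nested ifelse() used above can also be expressed with base R's cut(), which may be easier to extend if more bands are needed later. The sketch below is only an illustration: it hard-codes the five per-patient minima from the result table, and it assumes the same cut points as the code above (below 70 is "low", 70 up to but not including 90 is "middle", 90 and above is "high").

```r
# Per-patient minima taken from the result table above
min_hb <- c(71, 106, 89, 78, 94)

# Nested ifelse(), as in the answer
nested <- ifelse(min_hb < 70, "low",
                 ifelse(min_hb < 90, "middle", "high"))

# Same bands with cut(); right = FALSE makes the intervals
# [-Inf, 70), [70, 90), [90, Inf), so a nadir of exactly 90 is "high"
binned <- cut(min_hb,
              breaks = c(-Inf, 70, 90, Inf),
              labels = c("low", "middle", "high"),
              right = FALSE)

# Both approaches should give the same labels
identical(nested, as.character(binned))
```

Note that cut() returns a factor rather than a character vector, which is often what you want for a categorical variable in later modelling or tabulation.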
Edit: as Steven Beaupre pointed out in the comments, you could also do:
df %>%
  group_by(record_id) %>%
  summarise(min_hb = min(hb)) %>%
  mutate(hb_segment = ifelse(min_hb < 70, "low",
                             ifelse(min_hb < 90, "middle", "high")))
which is a bit shorter.
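If you would rather stay close to the base R attempt in the question, here is a minimal sketch. It re-creates the example data inline so that it runs on its own. Two details in the question's code look like plausible sources of the trouble, though I have not run it against the real data: the function passed to aggregate() returns two values (min and count), which turns hb into a matrix column, and the merged result is stored in m while the later lines read df$nadirhb. findInterval() is used here as one option for the 0/1/2 coding; it is an assumption about what the coding was meant to be, based on the cut points in the question's code.

```r
# Example data copied from the question
df <- data.frame(
  record_id = rep(1:5, each = 4),
  Day       = rep(0:3, times = 5),
  hb        = c(122,  90,  71,  71,
                139, 130, 119, 106,
                 89, 126, 127, 110,
                 90,  86,  82,  78,
                118, 108,  95,  94)
)

# One row per patient with the lowest hb. FUN returns a single number,
# so the result has a plain numeric column rather than a matrix column.
x1 <- aggregate(hb ~ record_id, data = df, FUN = min)
names(x1)[names(x1) == "hb"] <- "nadirhb"

# Attach the per-patient nadir to every row of the original data
m <- merge(df, x1, by = "record_id")

# 0 = nadir < 70, 1 = 70 to < 90, 2 = 90 and above
m$hbcat <- findInterval(m$nadirhb, c(70, 90))

# One row per patient, to check the categories
unique(m[, c("record_id", "nadirhb", "hbcat")])
```

The key point is to keep working with the merged data frame (m here) after the merge, since df itself never gains a nadirhb column.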