Create a new categorical variable based on each patient's minimum value

Time: 2016-04-15 13:50:29

Tags: r

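I have a dataset of repeated measurements (hb) taken over time (Day) for different patients (record_id). I want to find the lowest hb value for each patient and then use it to create a categorical variable that divides patients into "low nadirhb" (< 70), "mid nadirhb" (70-90) and "high nadirhb" (> 90). I would really appreciate your help, as I am completely stuck...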

record_id   Day hb  
1   0   122  
1   1   90  
1   2   71  
1   3   71    
2   0   139  
2   1   130  
2   2   119  
2   3   106  
3   0   89  
3   1   126  
3   2   127  
3   3   110  
4   0   90  
4   1   86  
4   2   82  
4   3   78  
5   0   118  
5   1   108  
5   2   95  
5   3   94  

I have tried the following code, but I cannot merge df and x1:

x1 <- aggregate(hb~record_id, data=df, FUN=function(df) c(min=min(df), count=length(df)))   #this successfully finds the min hb for each patient  
x1<- rename(x1, c("hb" = "nadirhb"))  
x1 <- as.data.frame(x1)  
m=merge(df,x1,by="record_id")  
summary(df$nadirhb)  
#create hb categorical variable  
df$hbcat[df$nadirhb >=90] <- 2  
df$hbcat[df$nadirhb >=70 & df$hb <90] <- 1  
df$hbcat[df$nadirhb <70] <- 0  
table(df$hbcat) 

1 answer:

Answer 0: (score: 1)

Using dplyr makes this more intuitive.

library(dplyr)

# get min value for each record 
df <- df %>%
    group_by(record_id) %>%
    mutate(min_hb = min(hb))

# create categorical variable dividing patients into segments 
df <- df %>%
    mutate(hb_segment = ifelse(min_hb < 70, "low",
                        ifelse(min_hb < 90, "middle", "high")))
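As a possible alternative to the nested ifelse() calls, base R's cut() can map the minimum onto the same three bands in one step. This is only a sketch: it rebuilds the sample data from the question so that it runs on its own, uses ave() instead of dplyr for the per-patient minimum, and assumes that a value of exactly 70 counts as "middle" and exactly 90 as "high", which matches the ifelse() logic above.

```r
# Sample data copied from the question
df <- data.frame(
  record_id = rep(1:5, each = 4),
  Day       = rep(0:3, times = 5),
  hb        = c(122, 90, 71, 71,
                139, 130, 119, 106,
                89, 126, 127, 110,
                90, 86, 82, 78,
                118, 108, 95, 94)
)

# Per-patient minimum; ave() returns one value per row of df
df$min_hb <- ave(df$hb, df$record_id, FUN = min)

# right = FALSE gives left-closed intervals: [-Inf, 70), [70, 90), [90, Inf)
df$hb_segment <- cut(
  df$min_hb,
  breaks = c(-Inf, 70, 90, Inf),
  labels = c("low", "middle", "high"),
  right  = FALSE
)
```

Note that cut() returns a factor rather than a character vector, which is often convenient for a categorical variable but differs from the ifelse() result.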

Then select the columns and reduce the data to a single row per patient:

# filter to single row per patient
df_patient <- df %>%
    select(record_id, min_hb, hb_segment) %>%
    distinct()

Result:

  record_id min_hb hb_segment
      (int)  (int)      (chr)
1         1     71     middle
2         2    106       high
3         3     89     middle
4         4     78     middle
5         5     94       high

Edit: as Steven Beaupre pointed out in the comments, you could also do this:

df %>% group_by(record_id) %>% 
    summarise(min_hb = min(hb)) %>% 
    mutate(hb_segment = ifelse(min_hb < 70, "low", ifelse(min_hb < 90, "middle", "high")))

which is a bit shorter.
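For completeness, the base R attempt from the question can probably be repaired as well. The sketch below rests on a reading of that code rather than on anything the asker confirmed: the merged result is stored in m while the later lines read df$nadirhb, the aggregate() function returns two values so hb becomes a matrix column, and the middle condition compares df$hb instead of the nadir. The rename() call is replaced here with a plain names() assignment, and the sample data is rebuilt so the block runs on its own.

```r
# Sample data copied from the question
df <- data.frame(
  record_id = rep(1:5, each = 4),
  Day       = rep(0:3, times = 5),
  hb        = c(122, 90, 71, 71,
                139, 130, 119, 106,
                89, 126, 127, 110,
                90, 86, 82, 78,
                118, 108, 95, 94)
)

# Return a single value per patient so hb stays an ordinary numeric column
x1 <- aggregate(hb ~ record_id, data = df, FUN = min)
names(x1)[names(x1) == "hb"] <- "nadirhb"

# merge() returns a new data frame; df itself is not modified
m <- merge(df, x1, by = "record_id")

# Build the category from the merged data frame, using nadirhb throughout
m$hbcat <- NA_integer_
m$hbcat[m$nadirhb >= 90] <- 2L
m$hbcat[m$nadirhb >= 70 & m$nadirhb < 90] <- 1L
m$hbcat[m$nadirhb < 70] <- 0L

table(m$hbcat)
```

Here every row of m carries its patient's nadir and category, so the counts from table() are per measurement, not per patient.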