Mean and standard deviation by a categorical variable in a data frame

Time: 2020-03-24 16:24:22

Tags: r dplyr aggregate

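I have 30 patients with 100 clinical measurements each, such as weight, BMI, and waist circumference, and I want to get the mean and SD of every measurement, grouped by disease status. For example, my dataset looks like this: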

Patient_id   DateOfBirth       Sex     Weight1   Bmi1   Waist1  Disease
204065       25-06-1995       Female    113.8    41.3   105.8   0
200214       09-12-1990       Female      90     35.6   108     1
191633       14-09-1971         Male    128.4    47     150     1
186156       22-09-1967         Male    157.3    51.4   145.6   0

I would like the output summarized by disease status, like this:

Disease Weight1Mean  Weight1SD      BMI1Mean    BMI1SD     Waist1Mean  Waist1SD
  0        135           30.7         46.3       7.14       125.7       28.1
  1        109           27.1         41.3       8.06       129         29.7

2 answers:

Answer 0: (score: 0)

library(dplyr)

your_df %>%
  group_by(Disease) %>%
  summarize(Weight1Mean = mean(Weight1),
            Weight1SD = sd(Weight1)
            # Repeat for the rest of the variables to summarize
  )

You can also use summarize_at instead of summarize:

#... %>%
summarize_at(vars(Weight1, BMI1, Waist1), list(Mean = mean, SD = sd))

Or summarize_if:

#... %>%
summarize_if(is.numeric, list(Mean = mean, SD = sd))

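If there are numeric variables you want to exclude from the summary (Patient_id, for example), you can recode them as factors or drop them with select.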
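In dplyr 1.0.0 and later, summarize_at and summarize_if are superseded by across(). The following is a minimal sketch of the same idea, assuming a recent dplyr and using the four sample rows from the question. The data frame is rebuilt by hand here, with the column names Weight1, Bmi1 and Waist1 taken from the sample (your real column names may differ), and the name result is only illustrative.

```r
library(dplyr)

# Sample rows from the question, rebuilt by hand for illustration.
# Column names are an assumption: adjust them to match your own data.
your_df <- data.frame(
  Patient_id = c(204065, 200214, 191633, 186156),
  Sex        = c("Female", "Female", "Male", "Male"),
  Weight1    = c(113.8, 90, 128.4, 157.3),
  Bmi1       = c(41.3, 35.6, 47, 51.4),
  Waist1     = c(105.8, 108, 150, 145.6),
  Disease    = c(0, 1, 1, 0)
)

result <- your_df %>%
  group_by(Disease) %>%
  summarize(
    # Apply both functions to each listed column; by default the output
    # columns are named <column>_<function>, e.g. Weight1_Mean, Weight1_SD.
    across(c(Weight1, Bmi1, Waist1), list(Mean = mean, SD = sd)),
    .groups = "drop"
  )

print(result)
```

Listing the columns explicitly keeps Patient_id out of the summary. With 100 measurements, across(where(is.numeric), ...) is the counterpart of summarize_if, but it would also pick up Patient_id unless that column is dropped or recoded first.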

Answer 1: (score: 0)

We can use data.table:

 library(data.table)
 setDT(df1)[, .(Weight1Mean = mean(Weight1), Weight1SD = sd(Weight1)), Disease]
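The call above summarizes a single column. With many measurements, one possible extension is to loop over the columns with .SD and .SDcols. This is only a sketch under assumptions: the table is rebuilt from the question's sample rows, and cols and res are illustrative names rather than part of the original answer.

```r
library(data.table)

# Sample rows from the question, rebuilt by hand for illustration.
df1 <- data.table(
  Weight1 = c(113.8, 90, 128.4, 157.3),
  Bmi1    = c(41.3, 35.6, 47, 51.4),
  Waist1  = c(105.8, 108, 150, 145.6),
  Disease = c(0, 1, 1, 0)
)

# Columns to summarize (assumed names; extend this vector for your data).
cols <- c("Weight1", "Bmi1", "Waist1")

# For each column in .SD, compute mean and SD, then flatten the nested list
# one level so that each statistic becomes its own output column
# (named Weight1.Mean, Weight1.SD, and so on).
res <- df1[,
  unlist(lapply(.SD, function(x) list(Mean = mean(x), SD = sd(x))),
         recursive = FALSE),
  by = Disease,
  .SDcols = cols
]

print(res)
```

If the input is a plain data.frame, wrap it in setDT() first, as in the answer above.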