How to create a summary demographics table in R

Time: 2020-10-07 15:07:32

Tags: r contingency

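I got this data from a cohort of Alzheimer's disease patients. I want to create a summary table (or contingency table) that shows all the information in this table. This is what I'd like to see for this cohort: how many males and females there are, the mean age at onset, the mean age at last visit, the mean age at death, and the number of samples (IID) that are apoe4any. What is the way to create such a table in R?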

dat <- structure(list(IID = structure(1:10, .Names = c("1", "2", "3", 
"4", "5", "6", "7", "8", "9", "10"), .Label = c("NACC000875", 
"NACC003779", "NACC006805", "NACC008215", "NACC010067", "NACC010592", 
"NACC011413", "NACC015383", "NACC017476", "NACC017538"), class = "factor"), 
    cohort = structure(c(`1` = 1L, `2` = 1L, `3` = 1L, `4` = 1L, 
    `5` = 1L, `6` = 1L, `7` = 1L, `8` = 1L, `9` = 1L, `10` = 1L
    ), .Label = "ADC8_AA", class = "factor"), sex = structure(c(`1` = 2L, 
    `2` = 2L, `3` = 2L, `4` = 2L, `5` = 2L, `6` = 1L, `7` = 1L, 
    `8` = 2L, `9` = 2L, `10` = 2L), .Label = c("1", "2"), class = "factor"), 
    status = structure(c(`1` = 1L, `2` = 1L, `3` = 1L, `4` = 1L, 
    `5` = 2L, `6` = 1L, `7` = 2L, `8` = 1L, `9` = 2L, `10` = 2L
    ), .Label = c("1", "2"), class = "factor"), Race = structure(c(`1` = 1L, 
    `2` = 1L, `3` = 1L, `4` = 1L, `5` = 1L, `6` = 1L, `7` = 1L, 
    `8` = 1L, `9` = 1L, `10` = 1L), .Label = "2", class = "factor"), 
    Ethnicity = structure(c(`1` = 1L, `2` = 1L, `3` = 1L, `4` = 1L, 
    `5` = 1L, `6` = 1L, `7` = 1L, `8` = 1L, `9` = 1L, `10` = 1L
    ), .Label = "0", class = "factor"), age_onset = structure(c(NA, 
    NA, NA, NA, 1L, NA, 4L, NA, 2L, 3L), .Label = c(" 63", " 67", 
    " 71", " 79", "888"), class = "factor"), age_last_visit = structure(c(`1` = 6L, 
    `2` = 4L, `3` = 3L, `4` = 2L, `5` = 1L, `6` = 1L, `7` = 8L, 
    `8` = 7L, `9` = 1L, `10` = 5L), .Label = c("70", "71", "74", 
    "77", "78", "82", "86", "89"), class = "factor"), age_death = structure(c(NA, 
    NA, NA, 1L, NA, NA, 3L, 2L, NA, NA), .Label = c(" 72", " 88", 
    " 90", "888"), class = "factor"), apoe4any = structure(c(`1` = 1L, 
    `2` = 2L, `3` = 1L, `4` = 2L, `5` = 2L, `6` = 1L, `7` = 2L, 
    `8` = 2L, `9` = 2L, `10` = 2L), .Label = c("0", "1"), class = "factor")), row.names = c("1", 
"2", "3", "4", "5", "6", "7", "8", "9", "10"), class = "data.frame")

1 answer:

Answer 0 (score: 1)

R uses the factor class for categorical data. If you change the ages (currently factors) to numeric, summary(dat) will give you most of the information you need.

# The ages were read in as factors; convert via character to get the actual values
convert_to_numeric <- c("age_onset", "age_last_visit", "age_death")
dat[convert_to_numeric] <- lapply(dat[convert_to_numeric], function(x) as.numeric(as.character(x)))
summary(dat)
 #         IID        cohort   sex   status Race   Ethnicity   age_onset  age_last_visit 
 # NACC000875:1   ADC8_AA:10   1:2   1:6    2:10   0:10      Min.   :63   Min.   :70.00  
 # NACC003779:1                2:8   2:4                     1st Qu.:66   1st Qu.:70.25  
 # NACC006805:1                                              Median :69   Median :75.50  
 # NACC008215:1                                              Mean   :70   Mean   :76.70  
 # NACC010067:1                                              3rd Qu.:73   3rd Qu.:81.00  
 # NACC010592:1                                              Max.   :79   Max.   :89.00  
 # (Other)   :4                                              NA's   :6                   
 #   age_death     apoe4any
 # Min.   :72.00   0:3     
 # 1st Qu.:80.00   1:7     
 # Median :88.00           
 # Mean   :83.33           
 # 3rd Qu.:89.00           
 # Max.   :90.00           
 # NA's   :7            
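One thing summary() will not flag: the age_onset and age_death factors both carry a level "888", which is presumably a missing-value code rather than an age (check the cohort's data dictionary). In this 10-row sample no row actually uses that level, so the output above is unaffected, but on the full cohort it would pull the means far upward. A minimal sketch of recoding such sentinels to NA; the helper recode_missing() and the toy vector are hypothetical, not part of the original answer:

```r
# Hypothetical helper: replace sentinel codes with NA before computing statistics.
# Assumption: 888 marks a missing value, not a real age.
recode_missing <- function(x, codes = 888) {
  x[x %in% codes] <- NA  # %in% is FALSE for NA, so existing NAs are left alone
  x
}

ages <- c(63, 79, 888, 67, NA)              # toy vector for illustration
mean(ages, na.rm = TRUE)                    # 274.25, inflated by the 888
mean(recode_missing(ages), na.rm = TRUE)    # 69.66667
```

On the question's data it would run right after the conversion, e.g. `dat[convert_to_numeric] <- lapply(dat[convert_to_numeric], recode_missing)`.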

See this common FAQ for the factor-to-numeric conversion I used.
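The reason for going through as.character() is that a factor is stored as integer codes plus a set of labels (the levels), so as.numeric() applied directly returns the codes. A tiny illustration on a made-up vector in the same format as the question's age columns:

```r
# A factor stores integer codes plus a vector of labels (the levels).
ages <- factor(c(" 79", " 63", "888"))  # illustrative values in the question's format

levels(ages)                      # " 63" " 79" "888"
as.numeric(ages)                  # 2 1 3: the internal codes, not the ages
as.numeric(as.character(ages))    # 79 63 888: labels first, then numbers
as.numeric(levels(ages))[ages]    # 79 63 888: same result, converts each level only once
```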

If you only want to summarize the columns you mentioned, you can also subset the data first:

summary(dat[c("sex", convert_to_numeric, "apoe4any")])
 # sex     age_onset  age_last_visit    age_death     apoe4any
 # 1:2   Min.   :63   Min.   :70.00   Min.   :72.00   0:3     
 # 2:8   1st Qu.:66   1st Qu.:70.25   1st Qu.:80.00   1:7     
 #       Median :69   Median :75.50   Median :88.00           
 #       Mean   :70   Mean   :76.70   Mean   :83.33           
 #       3rd Qu.:73   3rd Qu.:81.00   3rd Qu.:89.00           
 #       Max.   :79   Max.   :89.00   Max.   :90.00           
 #       NA's   :6                    NA's   :7
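If you want exactly the figures listed in the question rather than summary()'s quartiles, a minimal base-R sketch follows. So that it runs on its own, it rebuilds a reduced copy of the question's dat (same values, only the columns used). Assumptions not stated in the question: sex code 1 = male and 2 = female, 888 is a missing-value code, and "samples that are apoe4any" means unique IIDs with apoe4any == "1". The last lines add the sex-by-apoe4any contingency table the question mentions.

```r
# Reduced copy of the question's `dat`: same values, only the columns used below.
# Ages are factors here, just as in the question.
dat <- data.frame(
  IID = c("NACC000875", "NACC003779", "NACC006805", "NACC008215", "NACC010067",
          "NACC010592", "NACC011413", "NACC015383", "NACC017476", "NACC017538"),
  sex = factor(c("2", "2", "2", "2", "2", "1", "1", "2", "2", "2")),
  age_onset = factor(c(NA, NA, NA, NA, " 63", NA, " 79", NA, " 67", " 71")),
  age_last_visit = factor(c("82", "77", "74", "71", "70", "70", "89", "86", "70", "78")),
  age_death = factor(c(NA, NA, NA, " 72", NA, NA, " 90", " 88", NA, NA)),
  apoe4any = factor(c("0", "1", "0", "1", "1", "0", "1", "1", "1", "1"))
)

# Factor -> character -> numeric; 888 treated as missing (assumption).
to_age <- function(x) {
  x <- as.numeric(as.character(x))
  x[x %in% 888] <- NA
  x
}
age_cols <- c("age_onset", "age_last_visit", "age_death")
dat[age_cols] <- lapply(dat[age_cols], to_age)

# One row per requested statistic.
demo <- data.frame(
  measure = c("Samples (unique IID)",
              "Sex = 1 (assumed male)",
              "Sex = 2 (assumed female)",
              "Mean age at onset",
              "Mean age at last visit",
              "Mean age at death",
              "Samples with apoe4any = 1"),
  value = c(length(unique(dat$IID)),
            sum(dat$sex == "1"),
            sum(dat$sex == "2"),
            mean(dat$age_onset, na.rm = TRUE),
            mean(dat$age_last_visit, na.rm = TRUE),
            mean(dat$age_death, na.rm = TRUE),
            length(unique(dat$IID[dat$apoe4any == "1"])))
)
demo$value <- round(demo$value, 1)
print(demo, row.names = FALSE)

# Contingency table of sex by apoe4any, with row and column totals.
tab <- addmargins(table(sex = dat$sex, apoe4any = dat$apoe4any))
print(tab)
```

Building the table as a plain data.frame keeps it easy to pass to a table-formatting package later; the means use na.rm = TRUE, so each one is computed only over the patients for whom that age is recorded.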