How to get percentages across two categorical variables in R

Time: 2020-03-27 13:20:01

Tags: r

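Suppose the total sample size is 8 and the data frame looks like this. Every individual with a health score below 3 is healthy, and every individual with a health score above 3 is sick. Status shows their employment status.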

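Status <- c("Employed", "Unemployed", "Student", "Student", "Employed", "Unemployed", "Unemployed", "Housewife")
Health <- c("Healthy", "Healthy", "Healthy", "Sick", "Sick", "Control", "Sick", "Sick")

df <- data.frame(Status, Health)
# Set the factor levels explicitly so tables list categories in this order
df$Health <- factor(df$Health, levels = c("Healthy", "Sick", "Control"))
df$Status <- factor(df$Status, levels = c("Employed", "Unemployed", "Student", "Housewife"))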

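I want to see what percentage of each employment category falls into the "Healthy", "Sick", or "Control" group. For example, of all employed people, what percentage is healthy? I would like output like the following (P.S. the values are hypothetical).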

                    Healthy      Sick   Control
Employed              10%        2%     1%
Unemployed             5%        1%     1%
Student                6%        3%     1%
Housewife              2%        5%     6%

I am using the following code, but it only gives me frequencies, not percentages. I need percentages.

tab <- table(df$Health, df$Status)

1 answer:

Answer 0: (score: 1)

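We can count the number of people for each combination of Status and Health, group_by Status, and calculate the percentage within each group. For better readability, we then convert the data to wide format.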

library(dplyr)

df %>%
  count(Status, Health) %>%
  group_by(Status) %>%
  mutate(n = n/sum(n) * 100) %>%
  tidyr::pivot_wider(names_from = Health, values_from = n, 
                     values_fill = list(n = 0))


# Status     Healthy  Sick Control
#  <fct>        <dbl> <dbl>   <dbl>
#1 Employed      50    50       0  
#2 Housewife      0   100       0  
#3 Student       50    50       0  
#4 Unemployed    33.3  33.3    33.3

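In base R, we can use prop.table together with table to get percentages. The second argument, 1, computes the proportions within each row (each Status).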

prop.table(table(df), 1) * 100
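
As a rough illustration of how the margin argument changes the result, the sketch below rebuilds the example data and compares row-wise and column-wise percentages, then formats the row percentages with a percent sign as in the desired output. The names counts, row_pct, col_pct and pct_labels are made up for this example, and the one-decimal formatting is an assumption rather than something the question specified.

```r
# Rebuild the example data (same values as in the question).
df <- data.frame(
  Status = factor(c("Employed", "Unemployed", "Student", "Student",
                    "Employed", "Unemployed", "Unemployed", "Housewife")),
  Health = factor(c("Healthy", "Healthy", "Healthy", "Sick",
                    "Sick", "Control", "Sick", "Sick"),
                  levels = c("Healthy", "Sick", "Control"))
)

# Raw frequencies: rows are Status, columns are Health.
counts <- table(df$Status, df$Health)

# margin = 1: percentages within each Status (every row sums to 100).
row_pct <- prop.table(counts, margin = 1) * 100

# margin = 2: percentages within each Health group (every column sums to 100).
col_pct <- prop.table(counts, margin = 2) * 100

# Hypothetical display step: a character matrix with "%" appended.
pct_labels <- matrix(sprintf("%.1f%%", as.vector(row_pct)),
                     nrow = nrow(row_pct),
                     dimnames = dimnames(row_pct))

print(round(row_pct, 1))
print(noquote(pct_labels))
```

With margin = 1 the table answers "of all employed people, what percentage is healthy?", while margin = 2 answers the reverse question, "of all healthy people, what percentage is employed?".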

Data

df <- structure(list(Status = structure(c(1L, 4L, 3L, 3L, 1L, 4L, 4L, 
2L), .Label = c("Employed", "Housewife", "Student", "Unemployed"
), class = "factor"), Health = structure(c(2L, 2L, 2L, 3L, 3L, 
1L, 3L, 3L), .Label = c("Control", "Healthy", "Sick"), 
class = "factor")), class = "data.frame",row.names = c(NA, -8L))