Calculating percentages for several columns that share the same values

Date: 2020-06-16 11:22:42

Tags: r dplyr

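I have a dataset in which four variables share the same set of values. I now want to calculate the percentage share of each value within each variable, so that I can plot them in a stacked bar chart.

Here is a sample of the dataset: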

   climate_change            air_quality              water_polution             trash                 
   <chr>                     <chr>                    <chr>                      <chr>                 
 1 Not a very serious probl~ A somewhat serious prob~ A somewhat serious problem A very serious problem
 2 Not a very serious probl~ Not a very serious prob~ Not a very serious problem Not a very serious pr~
 3 NA                        NA                       NA                         NA                    
 4 NA                        NA                       NA                         NA                    
 5 A very serious problem    A very serious problem   A very serious problem     A very serious problem
 6 A somewhat serious probl~ A very serious problem   Not at all a serious prob~ A somewhat serious pr~

I know how to calculate the percentage shares for a single variable, for example:

lebanon %>%
  filter(!is.na(climate_change)) %>%
  count(climate_change) %>%
  mutate(prop = n / sum(n))

which gives:

  climate_change                   n   prop
  <chr>                        <int>  <dbl>
1 A somewhat serious problem     348 0.286 
2 A very serious problem         620 0.510 
3 Not a very serious problem     202 0.166 
4 Not at all a serious problem    45 0.0370

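What I want now is a solution that lists the values as rows and the variables as columns holding the n and/or prop values. What is the most efficient way to do this?

I would like something like this: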

                             climate_change    air_quality   .....   .....
  <chr>                         <dbl>
1 A somewhat serious problem      0.286           .....
2 A very serious problem          0.510           .....
3 Not a very serious problem      0.166 
4 Not at all a serious problem   0.0370

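I had trouble describing this problem and finding similar questions on this site. I hope I have described it well enough; if you know of a similar question, please link it here. :)

Regards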

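2 answers:

Answer 0 (score: 2)

You can use the pivoting functions from {tidyr} to apply your solution to a long-format version of the data frame, and then pivot the result back to wide format.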

library(dplyr)
library(tidyr)
library(tibble)

data <- tribble(~Q1, ~Q2, ~Q3,
                'ans1', 'ans1', 'ans1',
                'ans1', 'ans2', 'ans2',
                'ans2', 'ans2', 'ans2',
                'ans1', 'ans3', 'ans2',
                'ans3', 'ans1', NA,
                'ans3', 'ans3', 'ans1',
                 NA   , 'ans2', NA)

data %>% 
  pivot_longer(everything()) %>% 
  group_by(name) %>% 
  count(value) %>% 
  drop_na() %>%                 # If you omit this line, NA values will be
                                # counted as a separate answer.
  mutate(prop = n / sum(n)) %>% # Share within each question (still grouped by name).
  ungroup() %>% 
  select(-n) %>% 
  pivot_wider(values_from = prop, values_fill = list(prop = 0)) 
  # names_from defaults to the `name` column created by pivot_longer().
  # A Q/A combination has no proportion when that answer was never given
  # to that question; values_fill puts 0 there instead of NA.

# A tibble: 3 x 4
  value    Q1    Q2    Q3
  <chr> <dbl> <dbl> <dbl>
1 ans1  0.5   0.286   0.4
2 ans2  0.167 0.429   0.6
3 ans3  0.333 0.286   0  
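
The question asks for "n and/or prop", while the pipeline above drops n before pivoting. Below is a minimal sketch of one way to keep both, assuming the same toy data as above. The names `question`, `answer`, and `wide_both` are illustrative choices and not part of the original answer. Passing two columns to `values_from` makes `pivot_wider()` create one output column per value/question pair, named like `n_Q1` and `prop_Q1`.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Same toy data as in the answer above.
data <- tribble(~Q1, ~Q2, ~Q3,
                "ans1", "ans1", "ans1",
                "ans1", "ans2", "ans2",
                "ans2", "ans2", "ans2",
                "ans1", "ans3", "ans2",
                "ans3", "ans1", NA,
                "ans3", "ans3", "ans1",
                NA,     "ans2", NA)

wide_both <- data %>%
  # One row per (question, answer) pair; column names are illustrative.
  pivot_longer(everything(), names_to = "question", values_to = "answer") %>%
  drop_na(answer) %>%                 # ignore unanswered questions
  count(question, answer) %>%         # n per question/answer combination
  group_by(question) %>%
  mutate(prop = n / sum(n)) %>%       # share within each question
  ungroup() %>%
  # Spread both n and prop; missing combinations become 0 instead of NA.
  pivot_wider(names_from = question,
              values_from = c(n, prop),
              values_fill = list(n = 0L, prop = 0))

print(wide_both)
```

The result has one row per answer and the columns `n_Q1`, `n_Q2`, `n_Q3`, `prop_Q1`, `prop_Q2`, `prop_Q3`. If only the counts are needed, `values_from = n` alone should be enough.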

Answer 1 (score: 0)

Something like this? (`df` is defined under "Data" below.)

library(tidyverse)
df %>% 
  pivot_longer(1:4) %>% 
  filter(!is.na(value)) %>% 
  count(name, value) %>% 
  group_by(name) %>% 
  mutate(prop = n / sum(n)) %>% 
  select(-n) %>% 
  pivot_wider(names_from = name, values_from = prop)

# A tibble: 4 x 5
  value                     air_quality climate_change trash water_polution
  <chr>                           <dbl>          <dbl> <dbl>          <dbl>
1 A somewhat serious probl         0.25           0.25  0.25           0.25
2 A very serious problem           0.5            0.25  0.5            0.25
3 Not a very serious probl         0.25           0.5   0.25           0.25
4 Not at all a serious prob       NA             NA    NA              0.25

Data

df <- tibble::tribble(
                   ~climate_change,              ~air_quality,              ~water_polution,                   ~trash,
        "Not a very serious probl", "A somewhat serious probl", "A somewhat serious probl", "A very serious problem",
        "Not a very serious probl", "Not a very serious probl", "Not a very serious probl",  "Not a very serious probl",
                                NA,                        NA,                           NA,                       NA,
                                NA,                        NA,                           NA,                       NA,
          "A very serious problem",  "A very serious problem",     "A very serious problem", "A very serious problem",
        "A somewhat serious probl",  "A very serious problem",  "Not at all a serious prob",  "A somewhat serious probl"
        )
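
Since the stated goal in the question is a stacked bar chart, it may be worth noting that ggplot2 plots from the long table, so the final `pivot_wider()` step is likely only needed for display. The following is a rough sketch under that assumption. The toy data and the names `survey`, `props_long`, and `p` are hypothetical and stand in for the real `lebanon` data.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Hypothetical toy data standing in for the real survey columns.
survey <- tibble::tribble(
  ~Q1,    ~Q2,    ~Q3,
  "ans1", "ans1", "ans1",
  "ans1", "ans2", "ans2",
  "ans2", "ans2", "ans2",
  "ans3", "ans3", NA
)

# Long table of proportions: one row per question/answer combination.
props_long <- survey %>%
  pivot_longer(everything(), names_to = "question", values_to = "answer") %>%
  drop_na(answer) %>%
  count(question, answer) %>%
  group_by(question) %>%
  mutate(prop = n / sum(n)) %>%
  ungroup()

# geom_col() stacks bars by default, so each question gets one bar
# whose segments sum to 1.
p <- ggplot(props_long, aes(x = question, y = prop, fill = answer)) +
  geom_col() +
  scale_y_continuous(labels = scales::percent) +
  labs(x = NULL, y = "Share of responses", fill = "Answer")

print(p)
```

Because the proportions are computed within each question, every bar has the same total height, and only the split between answers differs.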