Percentages across multiple columns with two factors

Time: 2020-07-18 15:25:05

Tags: r tidyverse

I'm trying to get the relative proportions of counts that fall into two different categories. Here is a sample of the original data:

A tibble: 8 x 5
  resp          euRefVoteW1 euRefVoteW2 euRefVoteW3 Paper    
  <fct>               <int>       <int>       <int> <fct>    
1 Remain                316         290         313 Times    
2 Leave                 157         123         159 Times    
3 Will Not Vote           2           3           3 Times    
4 Don't Know             56          51          55 Times    
5 Remain                190         175         199 Telegraph
6 Leave                 339         282         334 Telegraph
7 Will Not Vote           4           3           4 Telegraph
8 Don't Know             70          62          69 Telegraph

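These are tallies across two different factors. I'm trying to convert the response counts into percentages so that the result looks like this: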


A tibble: 8 x 5
  resp          euRefVoteW1 euRefVoteW2 euRefVoteW3 Paper    
1 Remain                52%         53%          .. Times
2 Leave                 43%         42%          .. Times
3 Will Not Vote          1%          2%           . Times
4 Don't Know             4%          3%           . Times
5 Remain                35%         35%           . Telegraph
6 Leave                 52%         52%           . Telegraph
7 Will Not Vote          2%          2%           . Telegraph
8 Don't Know            11%         11%           . Telegraph

(Obviously these numbers aren't correct, but I hope it shows that each 4 x 1 block should sum to 100%.)

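The data frame is already laid out much like a table, so is there a way to apply prop.table() to it? When I try it as below, it refuses because the df isn't a clean array. Is there a way around this?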

for_stack <- combined_tallies %>%
  group_by(Paper, resp) %>%
  prop.table(margin = 2)
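For reference: prop.table() with a margin expects a table, matrix or array rather than a data frame, and it knows nothing about dplyr groups, so group_by() has no effect on it. A minimal base R sketch of the same idea, assuming the count columns all start with euRefVote and using only the sample rows shown above, is to hand prop.table() one Paper at a time as a numeric matrix:

```r
# Counts from the sample in the question (waves W1-W3 only)
combined_tallies <- data.frame(
  resp = rep(c("Remain", "Leave", "Will Not Vote", "Don't Know"), times = 2),
  euRefVoteW1 = c(316, 157, 2, 56, 190, 339, 4, 70),
  euRefVoteW2 = c(290, 123, 3, 51, 175, 282, 3, 62),
  euRefVoteW3 = c(313, 159, 3, 55, 199, 334, 4, 69),
  Paper = rep(c("Times", "Telegraph"), each = 4)
)

# Assumption: every count column name starts with "euRefVote"
vote_cols <- grep("^euRefVote", names(combined_tallies), value = TRUE)

pct_tallies <- combined_tallies
for (p in unique(pct_tallies$Paper)) {
  rows <- pct_tallies$Paper == p
  # prop.table() needs an array, so pass this paper's counts as a matrix;
  # margin = 2 divides each column by that column's total
  pct_tallies[rows, vote_cols] <-
    prop.table(as.matrix(pct_tallies[rows, vote_cols]), margin = 2) * 100
}

pct_tallies
```

Because margin = 2 divides each column by its own total within one Paper, every 4 x 1 block sums to 100, which is the layout asked for above.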

Here is an rds copy of the dataframe if this helps!

The best answer I could find elsewhere on SO ("Percentage of factor levels by group in R") wasn't any use.

2 answers:

Answer 0 (score: 3)

I've recreated your dataset with dput(); I'd suggest using it to provide reproducible data when asking for answers on StackOverflow.

votes <- structure(list(resp = c("Remain", "Leave", "Will Not Vote", "Don’t Know", 
"Remain", "Leave", "Will Not Vote", "Don’t Know"), ref1 = c(316, 
157, 2, 56, 190, 339, 4, 70), ref2 = c(290, 123, 3, 51, 175, 
282, 3, 62), ref3 = c(313, 159, 3, 55, 199, 334, 4, 69), paper = c("Times", 
"Times", "Times", "Times", "Telegraph", "Telegraph", "Telegraph", 
"Telegraph")), .Names = c("resp", "ref1", "ref2", "ref3", "paper"
), row.names = c(NA, -8L), class = c("tbl_df", "tbl", "data.frame"
))
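As a small illustration of the dput() suggestion, with a hypothetical two-row data frame standing in for your own object: dput() prints R code that rebuilds the object, and the same kind of code (obtained here with deparse()) can be evaluated to check that it round-trips.

```r
# Hypothetical stand-in for your own data frame (e.g. combined_tallies)
example_df <- data.frame(
  resp = c("Remain", "Leave"),
  ref1 = c(316, 157),
  paper = c("Times", "Times")
)

# Prints R code that rebuilds example_df; paste this output into the question.
# For a large object, dput(head(example_df, 20)) keeps it short.
dput(example_df)

# deparse() returns the same kind of code as a character vector,
# so evaluating it checks that the object round-trips
recreated <- eval(parse(text = deparse(example_df)))
```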

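Another approach is to change the structure of the dataset before doing the analysis. You are trying to create relative values not across an entire column or row, but within subsets. One way to handle this is to use the tidyverse packages and do the analysis in long format. Once the percentages are calculated, you can always convert back to the original structure.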

library(tidyverse)
vote_long <- votes %>% 
  pivot_longer(cols = c(ref1, ref2, ref3), names_to = "ref", values_to = "votes")


vote_long

# A tibble: 24 x 4
   resp          paper ref   votes
   <chr>         <chr> <chr> <dbl>
 1 Remain        Times ref1    316
 2 Remain        Times ref2    290
 3 Remain        Times ref3    313
 4 Leave         Times ref1    157
 5 Leave         Times ref2    123
 6 Leave         Times ref3    159
 7 Will Not Vote Times ref1      2
 8 Will Not Vote Times ref2      3
 9 Will Not Vote Times ref3      3
10 Don’t Know    Times ref1     56
# … with 14 more rows
# Create grouped relative values

vote_long_relative <- vote_long %>% 
  group_by(paper, ref) %>% 
  mutate(rel_votes = votes/sum(votes) * 100)

vote_wide_relative <- vote_long_relative %>% 
  select(-votes) %>% 
  pivot_wider(id_cols = c(resp, paper), names_from = "ref", values_from = "rel_votes")

vote_wide_relative
# Groups:   paper [2]
  resp          paper       ref1   ref2   ref3
  <chr>         <chr>      <dbl>  <dbl>  <dbl>
1 Remain        Times     59.5   62.1   59.1  
2 Leave         Times     29.6   26.3   30    
3 Will Not Vote Times      0.377  0.642  0.566
4 Don’t Know    Times     10.5   10.9   10.4  
5 Remain        Telegraph 31.5   33.5   32.8  
6 Leave         Telegraph 56.2   54.0   55.1  
7 Will Not Vote Telegraph  0.663  0.575  0.660
8 Don’t Know    Telegraph 11.6   11.9   11.4  

Answer 1 (score: 2)

Maybe this is what you're looking for:

library(tidyverse)
combined_tallies %>% 
  group_by(Paper) %>% 
  mutate(across(where(is.numeric), ~ .x / sum(.x, na.rm = TRUE) * 100))

# A tibble: 20 x 10
# Groups:   Paper [5]
   resp  euRefVoteW1 euRefVoteW2 euRefVoteW3 euRefVoteW4 euRefVoteW6 euRefVoteW7 euRefVoteW8
   <fct>       <dbl>       <dbl>       <dbl>       <dbl>       <dbl>       <dbl>       <dbl>
 1 Rema~      59.5        62.1        59.1        61.0        63.7        60.3        61.2  
 2 Leave      29.6        26.3        30          29.0        25.2        35.6        35.2  
 3 Will~       0.377       0.642       0.566       0.565       0.377       0.377       0.377
 4 Don'~      10.5        10.9        10.4         9.42       10.7         3.77        3.20 
...
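The lambda in this answer, ~ .x / sum(.x, na.rm = TRUE) * 100, divides each value by its group's total. A base R sketch of the same per-group formula uses ave(), shown here on a hypothetical single-column version of the data where one count has been set to NA (the question's data has no missing values) to show what na.rm = TRUE does:

```r
# Hypothetical single-column version of the data; the NA is invented
# purely to show the effect of na.rm = TRUE
tallies <- data.frame(
  resp = rep(c("Remain", "Leave", "Will Not Vote", "Don't Know"), times = 2),
  euRefVoteW1 = c(316, 157, 2, 56, 190, 339, NA, 70),
  Paper = rep(c("Times", "Telegraph"), each = 4)
)

# Same formula as the answer's lambda: value / group total * 100
pct_of_total <- function(x) x / sum(x, na.rm = TRUE) * 100

# ave() runs pct_of_total() separately for each Paper and returns the
# results in the original row order
tallies$euRefVoteW1 <- ave(tallies$euRefVoteW1, tallies$Paper, FUN = pct_of_total)

tallies
```

The missing count stays NA, and the remaining Telegraph rows still sum to 100 because the NA is left out of the total. Without na.rm = TRUE, sum() would return NA and every percentage for that paper would become NA.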