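I am trying to compute the relative proportions of tallies that belong to two different categories. Here is a sample of the original file.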
# A tibble: 8 x 5
  resp          euRefVoteW1 euRefVoteW2 euRefVoteW3 Paper
  <fct>               <int>       <int>       <int> <fct>
1 Remain                316         290         313 Times
2 Leave                 157         123         159 Times
3 Will Not Vote           2           3           3 Times
4 Don't Know             56          51          55 Times
5 Remain                190         175         199 Telegraph
6 Leave                 339         282         334 Telegraph
7 Will Not Vote           4           3           4 Telegraph
8 Don't Know             70          62          69 Telegraph
These are tallies across two different factors. I am trying to convert the response counts to percentages, so that it looks like this:
# A tibble: 8 x 5
  resp          euRefVoteW1 euRefVoteW2 euRefVoteW3 Paper
1 Remain                52%         53%         ... Times
2 Leave                 43%         42%         ... Times
3 Will Not Vote          1%          2%         ... Times
4 Don't Know             4%          3%         ... Times
5 Remain                35%         35%         ... Telegraph
6 Leave                 52%         52%         ... Telegraph
7 Will Not Vote          2%          2%         ... Telegraph
8 Don't Know            11%         11%         ... Telegraph
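(Obviously these numbers are not correct, but I hope it shows that each 4 x 1 block should sum to 100%.)
The data frame already has a table-like layout, so is there a way to apply prop.table() to it? When I try it as below, it refuses because the data frame is not a clean array. Is there a way around this?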
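Stated as a formula (the symbols are introduced here only for illustration): if $n_{r,w,g}$ is the count for response $r$ in wave column $w$ for paper $g$, the target value is

$$
p_{r,w,g} = 100 \times \frac{n_{r,w,g}}{\sum_{r'} n_{r',w,g}}
$$

so each denominator runs over the four responses within one paper and one wave. For example, Remain in euRefVoteW1 for the Times is $100 \times 316 / (316 + 157 + 2 + 56) = 100 \times 316 / 531 \approx 59.5\%$.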
for_stack <- combined_tallies %>%
group_by(Paper, resp) %>%
prop.table(margin=2)
Here is an rds copy of the dataframe if this helps!
The best answer I could find elsewhere on SO ("Percentage of factor levels by group in R") was no use.
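As a rough base-R sketch that stays close to the original prop.table() idea: prop.table() works on tables and arrays rather than data frames, so one workaround is to split the rows by Paper, convert each piece's count columns to a matrix with as.matrix(), and call prop.table(margin = 2) on that. The data frame below is a hypothetical stand-in for combined_tallies, typed in from the sample printed above (three waves only); the real object may have more columns.

```r
# Hypothetical stand-in for combined_tallies, typed in from the sample above
combined_tallies <- data.frame(
  resp = factor(rep(c("Remain", "Leave", "Will Not Vote", "Don't Know"), 2),
                levels = c("Remain", "Leave", "Will Not Vote", "Don't Know")),
  euRefVoteW1 = c(316L, 157L, 2L, 56L, 190L, 339L, 4L, 70L),
  euRefVoteW2 = c(290L, 123L, 3L, 51L, 175L, 282L, 3L, 62L),
  euRefVoteW3 = c(313L, 159L, 3L, 55L, 199L, 334L, 4L, 69L),
  Paper = factor(rep(c("Times", "Telegraph"), each = 4),
                 levels = c("Times", "Telegraph"))
)

vote_cols <- c("euRefVoteW1", "euRefVoteW2", "euRefVoteW3")

# For each paper: turn the count columns into a matrix, take column
# proportions (margin = 2), scale to percent, and write them back
pieces <- lapply(split(combined_tallies, combined_tallies$Paper), function(d) {
  shares <- prop.table(as.matrix(d[vote_cols]), margin = 2) * 100
  d[vote_cols] <- as.data.frame(shares)
  d
})

# Stack the per-paper pieces again and drop the "Times.1"-style row names
pct <- do.call(rbind, pieces)
rownames(pct) <- NULL
```

The split() step is what restricts each column total to a single paper; without it, margin = 2 would divide by the total across both papers.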
Answer 0 (score: 3)
I have recreated your dataset using dput(). I suggest you use it to provide reproducible data when asking for answers on StackOverflow.
votes <- structure(list(resp = c("Remain", "Leave", "Will Not Vote", "Don’t Know",
"Remain", "Leave", "Will Not Vote", "Don’t Know"), ref1 = c(316,
157, 2, 56, 190, 339, 4, 70), ref2 = c(290, 123, 3, 51, 175,
282, 3, 62), ref3 = c(313, 159, 3, 55, 199, 334, 4, 69), paper = c("Times",
"Times", "Times", "Times", "Telegraph", "Telegraph", "Telegraph",
"Telegraph")), .Names = c("resp", "ref1", "ref2", "ref3", "paper"
), row.names = c(NA, -8L), class = c("tbl_df", "tbl", "data.frame"
))
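An alternative is to change the structure of the dataset before doing the analysis. You are trying to create relative values not across a whole column or row, but within subsets. One way to handle this is to use the tidyverse packages and do the analysis in long format. Once the percentages are computed, you can always convert back to the original structure.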
library(tidyverse)
vote_long <- votes %>%
pivot_longer(cols = c(ref1, ref2, ref3), names_to = "ref", values_to = "votes")
vote_long
# A tibble: 24 x 4
resp paper ref votes
<chr> <chr> <chr> <dbl>
1 Remain Times ref1 316
2 Remain Times ref2 290
3 Remain Times ref3 313
4 Leave Times ref1 157
5 Leave Times ref2 123
6 Leave Times ref3 159
7 Will Not Vote Times ref1 2
8 Will Not Vote Times ref2 3
9 Will Not Vote Times ref3 3
10 Don’t Know Times ref1 56
# … with 14 more rows
# create relative values (percent) within each paper/ref group
vote_long_relative <- vote_long %>%
group_by(paper, ref) %>%
mutate(rel_votes = votes/sum(votes) * 100)
vote_wide_relative <- vote_long_relative %>%
select(-votes) %>%
pivot_wider(id_cols = c(resp, paper), names_from = "ref", values_from = "rel_votes")
vote_wide_relative
# Groups: paper [2]
resp paper ref1 ref2 ref3
<chr> <chr> <dbl> <dbl> <dbl>
1 Remain Times 59.5 62.1 59.1
2 Leave Times 29.6 26.3 30
3 Will Not Vote Times 0.377 0.642 0.566
4 Don’t Know Times 10.5 10.9 10.4
5 Remain Telegraph 31.5 33.5 32.8
6 Leave Telegraph 56.2 54.0 55.1
7 Will Not Vote Telegraph 0.663 0.575 0.660
8 Don’t Know Telegraph 11.6 11.9 11.4
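The asker's target shows labels such as "52%". If that form is wanted, converting to text is best left as the very last step, because the columns stop being numeric afterwards. A minimal base-R sketch, using a hypothetical helper label_pct() and a small example frame built from the Times ref1 counts above:

```r
# Hypothetical helper: replace every numeric column with labels such as "60%"
label_pct <- function(df, digits = 0) {
  fmt <- paste0("%.", digits, "f%%")           # e.g. "%.0f%%" -> "60%"
  num <- vapply(df, is.numeric, logical(1))    # which columns to format
  df[num] <- lapply(df[num], function(x) sprintf(fmt, x))
  df
}

# Example frame: Times percentages for ref1, computed from the counts above
rel <- data.frame(
  resp  = c("Remain", "Leave", "Will Not Vote", "Don't Know"),
  paper = "Times",
  ref1  = c(316, 157, 2, 56) / 531 * 100
)

# Sanity check while still numeric: the four responses add up to 100
stopifnot(abs(sum(rel$ref1) - 100) < 1e-9)

labelled <- label_pct(rel)
```

Here labelled$ref1 becomes "60%", "30%", "0%", "11%". Whole-number rounding hides the small Will Not Vote share, so digits = 1 (giving "0.4%") may be the better choice for this data.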
Answer 1 (score: 2)
Perhaps you are looking for this:
library(tidyverse)
# within each Paper, divide every numeric column by its group total
combined_tallies %>%
group_by(Paper) %>%
mutate(across(where(is.numeric), ~ .x / sum(.x, na.rm = TRUE) * 100))
# A tibble: 20 x 10
# Groups: Paper [5]
resp euRefVoteW1 euRefVoteW2 euRefVoteW3 euRefVoteW4 euRefVoteW6 euRefVoteW7 euRefVoteW8
<fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 Rema~ 59.5 62.1 59.1 61.0 63.7 60.3 61.2
2 Leave 29.6 26.3 30 29.0 25.2 35.6 35.2
3 Will~ 0.377 0.642 0.566 0.565 0.377 0.377 0.377
4 Don'~ 10.5 10.9 10.4 9.42 10.7 3.77 3.20
...
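For readers who would rather avoid the tidyverse, a base-R sketch of the same group-wise calculation can use ave(), which applies a function within each level of a grouping vector and returns results in the original row order. This is a swapped-in technique, not part of the answer above; combined_tallies is again a hypothetical stand-in typed in from the question's sample (three waves only).

```r
# Hypothetical stand-in for combined_tallies (three waves from the question)
combined_tallies <- data.frame(
  resp = factor(rep(c("Remain", "Leave", "Will Not Vote", "Don't Know"), 2)),
  euRefVoteW1 = c(316L, 157L, 2L, 56L, 190L, 339L, 4L, 70L),
  euRefVoteW2 = c(290L, 123L, 3L, 51L, 175L, 282L, 3L, 62L),
  euRefVoteW3 = c(313L, 159L, 3L, 55L, 199L, 334L, 4L, 69L),
  Paper = factor(rep(c("Times", "Telegraph"), each = 4))
)

# Select every numeric column, mirroring across(where(is.numeric), ...)
num_cols <- names(combined_tallies)[vapply(combined_tallies, is.numeric, logical(1))]

pct_base <- combined_tallies
# ave() runs FUN separately for each Paper and keeps the original row order
pct_base[num_cols] <- lapply(pct_base[num_cols], function(x) {
  ave(as.numeric(x), pct_base$Paper,
      FUN = function(v) v / sum(v, na.rm = TRUE) * 100)
})
```

Because ave() returns a vector the same length as its input, the result drops straight back into the original columns, with no reshaping or re-joining.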