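I'm trying to learn R, and I decided to do it by building something that parses my state's live election results on election night. Unfortunately, I'm having trouble calculating the Margin value used for the map fill. My state (WA) uses a top-2 primary, which means that in some races two members of the same party face each other in the November election. That may be too much background, but anyway, here is the coding problem: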
I have a data frame that looks like this:
Dist Party Votes
1 (Prefers Democratic Party) 124151
1 (Prefers Republican Party) 101428
2 (Prefers Democratic Party) 122173
2 (Prefers Republican Party) 79518
3 (Prefers Republican Party) 124796
3 (Prefers Democratic Party) 78018
4 (Prefers Republican Party) 75307
4 (Prefers Republican Party) 77772
5 (Prefers Republican Party) 135470
5 (Prefers Democratic Party) 87772
6 (Prefers Democratic Party) 141265
6 (Prefers Republican Party) 83025
7 (Prefers Democratic Party) 203954
7 (Prefers Republican Party) 47921
8 (Prefers Republican Party) 125741
8 (Prefers Democratic Party) 73003
9 (Prefers Democratic Party) 118132
9 (Prefers Republican Party) 48662
10 (Prefers Democratic Party) 99279
10 (Prefers Republican Party) 82213
I want it to look like this:
Dist (Prefers Democratic Party) (Prefers Republican Party)
1 124151 101428
2 122173 79518
3 78018 124796
4 [NA or 0] 153079
5 87772 135470
6 141265 83025
7 203954 47921
8 73003 125741
9 118132 48662
10 99279 82213
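Dist = 4 doesn't work because of the duplicates in spread(). I've managed to hack together the following, but I'm not happy with it, and I'm almost positive there's a better way: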
library(tidyr)
library(dplyr)
CongressTidy %>%
group_by(Dist) %>%
mutate(GOPVotes = sum(ifelse(Party == "(Prefers Republican Party)", Votes, 0))) %>%
mutate(DemVotes = sum(ifelse(Party == "(Prefers Democratic Party)", Votes, 0)))
Which returns:
Dist Party Votes GOPVotes DemVotes
<fctr> <fctr> <int> <dbl> <dbl>
1 (Prefers Democratic Party) 124151 101428 124151
1 (Prefers Republican Party) 101428 101428 124151
2 (Prefers Democratic Party) 122173 79518 122173
2 (Prefers Republican Party) 79518 79518 122173
3 (Prefers Republican Party) 124796 124796 78018
3 (Prefers Democratic Party) 78018 124796 78018
4 (Prefers Republican Party) 75307 153079 0
4 (Prefers Republican Party) 77772 153079 0
5 (Prefers Republican Party) 135470 135470 87772
5 (Prefers Democratic Party) 87772 135470 87772
6 (Prefers Democratic Party) 141265 83025 141265
6 (Prefers Republican Party) 83025 83025 141265
7 (Prefers Democratic Party) 203954 47921 203954
7 (Prefers Republican Party) 47921 47921 203954
8 (Prefers Republican Party) 125741 125741 73003
8 (Prefers Democratic Party) 73003 125741 73003
9 (Prefers Democratic Party) 118132 48662 118132
9 (Prefers Republican Party) 48662 48662 118132
10 (Prefers Democratic Party) 99279 82213 99279
10 (Prefers Republican Party) 82213 82213 99279
That's fine as far as it goes. I can then add a selector column and subset:
CongressMargins <- CongressTidy %>%
group_by(Dist) %>%
mutate(GOPVotes = sum(ifelse(Party == "(Prefers Republican Party)", Votes, 0))) %>%
mutate(DemVotes = sum(ifelse(Party == "(Prefers Democratic Party)", Votes, 0))) %>%
mutate(selector = c(1,2)) %>%
subset(selector == 1, select = c(Dist, GOPVotes, DemVotes))
This gives me what I want, and I can calculate the margin from there:
Dist GOPVotes DemVotes
<fctr> <dbl> <dbl>
1 101428 124151
2 79518 122173
3 124796 78018
4 153079 0
5 135470 87772
6 83025 141265
7 47921 203954
8 125741 73003
9 48662 118132
10 82213 99279
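But it would get messed up if there were two unopposed races, because it relies on vector recycling. It's just ugly, and there has to be a better way. Any ideas?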
Answer 0 (score: 3)
We can compute the group sums first and then spread. If you want the missing cells to be 0, use spread(Party, Votes, fill = 0).
library(tidyverse)
dat2 <- dat %>%
group_by(Dist, Party) %>%
summarise(Votes = sum(Votes)) %>%
spread(Party, Votes) %>%
ungroup()
dat2
# # A tibble: 10 x 3
# Dist `(Prefers Democratic Party)` `(Prefers Republican Party)`
# <int> <int> <int>
# 1 1 124151 101428
# 2 2 122173 79518
# 3 3 78018 124796
# 4 4 NA 153079
# 5 5 87772 135470
# 6 6 141265 83025
# 7 7 203954 47921
# 8 8 73003 125741
# 9 9 118132 48662
# 10 10 99279 82213
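In newer versions of tidyr, spread() is superseded by pivot_wider(), which can do the summing and the filling in one call. A minimal sketch, assuming tidyr 1.1.0 or later (where values_fn accepts a plain function and values_fill a scalar): values_fn = sum collapses the two Republican rows in district 4, and values_fill = 0 fills the missing Democratic cell. The dem/gop helper names and the inline copy of the data are only for illustration.

```r
library(tidyr)  # assumes tidyr >= 1.1.0

# Illustrative inline copy of the sample data
dem <- "(Prefers Democratic Party)"
gop <- "(Prefers Republican Party)"
dat <- data.frame(
  Dist  = rep(1:10, each = 2),
  Party = c(dem, gop, dem, gop, gop, dem, gop, gop, gop, dem,
            dem, gop, dem, gop, gop, dem, dem, gop, dem, gop),
  Votes = c(124151, 101428, 122173, 79518, 124796, 78018, 75307, 77772,
            135470, 87772, 141265, 83025, 203954, 47921, 125741, 73003,
            118132, 48662, 99279, 82213),
  stringsAsFactors = FALSE
)

# One row per district: duplicate (Dist, Party) cells are summed,
# and combinations that never occur are filled with 0
wide <- pivot_wider(dat, names_from = Party, values_from = Votes,
                    values_fn = sum, values_fill = 0)
wide
```

Because the aggregation happens per cell, this doesn't depend on every district having exactly two rows.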
Data
dat <- read.table(text = "Dist Party Votes
1 '(Prefers Democratic Party)' 124151
1 '(Prefers Republican Party)' 101428
2 '(Prefers Democratic Party)' 122173
2 '(Prefers Republican Party)' 79518
3 '(Prefers Republican Party)' 124796
3 '(Prefers Democratic Party)' 78018
4 '(Prefers Republican Party)' 75307
4 '(Prefers Republican Party)' 77772
5 '(Prefers Republican Party)' 135470
5 '(Prefers Democratic Party)' 87772
6 '(Prefers Democratic Party)' 141265
6 '(Prefers Republican Party)' 83025
7 '(Prefers Democratic Party)' 203954
7 '(Prefers Republican Party)' 47921
8 '(Prefers Republican Party)' 125741
8 '(Prefers Democratic Party)' 73003
9 '(Prefers Democratic Party)' 118132
9 '(Prefers Republican Party)' 48662
10 '(Prefers Democratic Party)' 99279
10 '(Prefers Republican Party)' 82213",
header = TRUE, stringsAsFactors = FALSE)
Answer 1 (score: 1)
You can use dcast from the reshape2 package, specifying sum as the aggregation function:
library(reshape2)
dcast(dat,Dist~Party,sum,value.var = "Votes")
Dist (Prefers Democratic Party) (Prefers Republican Party)
1 1 124151 101428
2 2 122173 79518
3 3 78018 124796
4 4 0 153079
5 5 87772 135470
6 6 141265 83025
7 7 203954 47921
8 8 73003 125741
9 9 118132 48662
10 10 99279 82213
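Once the data is wide, the margin for the map fill is a single vectorised expression. The question doesn't say how Margin is defined, so this sketch assumes a hypothetical two-party margin, (Dem - Rep) / (Dem + Rep), where positive values mean a Democratic lead and a same-party district such as 4 comes out at -1. The Margin column name and the dem/gop helpers are illustrative.

```r
library(reshape2)

# Illustrative inline copy of the sample data
dem <- "(Prefers Democratic Party)"
gop <- "(Prefers Republican Party)"
dat <- data.frame(
  Dist  = rep(1:10, each = 2),
  Party = c(dem, gop, dem, gop, gop, dem, gop, gop, gop, dem,
            dem, gop, dem, gop, gop, dem, dem, gop, dem, gop),
  Votes = c(124151, 101428, 122173, 79518, 124796, 78018, 75307, 77772,
            135470, 87772, 141265, 83025, 203954, 47921, 125741, 73003,
            118132, 48662, 99279, 82213),
  stringsAsFactors = FALSE
)

wide <- dcast(dat, Dist ~ Party, sum, value.var = "Votes")

# Assumed definition: share of the two-party vote, positive = Democratic lead.
# Columns are looked up by name rather than by position.
wide$Margin <- (wide[[dem]] - wide[[gop]]) / (wide[[dem]] + wide[[gop]])
wide
```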
Using base R:
xtabs(Votes~Dist+Party,dat)
Party
Dist (Prefers Democratic Party) (Prefers Republican Party)
1 124151 101428
2 122173 79518
3 78018 124796
4 0 153079
5 87772 135470
6 141265 83025
7 203954 47921
8 73003 125741
9 118132 48662
10 99279 82213
The output above is of class table; you can turn it into a data frame with:
as.data.frame.matrix(xtabs(Votes~Dist+Party,dat))
Now it's a data frame, and you can work with it however you like.
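One wrinkle with as.data.frame.matrix() is that the districts end up as row names rather than as a column. A minimal base R sketch of moving them back into a Dist column, assuming the districts are integer IDs as in the sample data (the dem/gop helpers are illustrative):

```r
# Illustrative inline copy of the sample data
dem <- "(Prefers Democratic Party)"
gop <- "(Prefers Republican Party)"
dat <- data.frame(
  Dist  = rep(1:10, each = 2),
  Party = c(dem, gop, dem, gop, gop, dem, gop, gop, gop, dem,
            dem, gop, dem, gop, gop, dem, dem, gop, dem, gop),
  Votes = c(124151, 101428, 122173, 79518, 124796, 78018, 75307, 77772,
            135470, 87772, 141265, 83025, 203954, 47921, 125741, 73003,
            118132, 48662, 99279, 82213),
  stringsAsFactors = FALSE
)

tab  <- xtabs(Votes ~ Dist + Party, data = dat)  # sums duplicates, 0 for empty cells
wide <- as.data.frame.matrix(tab)                # districts are now row names

# Turn the row names into a proper Dist column and put it first
wide$Dist <- as.integer(rownames(wide))
rownames(wide) <- NULL
wide <- wide[, c("Dist", setdiff(names(wide), "Dist"))]
wide
```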