R: how to reshape data and aggregate certain columns at the same time

Asked: 2017-02-20 03:22:13

Tags: r aggregate reshape

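I'm looking for help on how to take a long dataset and make it wide, but with some summarizing as well.

My dataset generally looks like this (I'm new to SO, so apologies if my formatting isn't up to standard!):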

structure(list(StateSenatorialDistrict = c(41L, 14L, 30L, 38L, 
43L, 37L, 20L, 45L, 37L, 44L), CandidateOfficeCode = structure(c(2L, 
5L, 2L, 5L, 4L, 3L, 1L, 4L, 1L, 1L), .Label = c("ATT", "AUD", 
"STH", "TRE", "USP"), class = "factor"), CandidateLastName = structure(c(4L, 
2L, 1L, 3L, 9L, 5L, 7L, 8L, 7L, 6L), .Label = c("BROWN", "CASTLE", 
"CLINTON", "DEPASQUALE", "MILLER", "RAFFERTY", "SHAPIRO", "TORSELLA ", 
"VOIT"), class = "factor"), CandidateParty = structure(c(2L, 
1L, 3L, 2L, 3L, 2L, 2L, 2L, 2L, 3L), .Label = c("CON", "DEM", 
"REP"), class = "factor"), VoteTotal = c(256L, 3L, 202L, 188L, 
18L, 402L, 251L, 383L, 156L, 761L)), .Names = c("StateSenatorialDistrict", 
"CandidateOfficeCode", "CandidateLastName", "CandidateParty", 
"VoteTotal"), row.names = c(30901L, 115192L, 41264L, 1389L, 21982L, 
29827L, 192288L, 20019L, 12803L, 60823L), class = "data.frame")

It is a dataset of district-level voting data for Pennsylvania.

StateSenatorialDistrict CandidateOfficeCode CandidateLastName CandidateParty VoteTotal
                     41                 AUD        DEPASQUALE            DEM       256
                     14                 USP            CASTLE            CON         3
                     30                 AUD             BROWN            REP       202
                     38                 USP           CLINTON            DEM       188
                     43                 TRE              VOIT            REP        18
                     37                 STH            MILLER            DEM       402
                     20                 ATT           SHAPIRO            DEM       251
                     45                 TRE         TORSELLA             DEM       383
                     37                 ATT           SHAPIRO            DEM       156
                     44                 ATT          RAFFERTY            REP       761

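There are more columns, but these are enough for this purpose.

I'd like to take this data and summarize it so that I get one row per senatorial district, with selected other data on each row. Ideally the result would look like this (the data here is made up; it is not based on the sample above):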

StateSenatorialDistrict SenateRepLastName SenateDemLastName SenateRepVoteTotal SenateDemVoteTotal ClintonVotes TrumpVotes
                     41              BOZO             SMITH                250                300         1000       2000
                     42           JOHNSON            CARTER               2012                237         1350       1000
                     53         ARCHIBALD            BISHOP                350                500         5000       3000

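On any given row, you know the candidate is running for state senate because their CandidateOfficeCode is STS; you know whether they are the Democrat or the Republican from their CandidateParty, i.e. DEM or REP.

I know I can aggregate the data and then try to cast it to wide format, but that leaves a very wide table in which every candidate name is a column (and with no information about office or party):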

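library(reshape2)  # provides dcast()
# Sum votes per district/office/candidate/party for the presidential (USP) and state senate (STS) races
senateDistricts2016 <- aggregate(VoteTotal ~ StateSenatorialDistrict + CandidateOfficeCode + CandidateFirstName + CandidateLastName + CandidateParty, data = votes2016[votes2016$CandidateOfficeCode %in% c("USP", "STS"), ], FUN = sum)
wideSenate <- dcast(senateDistricts2016, StateSenatorialDistrict ~ CandidateLastName, value.var = "VoteTotal", fun.aggregate = sum)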
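
One way to get office and party into the column names, instead of candidate names, might be to build a combined key before going wide. The following is a minimal sketch in base R, not a tested answer for the full dataset: `votes` is made-up toy data (borrowing the illustrative numbers from the desired-output table), `key` is a hypothetical helper column, and the sketch assumes at most one candidate per district, office, and party.

```r
# Toy data (made up for illustration; not the real Pennsylvania results)
votes <- data.frame(
  StateSenatorialDistrict = c(41L, 41L, 41L, 41L, 41L, 42L, 42L, 42L, 42L),
  CandidateOfficeCode     = c("STS", "STS", "STS", "USP", "USP",
                              "STS", "STS", "USP", "USP"),
  CandidateLastName       = c("SMITH", "SMITH", "BOZO", "CLINTON", "TRUMP",
                              "CARTER", "JOHNSON", "CLINTON", "TRUMP"),
  CandidateParty          = c("DEM", "DEM", "REP", "DEM", "REP",
                              "DEM", "REP", "DEM", "REP"),
  VoteTotal               = c(100L, 200L, 250L, 1000L, 2000L,
                              237L, 2012L, 1350L, 1000L)
)

# Step 1: sum votes per district, office, party, and candidate
sums <- aggregate(
  VoteTotal ~ StateSenatorialDistrict + CandidateOfficeCode +
    CandidateParty + CandidateLastName,
  data = votes, FUN = sum
)

# Step 2: combine office and party into one key, e.g. "STS_DEM"
# (assumption: one candidate per district/office/party)
sums$key <- paste(sums$CandidateOfficeCode, sums$CandidateParty, sep = "_")

# Step 3: spread both the name and the vote total into one column pair per key
wide <- reshape(
  sums[, c("StateSenatorialDistrict", "key", "CandidateLastName", "VoteTotal")],
  idvar = "StateSenatorialDistrict", timevar = "key",
  v.names = c("CandidateLastName", "VoteTotal"),
  direction = "wide"
)
```

The result has one row per district and columns named like `CandidateLastName.STS_DEM` and `VoteTotal.STS_DEM`, which can then be renamed or trimmed (for example, dropping the presidential name columns).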

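Is there an easy way to do this? If not, can anyone think of a hard way to do it?

Thanks in advance. If my question doesn't make sense, please let me know and I'll be happy to edit it.

Edit:

I don't believe this is a duplicate. I'm not just trying to make my data wider; I'm summarizing it as it gets wider, so the suggested duplicate doesn't fit. I ended up doing a series of aggregates and then merging them: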

senateDs <- aggregate(VoteTotal ~ StateSenatorialDistrict + CandidateName, data=votes2016[votes2016$CandidateOfficeCode=="STS" & votes2016$CandidateParty=="DEM",], FUN="sum")
senateRs <- aggregate(VoteTotal ~ StateSenatorialDistrict + CandidateName, data=votes2016[votes2016$CandidateOfficeCode=="STS" & votes2016$CandidateParty=="REP",], FUN="sum")
senateTotalVotes <- aggregate(VoteTotal ~ StateSenatorialDistrict, data=votes2016[votes2016$CandidateOfficeCode=="STS",], FUN="sum")
senateVotes <- merge(senateDs,senateRs, by="StateSenatorialDistrict", all=TRUE)
senateVotes <- merge(senateVotes, senateTotalVotes, by="StateSenatorialDistrict", all=TRUE)

# now aggregate the presidential votes for Rs, Ds, and Total votes and combine
senatePresD <- aggregate(VoteTotal ~ StateSenatorialDistrict, data=votes2016[votes2016$CandidateOfficeCode=="USP" & votes2016$CandidateParty=="DEM",], FUN="sum")
senatePresR <- aggregate(VoteTotal ~ StateSenatorialDistrict, data=votes2016[votes2016$CandidateOfficeCode=="USP" & votes2016$CandidateParty=="REP",], FUN="sum")
senatePresTotalVotes <- aggregate(VoteTotal ~ StateSenatorialDistrict, data=votes2016[votes2016$CandidateOfficeCode=="USP",], FUN="sum")
senateVotes <- merge(senateVotes,senatePresD, by="StateSenatorialDistrict", all=TRUE)
senateVotes <- merge(senateVotes,senatePresR, by="StateSenatorialDistrict", all=TRUE)
senateVotes <- merge(senateVotes, senatePresTotalVotes, by="StateSenatorialDistrict", all=TRUE)


# setnames() comes from the data.table package; the names are assigned by column position
setnames(senateVotes, c("StateSenatorialDistrict", "DCandidate","DVotes","RCandidate","RVotes", "TotalSenatorVotes", "PresDVotes", "PresRVotes","TotalPresVotes"))
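
The repeated aggregate-and-merge steps above could probably be folded into a small helper plus `Reduce()`. This is only a sketch under assumptions: `sum_votes` is a hypothetical helper, `votes` is made-up toy data, and the candidate-name columns are left out for brevity. Naming each summed column before merging avoids the `VoteTotal.x`/`VoteTotal.y` suffixes that `merge()` would otherwise generate.

```r
# Toy data (made up for illustration; not the real Pennsylvania results)
votes <- data.frame(
  StateSenatorialDistrict = c(41L, 41L, 41L, 41L, 41L, 42L, 42L, 42L, 42L),
  CandidateOfficeCode     = c("STS", "STS", "STS", "USP", "USP",
                              "STS", "STS", "USP", "USP"),
  CandidateParty          = c("DEM", "DEM", "REP", "DEM", "REP",
                              "DEM", "REP", "DEM", "REP"),
  VoteTotal               = c(100L, 200L, 250L, 1000L, 2000L,
                              237L, 2012L, 1350L, 1000L)
)

# Hypothetical helper: sum votes per district for one office (optionally one
# party) and give the summed column a unique name so merges never collide
sum_votes <- function(data, office, party = NULL, col) {
  keep <- data$CandidateOfficeCode == office
  if (!is.null(party)) keep <- keep & data$CandidateParty == party
  out <- aggregate(VoteTotal ~ StateSenatorialDistrict,
                   data = data[keep, ], FUN = sum)
  names(out)[names(out) == "VoteTotal"] <- col
  out
}

pieces <- list(
  sum_votes(votes, "STS", "DEM", "DVotes"),
  sum_votes(votes, "STS", "REP", "RVotes"),
  sum_votes(votes, "STS", NULL,  "TotalSenatorVotes"),
  sum_votes(votes, "USP", "DEM", "PresDVotes"),
  sum_votes(votes, "USP", "REP", "PresRVotes"),
  sum_votes(votes, "USP", NULL,  "TotalPresVotes")
)

# Full outer join of all pieces on the district column
senateVotes <- Reduce(
  function(a, b) merge(a, b, by = "StateSenatorialDistrict", all = TRUE),
  pieces
)
```

Because every piece already carries its final column name, no positional renaming step is needed afterwards.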

0 answers:

No answers