My data.frame contains the following fields:
State County Race
FL Broward Black
FL Broward White
GA DeKalb White
GA Fulton Hispanic
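and so on. What I need is a count of each Race (which is its own variable) within each unique State-County combination. I want to keep the zeros and also get a total. So for the example above, I would like to get: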
State County White Black Hispanic Total
FL Broward 1 1 0 2
GA DeKalb 1 0 0 1
GA Fulton 0 0 1 1
I can get the State-County totals using count from the {plyr} package:
count(deaths,c("State","County"))
But when I add Race as an additional level, I get each race on its own row rather than in its own column. The output looks like this:
State County Race Freq
TX Bee Unknown 1
TX Bee White 1
TX Bell Black 1
TX Bell Unknown 3
TX Bell White 3
How can I get this in the format I need?
Answer 0: (score: 4)
Using "data.table", you could try:
library(data.table)
dcast(as.data.table(mydf)[, count := .N, by = names(mydf)],
State + County ~ Race, fun = c, value.var = "count", fill = 0)[
, Total := rowSums(.SD), by = .(State, County)][]
# State County Black Hispanic White Total
# 1: FL Broward 1 0 1 2
# 2: GA DeKalb 0 0 1 1
# 3: GA Fulton 0 1 0 1
I couldn't find a way to save any verbosity by not creating the "count" column first. Here is what I tried in order to handle it directly in dcast:
dcast(as.data.table(mydf), State + County ~ Race,
fun.aggregate = function(x) as.numeric(!is.na(x)), fill = 0)[
, Total := rowSums(.SD), by = .(State, County)][]
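For comparison, here is a minimal sketch of a variant that lets dcast do the counting itself by aggregating with length, so no helper "count" column is needed. It assumes mydf holds the four example rows from the question (rebuilt here so the snippet is self-contained). The names wide and race_cols are introduced only for this illustration and do not come from the answer.

```r
library(data.table)

# Assumed input: the four-row example from the question.
mydf <- data.frame(
  State  = c("FL", "FL", "GA", "GA"),
  County = c("Broward", "Broward", "DeKalb", "Fulton"),
  Race   = c("Black", "White", "White", "Hispanic")
)

# length() counts the rows in each State/County/Race cell;
# combinations that never occur are filled with 0.
wide <- dcast(as.data.table(mydf), State + County ~ Race,
              fun.aggregate = length, value.var = "Race", fill = 0L)

# Sum the race columns (everything except the two key columns) into Total.
race_cols <- setdiff(names(wide), c("State", "County"))
wide[, Total := rowSums(.SD), .SDcols = race_cols]
wide[]
```

Because length is applied per cell, duplicated State/County/Race rows are counted rather than collapsed, which is the behaviour the question asks for.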
Answer 1: (score: 2)
We can use count from dplyr and then spread from tidyr to reshape the data into wide format:
library(dplyr)
library(tidyr)
dat %>% count(State, County, Race) %>%
spread(Race, n, fill = 0) %>%
mutate(total = rowSums(.[sapply(., is.numeric)]))
Source: local data frame [3 x 6]
State County Black Hispanic White total
(fctr) (fctr) (dbl) (dbl) (dbl) (dbl)
1 FL Broward 1 0 1 2
2 GA DeKalb 0 0 1 1
3 GA Fulton 0 1 0 1
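spread() has since been superseded in tidyr by pivot_wider(). The following is a hedged sketch of the same pipeline in that style, not the answer author's code. It assumes tidyr 1.1 or later and dplyr 1.0 or later are installed, and that dat holds the question's example rows (rebuilt here). The names res and Total are chosen only for this illustration.

```r
library(dplyr)
library(tidyr)

# Assumed input: the four-row example from the question.
dat <- data.frame(
  State  = c("FL", "FL", "GA", "GA"),
  County = c("Broward", "Broward", "DeKalb", "Fulton"),
  Race   = c("Black", "White", "White", "Hispanic")
)

res <- dat %>%
  count(State, County, Race) %>%                 # one row per observed combination
  pivot_wider(names_from = Race, values_from = n,
              values_fill = 0) %>%               # absent combinations become 0
  mutate(Total = rowSums(across(where(is.numeric))))  # sum the count columns
res
```

State and County are character columns here, so where(is.numeric) picks up only the race counts. If a key column were numeric, the race columns would have to be selected by name instead.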
Answer 2: (score: 2)
dt = read.table(text="State County Race
FL Broward Black
FL Broward White
GA DeKalb White
GA Fulton Hispanic", header = TRUE)
library(dplyr)
library(tidyr)
dt %>%
group_by(State,County) %>%
mutate(Total = n()) %>%
count(State,County,Race,Total) %>%
ungroup() %>%
spread(Race,n, fill=0) %>%
select(-matches("Total"), Total)
# State County Black Hispanic White Total
# (fctr) (fctr) (dbl) (dbl) (dbl) (int)
# 1 FL Broward 1 0 1 2
# 2 GA DeKalb 0 0 1 1
# 3 GA Fulton 0 1 0 1
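As a quick cross-check of the tables above, the same counts can be sketched in base R with table() and rowSums(). This is an illustrative alternative rather than part of any answer: it collapses State and County into a single row label (an assumption made to keep the example short), and tab and res are names invented for the sketch.

```r
# Assumed input: the example data from the question.
dt <- read.table(text = "State County Race
FL Broward Black
FL Broward White
GA DeKalb White
GA Fulton Hispanic", header = TRUE)

# Cross-tabulate the combined State/County key against Race;
# table() reports 0 for every combination that does not occur.
tab <- table(paste(dt$State, dt$County), dt$Race)

# Append the row totals as an extra column.
res <- cbind(tab, Total = rowSums(tab))
res
```

The result is a plain matrix with the State/County key as row names, so it is handy for checking the numbers but less convenient than the data-frame outputs above for further processing.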