How can I create new observations from the sums of new groups?

Time: 2016-06-17 20:03:03

Tags: r dplyr tidyr

I need to regroup the age categories in the following data frame:

gender age   population
H      0-4   5
H      5-9   5
H      10-14 10
H      15-19 15
H      20-24 15
H      25-29 10
M      0-4   0
M      5-9   5
M      10-14 5
M      15-19 15
M      20-24 10
M      25-29 15

I prefer dplyr, so I would appreciate a way to do this with that package.

2 answers:

Answer 0: (score: 7)

Split the age strings with tidyr::separate(), then bin the lower bounds with cut().
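
A minimal sketch of this approach, not the answerer's own code. It assumes the question's table is stored in a data frame named `dat` (a name borrowed from the other answer) with a character `age` column. The helper columns `low` and `high`, the break points, and the labels are illustrative choices.

```r
library(dplyr)
library(tidyr)

# Rebuild the table from the question (assumed to be named `dat`).
dat <- data.frame(
  gender = rep(c("H", "M"), each = 6),
  age = rep(c("0-4", "5-9", "10-14", "15-19", "20-24", "25-29"), times = 2),
  population = c(5, 5, 10, 15, 15, 10, 0, 5, 5, 15, 10, 15),
  stringsAsFactors = FALSE
)

result <- dat %>%
  # Split "10-14" into two integer columns: low = 10, high = 14.
  separate(age, into = c("low", "high"), sep = "-", convert = TRUE) %>%
  # Bin the lower bound into [0, 15), [15, 20), [20, 30).
  mutate(age = cut(low,
                   breaks = c(0, 15, 20, 30),
                   right = FALSE,
                   labels = c("0-14", "15-19", "20-29"))) %>%
  # Sum the population within each gender and new age group.
  group_by(gender, age) %>%
  summarise(population = sum(population), .groups = "drop")

print(result)
```

With the question's data this should give totals of 20, 15, 25 for H and 10, 15, 25 for M. Using `right = FALSE` keeps each lower bound inside its own bin, so the breaks can be written as the start of each group.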

Answer 1: (score: 0)

A data.table solution, where dat is the table:

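library(data.table)
dat <- as.data.table(dat)
# Lower bound of each age range, e.g. "10-14" -> 10
# (as.character() guards against age being stored as a factor)
dat[ , mn := as.numeric(sapply(strsplit(as.character(age), "-"), "[[", 1))]
# Bin the lower bounds into the new groups: [0, 14], (14, 19], (19, 29]
dat[ , age := cut(mn, breaks = c(0, 14, 19, 29), 
              include.lowest = TRUE, 
              labels = c("0-14", "15-19", "20-29"))]
# Sum the population within each gender and new age group
dat[ , list(population = sum(population)), by = list(gender, age)]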
#    gender   age population
# 1:      H  0-14         20
# 2:      H 15-19         15
# 3:      H 20-29         25
# 4:      M  0-14         10
# 5:      M 15-19         15
# 6:      M 20-29         25