How to get the sum of a column grouped by two columns in R

Posted: 2016-07-05 14:28:06

Tags: r dataframe dplyr

I have a data frame (df) with 5 columns: Area.Name, Age, Total, Rural and Urban. I need the sum of Total for each Area.Name, split into two Age categories: 0-2 and 3-4.

df <- 
structure(list(Area.Name = structure(c(6L, 6L, 6L, 6L, 6L, 2L, 
2L, 2L, 2L, 2L, 2L), .Label = c("District - Central (06)", "District - East (04)", 
"District - New Delhi (05)", "District - North (02)", "District - North East (03)", 
"District - North West (01)", "District - South (09)", "District - South West (08)", 
"District - West (07)", "NCT OF DELHI (07)"), class = "factor"), 
    Age = c(0L, 1L, 2L, 3L, 4L, 0L, 1L, 2L, 3L, 4L, 5L), Total = c(56131L, 
    58644L, 63835L, 63859L, 64945L, 24556L, 27076L, 27234L, 27604L, 
    27725L, 30780L), Rural = c(3589L, 3757L, 4200L, 4102L, 4223L, 
    52L, 56L, 61L, 47L, 67L, 53L), Urban = c(52542L, 54887L, 
    59635L, 59757L, 60722L, 24504L, 27020L, 27173L, 27557L, 27658L, 
    30727L)), .Names = c("Area.Name", "Age", "Total", "Rural", 
"Urban"), row.names = c(102L, 103L, 104L, 105L, 106L, 405L, 406L, 
407L, 408L, 409L, 410L), class = "data.frame")

My expected output is:

Area.Name                    Age Total   
District - North West (01)   0-2 178610  
District - North West (01)   3-4 128804  
District - East (04)         0-2 78866
District - East (04)         3-4 55329

I tried using the dplyr package, but I'm not very familiar with it, so I'm a bit stuck here:

df %>% group_by(Area.Name) %>% summarize(Age = Age[0],Tot = sum(Total))

The problem is that I can't specify a range for Age here.
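Because summarise() only aggregates within groups that already exist, the age band has to be created as a column before grouping. A minimal dplyr sketch of that idea, assuming only Area.Name, Age and Total are needed and that ages above 4 should be dropped (the column name AgeGroup is hypothetical):

```r
library(dplyr)

# Reduced copy of the question's data: only the columns used here
df <- data.frame(
  Area.Name = rep(c("District - North West (01)", "District - East (04)"), c(5, 6)),
  Age = c(0:4, 0:5),
  Total = c(56131L, 58644L, 63835L, 63859L, 64945L,
            24556L, 27076L, 27234L, 27604L, 27725L, 30780L)
)

res <- df %>%
  # keep only ages that belong to one of the two requested bands
  filter(Age <= 4) %>%
  # derive the age band as an ordinary column before grouping
  mutate(AgeGroup = ifelse(Age <= 2, "0-2", "3-4")) %>%
  group_by(Area.Name, AgeGroup) %>%
  # .groups = "drop" returns an ungrouped result
  summarise(Total = sum(Total), .groups = "drop")

print(res)
```

The same mutate() step works with any rule that maps Age to a label; ifelse() is enough here because there are only two bands.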

2 answers:

Answer 0 (score: 1)

Here is a base R approach using cut and aggregate:

df$ageCat <- cut(df$Age, breaks = c(0, 2, max(df$Age)), include.lowest = TRUE)
aggregate(Total~Area.Name+ageCat, data=df, sum)
                   Area.Name ageCat  Total
1       District - East (04)  [0,2]  78866
2 District - North West (01)  [0,2] 178610
3       District - East (04)  (2,5]  86109
4 District - North West (01)  (2,5] 128804

cut splits the Age variable into the desired categories. aggregate then sums Total over the chosen grouping variables.
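The expected output in the question stops at age 4, whereas the result above puts age 5 into the (2,5] band. A minimal variant of the same cut/aggregate approach, assuming ages above 4 should simply be excluded: give cut() an upper break of 4 and explicit labels, so age 5 becomes NA, and the formula interface of aggregate() drops NA rows by default (na.action = na.omit).

```r
# Reduced copy of the question's data: only the columns used here
df <- data.frame(
  Area.Name = rep(c("District - North West (01)", "District - East (04)"), c(5, 6)),
  Age = c(0:4, 0:5),
  Total = c(56131L, 58644L, 63835L, 63859L, 64945L,
            24556L, 27076L, 27234L, 27604L, 27725L, 30780L)
)

# Bands [0,2] and (2,4]; ages outside [0, 4] get NA
df$ageCat <- cut(df$Age, breaks = c(0, 2, 4), labels = c("0-2", "3-4"),
                 include.lowest = TRUE)

# Formula method: rows with NA in ageCat are removed before summing
res <- aggregate(Total ~ Area.Name + ageCat, data = df, FUN = sum)
print(res)
```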

Answer 1 (score: 1)

Here is a way that applies cut() to Age inline in the group_by function:

library(dplyr)
df %>%
  group_by(Area.Name,
           Age = cut(Age, breaks = c(0, 2, 4, +Inf),
                     labels = c("0-2", "3-4", "4+"),
                     include.lowest = TRUE)) %>%
  summarise(Total = sum(Total))
#                    Area.Name    Age  Total
#                       <fctr> <fctr>  <int>
# 1       District - East (04)    0-2  78866
# 2       District - East (04)    3-4  55329
# 3       District - East (04)     4+  30780
# 4 District - North West (01)    0-2 178610
# 5 District - North West (01)    3-4 128804

To get only the desired groups, you can add a filter step on Age.
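One possible way to keep only the 0-2 and 3-4 bands, assuming the goal is simply to drop the "4+" group that appears in the output above (this is a sketch, not necessarily the exact step the answer's author had in mind), is to filter on the factor after summarising:

```r
library(dplyr)

# Reduced copy of the question's data: only the columns used here
df <- data.frame(
  Area.Name = rep(c("District - North West (01)", "District - East (04)"), c(5, 6)),
  Age = c(0:4, 0:5),
  Total = c(56131L, 58644L, 63835L, 63859L, 64945L,
            24556L, 27076L, 27234L, 27604L, 27725L, 30780L)
)

res <- df %>%
  # cut() inline: the new Age factor replaces the numeric Age column
  group_by(Area.Name,
           Age = cut(Age, breaks = c(0, 2, 4, Inf),
                     labels = c("0-2", "3-4", "4+"),
                     include.lowest = TRUE)) %>%
  summarise(Total = sum(Total), .groups = "drop") %>%
  # drop the band that the question does not ask for
  filter(Age != "4+")

print(res)
```

Filtering before group_by (for example on the numeric Age <= 4) would give the same totals and avoids computing the unwanted group at all.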