How do I split the values in one column into equal-width ranges and sum the associated values of another column in R?

Time: 2016-07-25 16:30:09

Tags: r

I have a data frame named Cust_Amount that looks like this:

Age    Amount_Spent
25       20
43       15
32       27
37       10
45       17
29       10

I want to divide it into age groups of equal width and sum the amount spent within each age group, like this:

Age_Group  Total_Amount
 20-30     30
 30-40     37
 40-50     32

2 Answers:

Answer 0: (score: 5)

We can use cut to bin 'Age' into groups, then take the sum of 'Amount_Spent' by that grouping variable.

library(data.table)
# Convert df1 to a data.table by reference, then sum Amount_Spent per age bin
setDT(df1)[, .(Total_Amount = sum(Amount_Spent)),
           by = .(Age_Group = cut(Age, breaks = c(20, 30, 40, 50)))]

Or with dplyr:

library(dplyr)
df1 %>%
    group_by(Age_Group = cut(Age, breaks = c(20, 30, 40, 50))) %>%
    summarise(Total_Amount = sum(Amount_Spent))
#     Age_Group Total_Amount
#      <fctr>        <int>
#1   (20,30]           30
#2   (30,40]           37
#3   (40,50]           32
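
As a small sketch of what cut does on its own, the snippet below rebuilds the sample data from the question (the name df1 follows the answer above; the question calls it Cust_Amount) and shows the interval each age falls into. By default the intervals are closed on the right, so an age of exactly 30 would land in (20,30]; passing right = FALSE closes the left end instead.

```r
# Sample data from the question
df1 <- data.frame(Age = c(25, 43, 32, 37, 45, 29),
                  Amount_Spent = c(20, 15, 27, 10, 17, 10))

# Default: intervals are open on the left and closed on the right
groups <- cut(df1$Age, breaks = c(20, 30, 40, 50))
as.character(groups)
# "(20,30]" "(40,50]" "(30,40]" "(30,40]" "(40,50]" "(20,30]"

# right = FALSE closes the left end instead, so 30 falls in [30,40)
boundary <- as.character(cut(30, breaks = c(20, 30, 40, 50), right = FALSE))
boundary
# "[30,40)"
```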

Answer 1: (score: 3)

Here is a base R solution using cut and aggregate, followed by setNames to name the result columns:

mydf$Age_Group <- cut(mydf$Age, breaks = seq(20,50, by = 10))
with(mydf, setNames(aggregate(Amount_Spent ~ Age_Group, FUN = sum), 
                    c('Age_Group', 'Total_Spent')))

  Age_Group Total_Spent
1   (20,30]          30
2   (30,40]          37
3   (40,50]          32

We can go one step further with gsub to match your desired output (note that I am not a regex expert):

mydf$Age_Group <- 
    gsub(pattern = ',',
         x = gsub(pattern = ']', 
                  x = gsub(pattern = '(', x = mydf$Age_Group, replacement = '', fixed = TRUE),
                  replacement = '', fixed = TRUE),
         replacement = ' - ', fixed = TRUE)
with(mydf, setNames(aggregate(Amount_Spent ~ Age_Group, FUN = sum), 
                  c('Age_Group', 'Total_Spent')))

  Age_Group Total_Spent
1   20 - 30          30
2   30 - 40          37
3   40 - 50          32
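
As an alternative sketch that avoids the string replacement altogether: cut accepts a labels argument, so the group names can be built from the break points up front. The label format ("20-30") and the Total_Amount column name are assumptions chosen to match the desired output in the question, and mydf is rebuilt here from the question's sample data.

```r
# Sample data from the question
mydf <- data.frame(Age = c(25, 43, 32, 37, 45, 29),
                   Amount_Spent = c(20, 15, 27, 10, 17, 10))

breaks <- seq(20, 50, by = 10)
# Build labels such as "20-30" from consecutive break points
labels <- paste(head(breaks, -1), tail(breaks, -1), sep = "-")

# Bin the ages with the custom labels, then sum the spend per bin
mydf$Age_Group <- cut(mydf$Age, breaks = breaks, labels = labels)
result <- aggregate(Amount_Spent ~ Age_Group, data = mydf, FUN = sum)
names(result) <- c("Age_Group", "Total_Amount")
result
#   Age_Group Total_Amount
# 1     20-30           30
# 2     30-40           37
# 3     40-50           32
```

Because the labels are derived from breaks, changing the bin width (for example by = 5) updates the group names without any further string handling.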