I have a data frame named Cust_Amount, shown below:
Age Amount_Spent
25 20
43 15
32 27
37 10
45 17
29 10
I want to divide it into age groups of equal width and sum the amount spent within each age group, like this:
Age_Group Total_Amount
20-30 30
30-40 37
40-50 32
Answer 0 (score: 5)
We can use cut to bin 'Age' into groups and then get the sum of 'Amount_Spent' by that grouping variable.
library(data.table)
# Convert df1 to a data.table by reference, then sum Amount_Spent within each age bin
setDT(df1)[, .(Total_Amount = sum(Amount_Spent)),
           by = .(Age_Group = cut(Age, breaks = c(20, 30, 40, 50)))]
Or with dplyr:
library(dplyr)
df1 %>%
group_by(Age_Group = cut(Age, breaks = c(20, 30, 40, 50))) %>%
summarise(Total_Amount = sum(Amount_Spent))
# Age_Group Total_Amount
# <fctr> <int>
#1 (20,30] 30
#2 (30,40] 37
#3 (40,50] 32
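The (20,30] style labels are cut's defaults. If labels like 20-30 are wanted from the start, cut's labels argument can supply them. The following is a minimal base R sketch, assuming the question's data is stored in df1; the names breaks, labels, and result are illustrative only. Note that the bins stay right-closed, so an age of exactly 30 lands in "20-30" and an age of exactly 20 would become NA unless right = FALSE or include.lowest = TRUE is set.

```r
# Sample data from the question (stored as df1, the name used in this answer)
df1 <- data.frame(Age = c(25, 43, 32, 37, 45, 29),
                  Amount_Spent = c(20, 15, 27, 10, 17, 10))

# Equal-width break points: 20, 30, 40, 50
breaks <- seq(20, 50, by = 10)

# Build "20-30", "30-40", "40-50" by pairing each break with the next one
labels <- paste(head(breaks, -1), tail(breaks, -1), sep = "-")

# Bin the ages, using the custom labels instead of the default "(20,30]" style
df1$Age_Group <- cut(df1$Age, breaks = breaks, labels = labels)

# Sum the amount spent within each bin and rename the summed column
result <- aggregate(Amount_Spent ~ Age_Group, data = df1, FUN = sum)
names(result)[2] <- "Total_Amount"
result
#   Age_Group Total_Amount
# 1     20-30           30
# 2     30-40           37
# 3     40-50           32
```

The same breaks and labels vectors could be passed to cut inside the data.table or dplyr calls above.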
Answer 1 (score: 3)
Here is a base R solution using cut and aggregate, followed by setNames to name the result columns:
mydf$Age_Group <- cut(mydf$Age, breaks = seq(20,50, by = 10))
with(mydf, setNames(aggregate(Amount_Spent ~ Age_Group, FUN = sum),
c('Age_Group', 'Total_Spent')))
Age_Group Total_Spent
1 (20,30] 30
2 (30,40] 37
3 (40,50] 32
We can go one step further with gsub to match your desired output (note that I am no regex expert):
# Strip '(' and ']' from the labels, then replace ',' with ' - ' (all literal matches)
mydf$Age_Group <-
  gsub(pattern = ',',
       x = gsub(pattern = ']',
                x = gsub(pattern = '(', x = mydf$Age_Group, replacement = '', fixed = TRUE),
                replacement = '', fixed = TRUE),
       replacement = ' - ', fixed = TRUE)
with(mydf, setNames(aggregate(Amount_Spent ~ Age_Group, FUN = sum),
c('Age_Group', 'Total_Spent')))
Age_Group Total_Spent
1 20 - 30 30
2 30 - 40 37
3 40 - 50 32
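The three nested gsub calls could probably be collapsed into a single sub call with capture groups. This is only a sketch, assuming the labels have cut's default form with plain integer bounds, such as (20,30]; labels with decimals or scientific notation would not match this pattern and would be returned unchanged. The names x and clean are illustrative.

```r
# Default cut labels for breaks 20, 30, 40, 50
x <- as.character(cut(c(25, 35, 45), breaks = seq(20, 50, by = 10)))

# Capture the lower and upper bounds, dropping "(" and "]" and
# replacing the comma with " - " in one pass
clean <- sub("^\\(([0-9]+),([0-9]+)\\]$", "\\1 - \\2", x)
clean
# [1] "20 - 30" "30 - 40" "40 - 50"
```

Applied to the data frame above, this would be used on as.character(mydf$Age_Group) before the aggregate call.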