我无法在R中找到解决方案(使用data.table)按自定义范围(例如,-18,18-25,...,65 +)对数据进行分组,而不是单个值。< / p>
我现在正在使用的是什么:
DT[,list(M_Savings=mean(Savings), M_Term=mean(Term)), by=Age] [order (Age)]
这给了我以下结果:
Age M_Savings M_Term
1: 18 6500 5.5
2: 19 7000 6.2
3: 20 7200 5.8
...
50: 68 4000 4.2
理想的结果:
Age M_Savings M_Term
1: 18-25 7450 5.5
2: 25-30 8320 6.2
...
50: 65+ 3862 4.3
我希望我的解释足够清楚。 将会感激任何帮助。
答案 0 :(得分:11)
@jdharrison是对的:cut(...)
是要走的路。
library(data.table)
# create sample - you have this already
set.seed(1) # for reproducibility
DT <- data.table(age=sample(15:70,1000,replace=TRUE),
value=rpois(1000,10))
# you start here...
breaks <- c(0,18,25,35,45,65,Inf)
DT[,list(mean=mean(value)),by=list(age=cut(age,breaks=breaks))][order(age)]
# age mean
# 1: (0,18] 10.000000
# 2: (18,25] 9.579365
# 3: (25,35] 10.158192
# 4: (35,45] 9.775510
# 5: (45,65] 9.969697
# 6: (65,Inf] 10.141414
答案 1 :(得分:0)
使用数字变量的示例:
test BucketName
1 615.59148 01. 0 - 5,000
2 1135.42357 01. 0 - 5,000
3 5302.24208 02. 5,000 - 10,000
4 3794.23109 01. 0 - 5,000
5 2773.70667 01. 0 - 5,000
...
并为此代码
generateLabelsForPivot = function(breaksVector)
{
startValue = min(breaksVector)
lastValue = max(breaksVector)
lengthOfBreaks = length(breaksVector)
orders = seq(1, lengthOfBreaks-1, 1)
startingPoints = c(breaksVector[-length(breaksVector)])
finishPoints = c(breaksVector[-1])
addingZeros = function(X)
{
prefix = ""
if(nchar(X) == 1)
{
prefix = "0"
} else {
prefix = ""
}
return(paste(prefix, X, ". ", sep = ""))
}
orderPrefixes = sapply(orders, addingZeros)
startingPoints.pretty = prettyNum(startingPoints, scientific=FALSE, big.mark=",", preserve.width = "none")
finishPoints.pretty = prettyNum(finishPoints, scientific=FALSE, big.mark=",", preserve.width = "none")
labels = paste(orderPrefixes, startingPoints.pretty, " - ", finishPoints.pretty, sep = "")
return(labels)
}
dataFrame = data.frame(test = runif(100, 0, 100*100))
GCV_breaks = c(0, 5000, 10000, 20000, 30000, 1000000)
GCV_labels = generateLabelsForPivot(GCV_breaks)
GCV_labels
GCV_buckets = cut(dataFrame$test, breaks = GCV_breaks, labels = GCV_labels)
dataFrame$BucketName = GCV_buckets