quantile(X, prob = seq(0, 1, length = 5), type = 5)
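How can I carry this over to a data.table operation that uses := to add a new column, assigning each ID an ordered value according to the bin its value falls in, e.g. 1 for the first 25%, 2 for up to 50%, and so on?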
Answer 0 (score: 4)
You can use findInterval. This lets you use quantile and any of its definitions (selected with the type argument).
For example:
findInterval(x, quantile(x,type=5), rightmost.closed=TRUE)
# It is fast
library(data.table)
library(microbenchmark)
set.seed(1)
DT <- data.table(x = rnorm(1e6))
# Compare the rank-based split with findInterval on one million values
microbenchmark(
  order = DT[order(x), bin := ceiling(.I / .N * 5)],
  findInterval = DT[, b2 := findInterval(x, quantile(x, type = 5), rightmost.closed = TRUE)],
  times = 10
)
## Unit: milliseconds
##          expr       min        lq    median       uq      max neval
##         order 551.31154 568.20324 573.36605 640.3255 655.5024    10
##  findInterval  70.16782  79.11459  80.36363 140.2807 147.3080    10
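To tie this back to the := in the question, here is a minimal sketch on a small table. The column names `id` and `x` and the helper variable `breaks` are assumptions made for illustration; they do not come from the original post.

```r
library(data.table)

# Hypothetical toy data: the column names are assumptions for illustration.
set.seed(1)
DT <- data.table(id = 1:20, x = rnorm(20))

# Break points at 0%, 25%, 50%, 75% and 100%, using quantile definition type 5
# (the same call as in the question).
breaks <- quantile(DT$x, probs = seq(0, 1, length.out = 5), type = 5)

# findInterval returns, for each x, the index of the interval it falls in.
# rightmost.closed = TRUE keeps the maximum in the last bin (4) instead of
# giving it a bin of its own.
DT[, bin := findInterval(x, breaks, rightmost.closed = TRUE)]

# Number of rows per bin
DT[, .N, by = bin][order(bin)]
```

If the bins should be computed separately within groups, the same expression can be given a `by` argument, e.g. `by = grp` for a hypothetical grouping column `grp`.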
Answer 1 (score: 2)
For data with no ties, a simple solution is to split it manually:
library(data.table)
set.seed(1)
DT <- data.table(x = rnorm(20))
# Sort by x, then cut the sorted positions into 5 equal-sized bins
DT[order(x), bin := ceiling(.I / .N * 5)]
This results in:
               x bin
 1: -0.62645381   1
 2:  0.18364332   3
 3: -0.83562861   1
 4:  1.59528080   5
 5:  0.32950777   3
 6: -0.82046838   1
 7:  0.48742905   3
 8:  0.73832471   4
 9:  0.57578135   4
10: -0.30538839   2
11:  1.51178117   5
12:  0.38984324   3
13: -0.62124058   2
14: -2.21469989   1
15:  1.12493092   5
16: -0.04493361   2
17: -0.01619026   2
18:  0.94383621   5
19:  0.82122120   4
20:  0.59390132   4
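The "no ties" caveat matters because the rank-based split assigns bins by position in the sorted order, so equal values can land in different bins. The sketch below uses made-up data to show this. It uses `seq_len(.N)` for the sorted position, and the result name `ties` is an assumption for illustration.

```r
library(data.table)

# Made-up data with repeated values, for illustration only.
DT <- data.table(x = c(1, 1, 1, 1, 2, 2, 3, 3, 4, 5))

# Rank-based split: the bin depends on the position after sorting, not on the value.
DT[order(x), bin := ceiling(seq_len(.N) / .N * 5)]

# Value-based split: findInterval is a function of x alone,
# so equal values always share a bin.
DT[, b2 := findInterval(x, quantile(x, type = 5), rightmost.closed = TRUE)]

# For each distinct x, count how many different bins it was assigned to.
ties <- DT[, .(n_bin = uniqueN(bin), n_b2 = uniqueN(b2)), by = x]
ties
```

Here the four rows with `x == 1` are spread over two bins by the rank-based split, while `findInterval` keeps them together.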