I want to do quantile cuts for each group (cut into n bins that each hold an equal fraction of the data).
qcut = function(x, n) {
  quantiles = seq(0, 1, length.out = n+1)
  cutpoints = unname(quantile(x, quantiles, na.rm = TRUE))
  cut(x, cutpoints, include.lowest = TRUE)
}
library(data.table)
dt = data.table(A = 1:10, B = c(1,1,1,1,1,2,2,2,2,2))
dt[, bin := qcut(A, 3)]
dt[, bin2 := qcut(A, 3), by = B]
dt
     A B    bin        bin2
 1:  1 1  [1,4]    [6,7.33]
 2:  2 1  [1,4]    [6,7.33]
 3:  3 1  [1,4] (7.33,8.67]
 4:  4 1  [1,4]   (8.67,10]
 5:  5 1  (4,7]   (8.67,10]
 6:  6 2  (4,7]    [6,7.33]
 7:  7 2  (4,7]    [6,7.33]
 8:  8 2 (7,10] (7.33,8.67]
 9:  9 2 (7,10]   (8.67,10]
10: 10 2 (7,10]   (8.67,10]
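Here the cut without grouping (bin) is correct: every value lies inside its bin. The grouped result (bin2) is wrong, though: the rows with B = 1 have A between 1 and 5, yet they are labelled with bins such as [6,7.33] that belong to the B = 2 group.
How can I fix this?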
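Answer 0 (score: 8)
This is a bug in how factors are handled. Please check whether it is already known (or fixed in the development version); if not, report it to the data.table bug tracker.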
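Until that is resolved, one possible workaround (a sketch, not taken from the answer above) is to avoid returning a factor from the grouped call. It assumes the problem comes from each group producing a factor with different levels, so that the integer codes of later groups get printed with the first-seen set of levels. Converting to character inside `j` sidesteps any merging of levels:

```r
library(data.table)

# Same helper as in the question: cut x into n quantile-based bins.
qcut = function(x, n) {
  quantiles = seq(0, 1, length.out = n + 1)
  cutpoints = unname(quantile(x, quantiles, na.rm = TRUE))
  cut(x, cutpoints, include.lowest = TRUE)
}

dt = data.table(A = 1:10, B = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2))

# Assumed workaround: return plain character labels, so the per-group
# factor levels never have to be combined across groups.
dt[, bin2 := as.character(qcut(A, 3)), by = B]
print(dt)
```

With this, the rows of group B = 1 should get labels built from their own cut points (starting at `[1,2.33]`) instead of those of group B = 2. If a factor is needed afterwards, `factor(bin2)` rebuilds one, but its levels would be sorted as strings rather than by interval position.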