data.table wrong behavior on PowerPC and SPARC (both big-endian)

Date: 2014-05-22 11:45:10

Tags: r data.table

I have a data.table dt in which I did some classification using cut:

require(data.table)
set.seed(1)
dt <- data.table(x = rnorm(10))
dt[, y := cut(x, breaks = c(-Inf, 0, Inf), labels = 1:2)]

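If I convert the resulting factor y to numeric values (using as.Numeric, a function based on the idiom in ?factor), binary search no longer works, even though z is numeric.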
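The conversion idiom from ?factor that as.Numeric wraps can be checked in base R alone, without data.table. This sketch reuses the question's seed and breaks; the labels = c(5, 7) part at the end is a made-up example, added only to show why indexing the levels by the factor matters in general:

```r
# Same data and classification as in the question, base R only
set.seed(1)
x <- rnorm(10)
y <- cut(x, breaks = c(-Inf, 0, Inf), labels = 1:2)

# Idiom from ?factor: index the numeric levels by the factor,
# which uses the factor's integer codes as positions
as.Numeric <- function(f) as.numeric(levels(f))[f]

z1 <- as.Numeric(y)
z2 <- as.numeric(as.character(y))
identical(z1, z2)  # both are plain double vectors with equal values

# Hypothetical labels where the codes and the label values differ
w <- cut(x, breaks = c(-Inf, 0, Inf), labels = c(5, 7))
as.numeric(w)  # the underlying codes (1 or 2)
as.Numeric(w)  # the label values (5 or 7)
```

Since both conversions give the same double vector here, the values of z alone do not explain why one version of the key works and the other does not.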

as.Numeric <- function(f){
  as.numeric(levels(f))[f]
}

dt[, z := as.Numeric(y)] # as.numeric(as.character(y))
                                    # is working ...
dt
##              x y z
##  1: -0.6264538 1 1
##  2:  0.1836433 2 2
##  3: -0.8356286 1 1
##  4:  1.5952808 2 2
##  5:  0.3295078 2 2
##  6: -0.8204684 1 1
##  7:  0.4874291 2 2
##  8:  0.7383247 2 2
##  9:  0.5757814 2 2
## 10: -0.3053884 1 1

setkey(dt, z)
dt
##              x y z
##  1:  0.1836433 2 2
##  2:  1.5952808 2 2
##  3:  0.3295078 2 2
##  4:  0.4874291 2 2
##  5:  0.7383247 2 2
##  6:  0.5757814 2 2
##  7: -0.6264538 1 1
##  8: -0.8356286 1 1
##  9: -0.8204684 1 1
## 10: -0.3053884 1 1

dt[J(1)] # doesn't work
##     x  y z
## 1: NA NA 1

dt[y == 1, ] # works fine
##             x y z
## 1: -0.6264538 1 1
## 2: -0.8356286 1 1
## 3: -0.8204684 1 1
## 4: -0.3053884 1 1

str(dt)
## Classes ‘data.table’ and 'data.frame':   10 obs. of  3 variables:
##  $ x: num  0.184 1.595 0.33 0.487 0.738 ...
##  $ y: Factor w/ 2 levels "1","2": 2 2 2 2 2 2 1 1 1 1
##  $ z: num  2 2 2 2 2 2 1 1 1 1
##  - attr(*, ".internal.selfref")=<externalptr> 
##  - attr(*, "sorted")= chr "z"

Trying to set the key again doesn't help:

setkey(dt, z)
## Warning message:
## In setkeyv(x, cols, verbose = verbose) :
##   Already keyed by this key but had invalid row order, key rebuilt. If you didn't go under the hood please let datatable-help know so the root cause can be fixed.
dt
##              x y z
##  1:  0.1836433 2 2
##  2:  1.5952808 2 2
##  3:  0.3295078 2 2
##  4:  0.4874291 2 2
##  5:  0.7383247 2 2
##  6:  0.5757814 2 2
##  7: -0.6264538 1 1
##  8: -0.8356286 1 1
##  9: -0.8204684 1 1
## 10: -0.3053884 1 1

dt[J(1)] # doesn't work
##     x  y z
## 1: NA NA 1

The vector scan works because no key is needed. Using as.numeric(as.character(y)) also works. Maybe something is wrong with the [ operator in as.Numeric? With the same code, data.table 1.8.10 behaves as expected. It is not easy to find out why the code no longer works with 1.9.3...
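After setkey(dt, z), the printed table has z in the order 2, ..., 2, 1, ..., 1 while still being marked as sorted by z. A minimal base-R sketch, using a hypothetical lower-bound binary search helper bsearch_rows() (an illustration only, not data.table's actual internals), shows why a key lookup such as J(1) can miss rows that a vector scan such as y == 1 still finds:

```r
# Hypothetical helper, NOT data.table's implementation: a lower-bound
# binary search that assumes `v` is sorted in increasing order and
# returns the row numbers whose value equals `value`.
bsearch_rows <- function(v, value) {
  lo <- 1L
  hi <- length(v) + 1L
  # Narrow [lo, hi) down to the first position with v[pos] >= value
  while (lo < hi) {
    mid <- (lo + hi) %/% 2L
    if (v[mid] < value) lo <- mid + 1L else hi <- mid
  }
  # Collect the run of matches starting at that position
  end <- lo
  while (end <= length(v) && v[end] == value) end <- end + 1L
  if (end == lo) integer(0) else seq.int(lo, end - 1L)
}

# Key column in the (invalid) order printed after setkey(dt, z)
z_bad <- c(2, 2, 2, 2, 2, 2, 1, 1, 1, 1)

bsearch_rows(z_bad, 1)        # binary search: no rows found
which(z_bad == 1)             # vector scan: rows 7 to 10
bsearch_rows(sort(z_bad), 1)  # after a correct sort: rows 1 to 4
```

A vector scan checks every row, so it does not depend on the stored order; a binary search trusts that order. This is consistent with dt[y == 1, ] working while dt[J(1)] does not.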

Question:

Is this a bug?

P.S:

as.Numeric

1 answer:

Answer (score: 2)

Now fixed in v1.9.5 on GitHub. Thanks for reporting.

  

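Restores compatibility with big-endian machines (e.g., SPARC and PowerPC). Most Windows, Linux and Mac systems are little-endian; type .Platform$endian to confirm. Thanks to Gerhard Nachtmann for reporting and to the QEMU project for their PowerPC emulator.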
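As a small illustration of what byte order means (not a description of the data.table fix itself, which the NEWS item does not detail), base R's writeBin() can serialise the same double in both orders, next to the .Platform$endian check the NEWS item suggests:

```r
# Byte order of the current R session: "little" or "big"
.Platform$endian

# Serialise the double 1.0 (IEEE 754 bit pattern 0x3FF0000000000000)
# with an explicit byte order
big    <- writeBin(1, raw(), endian = "big")
little <- writeBin(1, raw(), endian = "little")
big     # 3f f0 00 00 00 00 00 00
little  # 00 00 00 00 00 00 f0 3f

# Same bytes, reversed: code that reads a double's bytes one at a time
# and assumes little-endian layout gets them backwards on SPARC/PowerPC
identical(rev(big), little)
```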