Negative number of rows in a data.table after incorrect use of set

Asked: 2013-06-01 20:36:46

Tags: r data.table

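I ran into something a bit odd, especially because the code can give different output each time it is run. In short, I mistakenly used set to assign a value to a row number greater than the last row. Instead of doing nothing, set produced a data.table with a negative length.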

library(data.table)

dt<-data.table(id=1:5, var=rnorm(5)) # normal example

set(dt, 6L, 1L, 3L) # doesn't set anything as expected.
dt
#
# now my real data, after I found the error in my code (incorrect row number in set)
#
dt1 <- data.table(ID = "29502509", FY = 2012, VAR = 61067.5442975645, 
                      startDate = structure(15062L, class = c("IDate", "Date")), 
                      endDate = structure(15429L, class = c("IDate", "Date")), 
                      start = "1750", end = "2404",
                      date = structure(15461L,class = c("IDate", "Date")),
                      DESCR = "JOB", NOTE = "NEW")

set(dt1, 12L, 3L, 62385.6516144086)
str(dt1)
Classes ‘data.table’ and 'data.frame':  1 obs. of  10 variables:
 $ ID       : chr "29502509"
 $ FY       : num 2012
 $ VAR      : num 61068
 $ startDate: IDate, format: "2011-03-29"
 $ endDate  :
Error in do.call(str, c(list(object = obj), aList, list(...)), quote = TRUE) : 
  negative length vectors are not allowed
> sapply(dt1, length)
        ID         FY        VAR  startDate    endDate      start        end       date 
         1          1          1          1 -637110831          1          1          1 
     DESCR       NOTE 
         1          1 
> dput(dt1)
structure(list(ID = "29502509", FY = 2012, VAR = 61067.5442975645, 
    startDate = structure(15062L, class = c("IDate", "Date")), 
    endDate = structure(, class = c("IDate", "Date")), start = "1750", # HERE
    end = "2404", date = structure(15461L, class = c("IDate", 
    "Date")), DESCR = "JOB", NOTE = "NEW"), .Names = c("ID", 
"FY", "VAR", "startDate", "endDate", "start", "end", "date", 
"DESCR", "NOTE"), row.names = c(NA, -1L), class = c("data.table", 
"data.frame"), .internal.selfref = <pointer: 0x0000000000130788>)

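As mentioned, you may need to run the whole code several times to see this, from the creation of the data.table (dt1 <- data.table(...) through set(dt1, ...). I noticed that if it does not happen on the first attempt, it will not happen at all unless I re-run dt1 <- data.table(.... Any ideas?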

Edit:

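To be specific, by "different results" I mean that sometimes it does nothing (as expected), but most of the time it creates a single column of negative length, always a Date column (endDate in the output above), and sometimes it creates an entire data.table with a negative number of rows. In the last two cases (single column or entire data.table), the negative length is always -637110831.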

1 answer:

Answer 0: (score: 3)

This is memory corruption caused by writing beyond the memory allocated for the column.

The call goes to assign in assign.c. As of version 1.8.8, at assign.c:434:

434             default :
435                 for (r=0; r<targetlen; r++)
436                     memcpy((char *)DATAPTR(targetcol) + (INTEGER(rows)[r]-1)*size, 
437                            (char *)DATAPTR(RHS) + (r%vlen) * size,
438                            size);

this code is reached (which should not be the case). At this point:

(gdb) p INTEGER(rows)[0]
$21 = 12
(gdb) p size
$23 = 8
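
The arithmetic behind those two values can be sketched in plain C. This is only an illustrative sketch, not data.table's code: row_byte_offset, row_in_bounds and checked_assign are hypothetical names, and the column length of 1 is taken from the str() output in the question. With row 12 and an 8-byte element, the memcpy destination is 88 bytes past the start of a column whose data occupy only 8 bytes, so the copy lands outside the column.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Byte offset of a 1-based row within a column, mirroring the
   (INTEGER(rows)[r]-1)*size term in the quoted memcpy. */
size_t row_byte_offset(int row, size_t elem_size) {
    return (size_t)(row - 1) * elem_size;
}

/* Hypothetical guard (an assumption, not data.table code):
   true only when the 1-based row lies inside a column of nrow elements. */
bool row_in_bounds(int row, int nrow) {
    return row >= 1 && row <= nrow;
}

/* Hypothetical checked assignment: copies one element into col only
   when the row is valid; otherwise returns false and leaves col untouched. */
bool checked_assign(void *col, int nrow, int row,
                    const void *value, size_t elem_size) {
    if (!row_in_bounds(row, nrow))
        return false;
    memcpy((char *)col + row_byte_offset(row, elem_size), value, elem_size);
    return true;
}
```

For the values printed by gdb, row_byte_offset(12, 8) is 88 while row_in_bounds(12, 1) is false, so a guarded copy of this kind would refuse the write instead of touching memory beyond the column.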