data.table with a data.frame argument creates an extra column

Time: 2015-11-26 21:19:09

Tags: r data.table

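When I use data.table to create a character column and pass the data.frame argument stringsAsFactors = F, the resulting data.table keeps the column as character, as intended, but it also adds an extra column named "stringsAsFactors". The extra column is easy to remove. But is there a way to tell data.table not to add a column based on a data.frame argument? That is, is this a bug or a feature? See the toy example below: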

library(data.table)
factorTest <- sample(c('O','A', 'B','AB'), 50, replace = TRUE)
summary(factorTest)
#    Length     Class      Mode 
#        50 character character 
summary(as.factor(factorTest))
#  A AB  B  O 
# 10 18  7 15 
test1 <- data.frame(dabo = factor(factorTest, 
     levels = c('O','A','B','AB')), dabostr = factorTest, 
     stringsAsFactors = FALSE)
test2 <- data.table(dabo = factor(factorTest, 
     levels = c('O','A','B','AB')), dabostr = factorTest, 
     stringsAsFactors = FALSE)
summary(test1)
#  dabo      dabostr         
#  O :15   Length:50         
#  A :10   Class :character  
#  B : 7   Mode  :character  
#  AB:18                     
summary(test2)
#  dabo      dabostr          stringsAsFactors
#  O :15   Length:50          Mode :logical   
#  A :10   Class :character   FALSE:50        
#  B : 7   Mode  :character   NA's :0         
#  AB:18
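
The removal mentioned above can be sketched as follows. This is only an illustration for data.table versions that still show the behaviour: the table `dt` and its contents are made up, and the stray column is added by hand here so that the snippet behaves the same on any version.

```r
library(data.table)

# Hypothetical stand-in for test2: add the stray logical column by hand,
# mimicking what affected data.table versions produced.
dt <- data.table(dabostr = c("O", "A", "B", "AB"))
dt[, stringsAsFactors := FALSE]

# Drop the stray column by reference, but only if it is actually present,
# so the same code is harmless on versions that never create it.
if ("stringsAsFactors" %in% names(dt)) {
  dt[, stringsAsFactors := NULL]
}

names(dt)
```

Because `:=` modifies the table in place, no copy of the data is made when the column is dropped.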

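1 answer:

Answer 0 (score: 1)

This was fixed in commit 3dbc493; data.table() now has a fully functional stringsAsFactors argument.
When it is TRUE, data.table uses a fast internal as.factor function, because base factor() is slow.
Below is your code reproduced on the latest data.table, 1.9.7.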

library(data.table)
factorTest <- sample(c('O','A', 'B','AB'), 50, replace = TRUE)
test1 <- data.frame(dabo = factor(factorTest, 
     levels = c('O','A','B','AB')), dabostr = factorTest, 
     stringsAsFactors = FALSE)
test2 <- data.table(dabo = factor(factorTest, 
     levels = c('O','A','B','AB')), dabostr = factorTest, 
     stringsAsFactors = FALSE)
summary(test1)
# dabo      dabostr         
# O : 8   Length:50         
# A :10   Class :character  
# B :16   Mode  :character  
# AB:16
summary(test2)
# dabo      dabostr         
# O : 8   Length:50         
# A :10   Class :character  
# B :16   Mode  :character  
# AB:16
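
To see both settings of the argument side by side, a small check along these lines may help. It is a sketch that assumes a data.table version containing the fix (1.9.7 or later); the vector `blood` and the table names `kept` and `converted` are made up for illustration.

```r
library(data.table)

# Hypothetical sample values, not taken from the question's random draw.
blood <- c("O", "A", "B", "AB")

# Assumes a data.table version that includes the fix (1.9.7 or later).
kept      <- data.table(dabostr = blood, stringsAsFactors = FALSE)
converted <- data.table(dabostr = blood, stringsAsFactors = TRUE)

class(kept$dabostr)       # character column is left as character
class(converted$dabostr)  # character column is converted to a factor
names(converted)          # only "dabostr"; no stray "stringsAsFactors" column
```

On a version without the fix, both calls would be expected to produce the extra logical column described in the question instead.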