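When I create a character column using data.table with the data.frame argument stringsAsFactors = F, the resulting data.table correctly honors stringsAsFactors = F, but it also adds an extra column named "stringsAsFactors". The extra column is easy to get rid of, but is there a way to tell data.table not to add a column based on data.frame arguments? In other words, is this a bug or a feature? See the toy example below: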
library(data.table)
factorTest <- sample(c('O','A', 'B','AB'), 50, replace = T)
summary(factorTest)
#    Length     Class      Mode
#        50 character character
summary(as.factor(factorTest))
#  A AB  B  O
# 10 18  7 15
test1 <- data.frame(dabo = factor(factorTest,
levels = c('O','A','B','AB')), dabostr = factorTest,
stringsAsFactors = F)
test2 <- data.table(dabo = factor(factorTest,
levels = c('O','A','B','AB')), dabostr = factorTest,
stringsAsFactors = F)
summary(test1)
#  dabo      dabostr
#  O :15   Length:50
#  A :10   Class :character
#  B : 7   Mode  :character
#  AB:18
summary(test2)
#  dabo      dabostr          stringsAsFactors
#  O :15   Length:50          Mode :logical
#  A :10   Class :character   FALSE:50
#  B : 7   Mode  :character   NA's :0
#  AB:18
Answer 0 (score: 1)
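This has already been fixed in commit 3dbc493; data.table() now has a fully working stringsAsFactors argument.
If it is TRUE, data.table() uses a fast internal as.factor function, because base factor() is slow.
Below is your code, reproduced on a recent data.table 1.9.7.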
library(data.table)
factorTest <- sample(c('O','A', 'B','AB'), 50, replace = T)
test1 <- data.frame(dabo = factor(factorTest,
levels = c('O','A','B','AB')), dabostr = factorTest,
stringsAsFactors = F)
test2 <- data.table(dabo = factor(factorTest,
levels = c('O','A','B','AB')), dabostr = factorTest,
stringsAsFactors = F)
summary(test1)
#  dabo      dabostr
#  O : 8   Length:50
#  A :10   Class :character
#  B :16   Mode  :character
#  AB:16
summary(test2)
#  dabo      dabostr
#  O : 8   Length:50
#  A :10   Class :character
#  B :16   Mode  :character
#  AB:16
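If you want to check the behavior on your own installation, or you are stuck on a version that predates the fix, something like the following may help. It is only a minimal sketch: the vector blood and the tables dt_chr, dt_fac and legacy are illustrative names that do not appear in the question, and the legacy table merely simulates the stray column instead of reproducing the old bug. It assumes a data.table version that already includes the fix.

```r
library(data.table)

blood <- c("O", "A", "B", "AB", "A")  # illustrative data, not from the question

# With the fix in place, stringsAsFactors is treated as an argument.
dt_chr <- data.table(id = 1:5, type = blood, stringsAsFactors = FALSE)
dt_fac <- data.table(id = 1:5, type = blood, stringsAsFactors = TRUE)

names(dt_chr)           # should list only the two requested columns
sapply(dt_chr, class)   # type should stay character
sapply(dt_fac, class)   # type should become factor

# Simulate what an affected version produced: a stray logical column.
legacy <- data.table(id = 1:5, type = blood)
legacy[, stringsAsFactors := FALSE]

# Drop the stray column by reference, but only if it is present.
if ("stringsAsFactors" %in% names(legacy)) {
  legacy[, stringsAsFactors := NULL]
}
names(legacy)
```

Guarding the removal with %in% keeps the same script safe on both old and new versions, since := NULL is only run when the column actually exists.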