I have been using data.table (v1.10) and noticed what looks like a bug when using fwrite. Some background:
sessionInfo()
R version 3.1.3 (2015-03-09)
Platform: x86_64-unknown-linux-gnu (64-bit)
Running under: Red Hat Enterprise Linux Server release 6.7 (Santiago)
The machine has multiple cores.
Generate some data:
library(data.table)
#Generate some data
rows = 2500000
set.seed(Sys.time())
DF <- data.frame(index  = 1:rows,
                 catsA  = sample(letters[1:10], rows, replace = TRUE),
                 catsB  = sample(letters[1:10], rows, replace = TRUE),
                 catsC  = sample(letters[1:10], rows, replace = TRUE),
                 catsD  = sample(letters[1:10], rows, replace = TRUE),
                 catsE  = sample(letters[1:10], rows, replace = TRUE),
                 valueA = round(rnorm(rows), 3),
                 valueB = rpois(rows, lambda = 4))
#Convert to data.table
DT <- data.table(DF)
#Create a new column
DT <- DT[,valueNew := valueA*valueB]
#Write
write.csv(DT,file="DT_write_csv.csv",row.names=F)
fwrite(DT, file = "DT_fwrite.csv",row.names=F)
Read the files back in and join:
#Read back in and join
DT_csv <- fread("DT_write_csv.csv")
DT_fwrite <- fread("DT_fwrite.csv")
setkey(DT_csv,"index")
setkey(DT_fwrite,"index")
join_DT <- DT_csv[DT_fwrite]
Compare:
nrow(join_DT[valueNew != i.valueNew])
[1] 1
join_DT[valueNew != i.valueNew,.(index,valueNew,i.valueNew)]
   index valueNew i.valueNew
1: 67097    2.855       5.71
DT[index==67097,.(valueNew)]
   valueNew
1:    2.855
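From the comparison, the original DT holds 2.855 for this row, which matches the write.csv file, so the value that went through fwrite (5.71) is the corrupted one. Sometimes more than one row is affected, and in a real example the corruption spread across many columns.
Am I doing something wrong with fwrite?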
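One way to narrow this down might be to write the same table twice with fwrite, once single-threaded and once with the default thread count, then compare the two files. This is only a sketch under an untested assumption that the multithreaded write path is involved. The table `DT_small`, the temporary file paths and the `1e-9` tolerance are made up for illustration; `nThread` is an existing fwrite argument.

```r
library(data.table)

# Small stand-in table (hypothetical); the real DT from above can be substituted
set.seed(1)
n <- 100000
DT_small <- data.table(index  = 1:n,
                       valueA = round(rnorm(n), 3),
                       valueB = rpois(n, lambda = 4))
DT_small[, valueNew := valueA * valueB]

# Temporary output paths, chosen only for this sketch
f_single  <- tempfile(fileext = ".csv")
f_default <- tempfile(fileext = ".csv")

# Write once with a single thread and once with the default thread count
fwrite(DT_small, file = f_single, nThread = 1)
fwrite(DT_small, file = f_default)

one  <- fread(f_single)
dflt <- fread(f_default)

# Row positions where the two files disagree on valueNew
mismatch <- which(abs(one$valueNew - dflt$valueNew) > 1e-9)
length(mismatch)
```

If the single-threaded file always matches the original and the default one sometimes does not, that would point at the threaded writer rather than at how fwrite is being called. Agreement on one run would not rule the problem out, since the corruption appears only occasionally.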