I'm trying to read a "|"-delimited file with fread() from the data.table package and I get an error: "' ends field 14 on line 104970".
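I have seen older questions asking whether fread() can handle quotes, but I haven't found anything recent. I also noticed that a "sep2" feature is on the way. Is that the feature that will address this long-standing issue?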
I can read the same data with read.table():
df.readtable <- read.table(myfile, header = FALSE, sep = "|", quote = "\"", fill = TRUE, stringsAsFactors = FALSE)
but I can't reproduce that result with fread():
> require(data.table)
> df.fread<-fread(myfile,verbose=T)
Input contains no \n. Taking this to be a filename to open
File opened, filesize is 0.215B
File is opened and mapped ok
Detected eol as \r\n (CRLF) in that order, the Windows standard.
Using line 30 to detect sep (the last non blank line in the first 'autostart') ... sep='|'
Found 17 columns
First row with 17 fields occurs on line 1 (either column names or first row of data)
Some fields on line 1 are not type character (or are empty). Treating as a data row and using default column names.
Count of eol after first data row: 2000001
Subtracted 1 for last eol and any trailing empty lines, leaving 2000000 data rows
Type codes: 44414114444444424 (first 5 rows)
Type codes: 44444114444444424 (+middle 5 rows)
Type codes: 44444114444444424 (+last 5 rows)
Type codes: 44444114444444424 (after applying colClasses and integer64)
Type codes: 44444114444444424 (after applying drop or select (if supplied)
Allocating 17 column slots (17 - 0 NULL)
Error in fread(myfile, verbose = T) :
' ends field 14 on line 104970 when reading data: LW61026|CITY|STATE|000111|L|00|1800|||N|N|N|CHANEL|"CHARLIE" BOARD|2011 CITY|19911114000000|
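To see what fread() trips over, the offending record can be split by hand. The following is a minimal R sketch, assuming the line quoted in the error message is representative of the problem rows (the names `bad_line`, `n_fields` and `fields` are only for illustration). It shows that the record has the expected 17 fields, and that field 14 starts with a double quote that does not enclose the whole field, while read.table() with quote="\"" evidently accepts the file.

```r
# Record reported by fread() on line 104970, copied from the error message
bad_line <- 'LW61026|CITY|STATE|000111|L|00|1800|||N|N|N|CHANEL|"CHARLIE" BOARD|2011 CITY|19911114000000|'

# Count fields as (number of "|" separators + 1); strsplit() would silently
# drop the empty trailing field after the final "|"
n_fields <- nchar(gsub("[^|]", "", bad_line)) + 1

# Split on a literal "|" (fixed = TRUE, so "|" is not treated as regex alternation)
fields <- strsplit(bad_line, "|", fixed = TRUE)[[1]]

print(n_fields)    # matches the 17 columns fread() detected
print(fields[14])  # starts with a double quote, but the closing quote is followed by " BOARD", not "|"
```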
Session and package info:
> sessionInfo()
R version 3.0.3 (2014-03-06)
Platform: x86_64-w64-mingw32/x64 (64-bit)
locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] data.table_1.9.2
loaded via a namespace (and not attached):
[1] plyr_1.8.1 Rcpp_0.11.1 reshape2_1.2.2 stringr_0.6.2 tools_3.0.3