I am using fread from data.table to read a csv file into R. The read stops with this error:
fstDF <- fread("dat.csv")
Read 34.5% of 2000004 rowsError in fread("dat.csv") :
Expected sep (',') but new line or EOF ends field 25 on line 800747 when reading data: John,,,ID,362526197318501X,M,19730218, ,,F,,CHN,44,4403,,,,,,,13828890538,,, ,M
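I checked the data and found that the error is caused by a newline character inside a field, which splits one record across two lines, as in the Steve row of the sample data below: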
Name,CardNo,Descriot,CtfTp,CtfId,Gender,Birthday,Address,Zip,Dirty,District1,District2,District3,District4,District5,District
6,FirstNm,LastNm,Duty,Mobile,Tel,Fax,EMail,Nation,Taste,Education,Company,CTel,CAddress,CZip,Family,Version,id
Mike,,,OTH,010-116321,M,19000101,,100080, ,,CHN,0,0,,,,,,10116,010-82808028,010-82828028-208,chenmeng@dist.
Steve,,,GID,0282,M,19000101,,051430, ,,CHN
,0,0,,,,,,13831193762,0311-88030066,0311-88030088,info@shineway.com,,,,,
Nicholas,,,OTH,010-125321,F,19000101,,100097,,,CHN,0,0,,,,,,10125,010-88400202,010-88400260,,,,,,,,,,,4
Abrham,,,OTH,010-130321,F,19000101,,100029,,,CHN,0,0,,,,,,10130,010-51292052/3-802,010-51292052/3-811,
Bill,,,OTH,010-142321,F,19000101,,100007,,,CHN,0,0,,,,,,10142,010-67687044,010-67687044,baiguoshouyue@sina
Zabrina,,,OTH,010-186321,F,19000101,,100101,,,CHN,0,0,,,,,,13942697025,010-64869596/0411-668895950,0411-6688519
Julia,,,OTH,021-044321,M,19000101,,201206,,,CHN,0,0,,,,,,21044,021-28995000*208,021-50315077,jane.dai@parker.com
Dave,,,OTH,021-127321,M,19000101,,200008,,,CHN,0,0,,,,,,21127,021-55150244,021-55150344,,,,,,,,,,,9
Cecilia,,,OTH,021-151321,F,19000101,,201108,,,CHN,0,0,,,,,,21151,021-61451188,021-61452602,reception.china@eurotherm.co
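This data was exported from Microsoft SQL Server. I have no access to the database, and I don't know what went wrong during the export. But I am sure that a stray newline character is what causes the read problem.
Here is a similar question on stackoverflow (with no clear solution): Importing csv file to R new line issue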
Question:
How can I read csv data that contains stray newline characters?
Answer 0: (score: 1)
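Step 1: Remove the ^M (carriage return) characters in the middle of lines:
perl -pe 's/\r(?!\n)//g'
Reference: How to remove carriage returns in the middle of a line
Step 2: Replace \n, with , so that a line starting with a comma is joined back onto the previous line (see @jimmij's answer below):
perl -p00e 's/\n,/,/g'
Step 3: Read the cleaned file as usual with fread.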