How do I handle unexpected data when reading a CSV file?

Time: 2019-09-19 01:36:27

Tags: r

I need to read a CSV file with the following code:

originalDataset <- fread("file.csv", 
                         encoding = "UTF-8", sep = ",", 
                         select = c("OperationDate","TenantID","Type","EMail","ClientType","Param4"))

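The separator is ",", but occasionally a field contains an unexpected string with an embedded line break, which splits one record across two lines, as in the fourth data row below. In that case I get this error:

Expected 14 fields but found 12.

How can I handle this kind of data when reading the file? Thanks in advance.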

Data

ID,DBID,OperationDate,TID,Type,EMail,ClientIPAddress,ClientType,Param1,Param2,Param3,Param4,Param5,Detail
619,1,2019-08-08 03:01:00.310,2300,101,a@example.com,3.10.226.203,C,639,0,0,NULL,NULL,ANULL
402,1,2019-08-08 02:50:51.300,2300,109,fa@example.com,3.10.226.203,C,639,0,0,NULL,NULL,NULL
395,1,2019-08-08 02:50:19.377,2300,101,a@example.com,3.10.226.203,C,6387,0,0,NULL,NULL,NULL
341,1,2019-08-08 01:46:21.390,2300,104,a@example.com,3.10.226.203,A,1352,23,234630,Here is an unexpected string
which has return,NULL,NULL
329,1,2019-08-08 01:45:52.673,2300,101,a@example.com,39.1.226.203,A,6411,0,0,NULL,NULL,NULL

1 answer:

Answer 0 (score: 1)

With your original data I get this warning:

originalDataset <- fread("test.csv", encoding = "UTF-8", sep = ",", 
                         select = c("OperationDate","TID","Type","EMail","ClientType","Param4"))
Warning message:
In fread("test.csv", encoding = "UTF-8", sep = ",", select = c("OperationDate",  :
  Stopped early on line 5. Expected 14 fields but found 12. Consider fill=TRUE and comment.char=. First discarded non-empty line: <<341,1,2019-08-08 01:46:21.390,2300,104,a@example.com,3.10.226.203,A,1352,23,234630,Here is an unexpected >>

When I use the argument fill=TRUE, as the warning suggests, the data is read correctly.
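
One caveat: fill=TRUE lets fread carry on past the short line, but as far as I can tell it does not join the two halves of the broken record, so the continuation line may still end up as a row of its own. If the record needs to be kept whole, a rough sketch of one alternative is to repair the lines before parsing. The helper repair_broken_lines below is hypothetical (it is not part of data.table), and it assumes that no field contains a quoted comma and that the header line has the correct number of fields. In practice the lines would come from readLines("file.csv", encoding = "UTF-8"); here a few rows of the sample data are inlined.

```r
# Hypothetical helper (not part of data.table): glue physical lines together
# until each record has as many separators as the header line.
# Assumption: fields never contain the separator inside quotes.
repair_broken_lines <- function(lines, sep = ",", joiner = " ") {
  # Count separators in every physical line
  n_sep <- lengths(regmatches(lines, gregexpr(sep, lines, fixed = TRUE)))
  expected <- n_sep[1]  # the header defines the record width
  out <- character(0)
  buf <- NULL
  buf_sep <- 0
  for (i in seq_along(lines)) {
    if (is.null(buf)) {
      buf <- lines[i]
      buf_sep <- n_sep[i]
    } else {
      # Continuation of a broken record: replace the line break with `joiner`
      buf <- paste(buf, lines[i], sep = joiner)
      buf_sep <- buf_sep + n_sep[i]
    }
    if (buf_sep >= expected) {
      out <- c(out, buf)
      buf <- NULL
    }
  }
  if (!is.null(buf)) out <- c(out, buf)  # keep an incomplete trailing record
  out
}

# A few rows from the question, including the record split over two lines
lines <- c(
  "ID,DBID,OperationDate,TID,Type,EMail,ClientIPAddress,ClientType,Param1,Param2,Param3,Param4,Param5,Detail",
  "395,1,2019-08-08 02:50:19.377,2300,101,a@example.com,3.10.226.203,C,6387,0,0,NULL,NULL,NULL",
  "341,1,2019-08-08 01:46:21.390,2300,104,a@example.com,3.10.226.203,A,1352,23,234630,Here is an unexpected string",
  "which has return,NULL,NULL",
  "329,1,2019-08-08 01:45:52.673,2300,101,a@example.com,39.1.226.203,A,6411,0,0,NULL,NULL,NULL"
)

fixed <- repair_broken_lines(lines)

# Parse the repaired lines with fread if data.table is installed
if (requireNamespace("data.table", quietly = TRUE)) {
  originalDataset <- data.table::fread(
    text = fixed, sep = ",",
    select = c("OperationDate", "TID", "Type", "EMail", "ClientType", "Param4")
  )
}
```

The line break is replaced with a single space here; pass a different joiner if the original break should be marked some other way. If the file actually quotes such fields, this preprocessing should not be needed, since quoted line breaks are a normal part of CSV.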