I am trying to read a table using fread. The contents of the txt file are as follows:
"No","Comment","Type"
"0","he said:"wonderful|"","A"
"1","Pr/ "d/s". "a", n) ","B"
The R code I used is:
dataset0 <- fread("data/test.txt", stringsAsFactors = F)
with the development version of the data.table R package.
I expected to see a dataset with three columns, but instead I got:
Error in fread(input = "data/stackoverflow.txt", stringsAsFactors = FALSE) :
Line 3 starting <<"1","Pr/ ">> has more than the expected 3 fields.
Separator 3 occurs at position 26 which is character 6 of the last field: << n) ","B">>.
Consider setting 'comment.char=' if there is a trailing comment to be ignored.
How can I fix this?
Answer 0 (score: 6)
The development version of data.table handles files like this, where embedded quotes have not been escaped. See point 10 on the wiki page.
I just tested it on your input and it works.
$ more unescaped.txt
"No","Comment","Type"
"0","he said:"wonderful."","A"
"1","The problem is: reading table, and also "a problem, yes." keep going on.","A"
> DT = fread("unescaped.txt")
> DT
No Comment Type
1: 0 he said:"wonderful." A
2: 1 The problem is: reading table, and also "a problem, yes." keep going on. A
> ncol(DT)
[1] 3
Answer 1 (score: 2)
Use readLines to read the file line by line, then replace the separator and parse the result with read.table:
# read with no sep
x <- readLines("test.txt")
# introduce new sep - "|"
x <- gsub("\",\"", "\"|\"", x)
# read with new sep
read.table(text = x, sep = "|", header = TRUE)
# No Comment Type
# 1 0 he said:"wonderful." A
# 2 1 The problem is: reading table, and also "a problem, yes." keep going on. A
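One caveat: the sample in the question contains a literal "|" inside a field (wonderful|), so "|" may collide as a replacement separator on that input. A possible variation, sketched here in base R only, is to split each line directly on the three-character delimiter "," after stripping the outer quotes. The lines below are copied from the question; the variable names (lines, stripped, parts, df) are illustrative, and the sketch assumes no field itself contains the sequence ",".

```r
# Sample lines copied from the question (embedded quotes are not escaped)
lines <- c(
  '"No","Comment","Type"',
  '"0","he said:"wonderful|"","A"',
  '"1","Pr/ "d/s". "a", n) ","B"'
)
# With a real file this would be: lines <- readLines("test.txt")

# Drop the opening and closing quote of each line
stripped <- sub('^"', "", sub('"$', "", lines))

# Split on the literal delimiter "," (quote, comma, quote)
parts <- strsplit(stripped, '","', fixed = TRUE)

# Assumption check: every line must yield exactly three fields
stopifnot(all(lengths(parts) == 3))

# First element is the header, the rest are data rows
df <- as.data.frame(do.call(rbind, parts[-1]), stringsAsFactors = FALSE)
names(df) <- parts[[1]]
df$No <- as.integer(df$No)

print(df)
```

The stopifnot check is there so that a line breaking the assumption fails loudly rather than producing misaligned columns.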