fread - multiple separators in a string

Date: 2017-03-21 23:06:20

Tags: r text data.table separator

I am trying to read a table using fread. The content of the txt file is as follows:

"No","Comment","Type"
"0","he said:"wonderful|"","A"
"1","Pr/ "d/s". "a", n) ","B"

The R code I am using is: dataset0 <- fread("data/test.txt", stringsAsFactors = F), with the development version of the data.table R package.

I expect to get a dataset with three columns; instead I get:

Error in fread(input = "data/stackoverflow.txt", stringsAsFactors = FALSE) : 
Line 3 starting <<"1","Pr/ ">> has more than the expected 3 fields.
Separator 3 occurs at position 26 which is character 6 of the last field: << n) ","B">>. 
Consider setting 'comment.char=' if there is a trailing comment to be ignored.

How can I solve this?

2 answers:

Answer 0 (score: 6)

The development version of data.table handles files like this, where embedded quotes have not been escaped. See point 10 on the wiki page.

I just tested it on your input, and it works.

$ more unescaped.txt
"No","Comment","Type"
"0","he said:"wonderful."","A"
"1","The problem is: reading table, and also "a problem, yes." keep going on.","A"

> DT = fread("unescaped.txt")
> DT
   No                                                                  Comment Type
1:  0                                                     he said:"wonderful."    A
2:  1 The problem is: reading table, and also "a problem, yes." keep going on.    A
> ncol(DT)
[1] 3

Answer 1 (score: 2)

Read the file line by line with readLines, then replace the separator and parse the result with read.table:

# read with no sep
x <- readLines("test.txt")

# introduce new sep - "|"
x <- gsub("\",\"", "\"|\"", x)

# read with new sep
read.table(text = x, sep = "|", header = TRUE)

#   No                                                                  Comment Type
# 1  0                                                     he said:"wonderful."    A
# 2  1 The problem is: reading table, and also "a problem, yes." keep going on.    A
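One caveat with replacing the separator: the sample in the question contains a literal | inside the Comment field (wonderful|), so using | as the new separator would produce an extra field on that line. A variant of the same idea is to skip the replacement and split each line directly on the three-character sequence "," that sits between fields. The sketch below uses only base R; the helper name split_unescaped is hypothetical, the sample lines are copied from the question, and it assumes every field is wrapped in double quotes and that no field contains the sequence "," itself.

```r
# Sample lines copied from the question; embedded quotes are not escaped.
lines <- c(
  '"No","Comment","Type"',
  '"0","he said:"wonderful|"","A"',
  '"1","Pr/ "d/s". "a", n) ","B"'
)

# Hypothetical helper: assumes every field is double-quoted and that
# the sequence "," never occurs inside a field.
split_unescaped <- function(lines) {
  # Drop the quote that opens the first field and the one that closes the last.
  inner <- sub('^"', "", sub('"$', "", lines))
  # Split only on the literal sequence "," that separates two quoted fields.
  fields <- strsplit(inner, '","', fixed = TRUE)
  # Every line must yield the same number of fields as the header.
  n <- lengths(fields)
  stopifnot(all(n == n[1]))
  # The first line is the header; the remaining lines are the data rows.
  df <- as.data.frame(do.call(rbind, fields[-1]), stringsAsFactors = FALSE)
  names(df) <- fields[[1]]
  df
}

df <- split_unescaped(lines)
str(df)
```

With a real file, lines would come from readLines("test.txt"). All columns are returned as character, so numeric columns such as No would still need as.integer() or utils::type.convert() afterwards.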