R read.csv breaks on a special symbol

Time: 2016-02-26 15:34:16

Tags: r csv encoding

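I am trying to parse a UTF-8 file, but the R parser cannot read the string past this symbol (arrow symbol). I have attached screenshots because the character won't paste into the browser.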

d <- read.csv2('myfile.csv', header = FALSE, sep=",", quote="\"", numerals='no.loss', encoding="UTF-8", skipNul=TRUE)
tail(d)[,]

screenshot from notepad

screenshot of tail(d)[,] in RStudio

Is there a way to remove these symbols from the file?

UPD: vi displays this symbol as ^Z
UPD2: link to a sample file: https://www.dropbox.com/s/1kucjnia8ew1u5n/1.csv?dl=0

1 Answer:

Answer 0: (score: -1)

If I follow the code suggested by @cory, I only get a warning:

read.csv("https://www.dropbox.com/s/1kucjnia8ew1u5n/1.csv?raw=1", encoding="UTF-8", skipNul=TRUE, header=FALSE)

## Warning message:
In read.table(file = file, header = header, sep = sep, quote = quote,  :
  incomplete final line found by readTableHeader on 'https://www.dropbox.com/s/1kucjnia8ew1u5n/1.csv?raw=1'

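However, the data itself loads exactly as it appears in the CSV. The culprit is a \032 character (octal 032, i.e. byte 0x1A, which vi displays as ^Z).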
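To check whether a file really contains this byte, one option is to inspect it at the raw level. The following is a minimal sketch, not the method used in the answer: the temporary file and its contents are hypothetical stand-ins for the asker's CSV. On Windows, text-mode reads traditionally treat 0x1A as an end-of-file marker, which may explain why parsing stopped there, but that is an assumption about the asker's setup.

```r
# Hypothetical sample: a small CSV with a SUB byte (0x1A) embedded in a field.
path <- tempfile(fileext = ".csv")
writeBin(c(charToRaw("1,ab"), as.raw(0x1a), charToRaw("cd,x\n2,ef,y\n")), path)

# Read the file as raw bytes so nothing is interpreted as text or end-of-file.
bytes <- readBin(path, what = "raw", n = file.size(path))

# 1-based byte offsets at which the SUB character occurs.
sub_positions <- which(bytes == as.raw(0x1a))
print(sub_positions)
```

An empty result would suggest that the problem lies elsewhere, for example in the declared encoding.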

So here is alternative code that avoids the warning:

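# Read the file as a single string (note: nchars = 1000 caps how much is read).
a <- readChar("https://www.dropbox.com/s/1kucjnia8ew1u5n/1.csv?raw=1",
              useBytes = TRUE,
              nchars = 1000)
# Replace the \032 (Ctrl-Z) character with a space; fixed = TRUE avoids regex escaping.
b <- gsub("\032", " ", a, fixed = TRUE)
# Parse the cleaned text as comma-separated data.
new_a <- read.table(header = FALSE, text = b, sep = ",")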
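For the original question of removing the symbol from the file itself, a byte-level variant avoids having to guess the character count. This is only a sketch under stated assumptions: the sample file, its contents, and the names `path` and `clean_path` are hypothetical, and it drops the byte instead of replacing it with a space.

```r
# Hypothetical sample: a small CSV with a SUB byte (0x1A) embedded in a field.
path <- tempfile(fileext = ".csv")
writeBin(c(charToRaw("1,ab"), as.raw(0x1a), charToRaw("cd,x\n2,ef,y\n")), path)

# Read every byte of the file, then drop all occurrences of 0x1A.
bytes <- readBin(path, what = "raw", n = file.size(path))
clean <- bytes[bytes != as.raw(0x1a)]

# Write the cleaned bytes to a new file and parse that file as usual.
clean_path <- tempfile(fileext = ".csv")
writeBin(clean, clean_path)
d <- read.csv(clean_path, header = FALSE, encoding = "UTF-8",
              stringsAsFactors = FALSE)
print(d)
```

Writing to a separate file keeps the original untouched, so the cleaned result can be compared against it before anything is overwritten.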