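I'm trying to parse a UTF-8 file, but the R parser cannot read the strings after this symbol (I took a screenshot because it wouldn't paste into the browser):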
d <- read.csv2('myfile.csv', header = FALSE, sep=",", quote="\"", numerals='no.loss', encoding="UTF-8", skipNul=TRUE)
tail(d)[,]
Is there a way to remove these symbols from the file?
UPD: vi displays this symbol as ^Z
UPD2: link to a sample file: https://www.dropbox.com/s/1kucjnia8ew1u5n/1.csv?dl=0
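One way to do what the question asks, i.e. remove the symbol from the file itself before parsing, is to work at the byte level. This is a minimal sketch, not code from the thread: it assumes the symbol is the single byte 0x1A (what vi shows as ^Z). That byte can never appear inside a multi-byte UTF-8 sequence (continuation bytes are 0x80 to 0xBF), so dropping it cannot corrupt other characters. `strip_sub_bytes` is a hypothetical helper, and the sample file is made up.

```r
# Hypothetical helper: strip every SUB byte (0x1A, shown as ^Z in vi) from a file
strip_sub_bytes <- function(in_path, out_path = in_path) {
  # Read raw bytes so no encoding conversion or text parsing can interfere
  bytes <- readBin(in_path, what = "raw", n = file.size(in_path))
  # Keep every byte except 0x1A
  cleaned <- bytes[bytes != as.raw(0x1a)]
  writeBin(cleaned, out_path)
  invisible(sum(bytes == as.raw(0x1a)))  # number of bytes removed
}

# Usage with a small made-up file containing one 0x1A byte
src <- tempfile(fileext = ".csv")
writeBin(c(charToRaw("a,b\n1,x"), as.raw(0x1a), charToRaw("y\n")), src)
removed <- strip_sub_bytes(src)
d <- read.csv(src, header = TRUE, encoding = "UTF-8")
```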
Answer:
If I follow the code suggested by @cory, I only get a warning:
read.csv("https://www.dropbox.com/s/1kucjnia8ew1u5n/1.csv?raw=1", encoding="UTF-8", skipNul=TRUE, header=FALSE)
## Warning message:
In read.table(file = file, header = header, sep = sep, quote = quote, :
incomplete final line found by readTableHeader on 'https://www.dropbox.com/s/1kucjnia8ew1u5n/1.csv?raw=1'
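But the actual data loads exactly as it appears in the CSV. (The culprit is a \032 character: octal 032 is 26, the ASCII SUB control character, which vi displays as ^Z.)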
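To connect the two descriptions: `"\032"` is an R octal escape (3*8 + 2 = 26), and character 26 is the control code produced by Ctrl-Z, which is why vi renders it as ^Z. A quick check in base R:

```r
code <- utf8ToInt("\032")   # octal 032 = decimal 26
code                        # [1] 26
as.raw(code)                # [1] 1a  (hex 0x1A)
utf8ToInt("Z") - 64L        # [1] 26: Ctrl+<letter> is the letter's code minus 64
```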
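So here is alternative code that avoids the warning:
# Download to a local file so the whole file size is known up front
f <- tempfile(fileext = ".csv")
download.file("https://www.dropbox.com/s/1kucjnia8ew1u5n/1.csv?raw=1", f, mode = "wb")
# Read the entire file as one string (with useBytes = TRUE, nchars counts bytes)
a <- readChar(f, nchars = file.size(f), useBytes = TRUE)
# Replace every \032 (SUB, shown as ^Z in vi) with a space
b <- gsub("\032", " ", a, fixed = TRUE)
new_a <- read.table(header = FALSE, text = b, sep = ",")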
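The replace-then-parse step can be checked offline on a made-up string before pointing it at the real file. The sample text below is invented for illustration; like the sample file, it has no trailing newline.

```r
# Made-up stand-in for the file contents, with one embedded \032 (^Z)
a <- "1,fo\032o\n2,bar"
# Same replacement as above: swap each \032 for a space
b <- gsub("\032", " ", a, fixed = TRUE)
# Parse from the in-memory text instead of a file connection
new_a <- read.table(header = FALSE, text = b, sep = ",")
new_a$V2
```

Replacing with `" "` keeps a visible gap where the control character was; pass `""` instead to drop it entirely.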