R encoding error - XML UTF-8

Time: 2015-09-14 19:37:16

Tags: xml r encoding

Edit: Following Parfait's suggestion, I got the parse to succeed by specifying the ISO-8859-1 encoding instead of UTF_8.
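The fix described in this edit can be sketched roughly as follows. This is only an illustration: `parse_latin1_xml` is a hypothetical helper name, and the small temporary file is an assumed stand-in for a gateway response so that the example runs without network access. The 0xBF byte in it is a guess based on the character printed in the error output, not something taken from the actual response.

```r
library(XML)

# Hypothetical helper (not from the original post): parse one result page,
# asking libxml2 to decode the bytes as ISO-8859-1 rather than UTF-8.
parse_latin1_xml <- function(src) {
  xmlParse(src, encoding = "ISO-8859-1", options = NOCDATA)
}

# Offline stand-in for a gateway response (assumption): a title whose CDATA
# section contains byte 0xBF, the inverted question mark in ISO-8859-1,
# which is not a valid UTF-8 sequence on its own.
payload <- c(
  charToRaw("<root><document><title><![CDATA[Discussion on "),
  as.raw(0xBF),
  charToRaw("The measurement of noise]]></title></document></root>")
)
tmp <- tempfile(fileext = ".xml")
writeBin(payload, tmp)

doc <- parse_latin1_xml(tmp)
titles <- xpathSApply(doc, "//title", xmlValue)
```

For the query in this question, the equivalent call would be `parse_latin1_xml(link)`, with `link` being the gateway URL.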

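I am reading IEEE article metadata and abstracts.

I am looping over multiple pages of results. My code has been working fine so far, but this particular page produces the following error: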

require(XML)
link <- "http://ieeexplore.ieee.org/gateway/ipsSearch.jsp?py=1934&hc=100&rs=1"
doc <- xmlParse(link, encoding = "UTF_8", options = NOCDATA)

Error:

input conversion failed due to input error, bytes 0x20 0x62 0x65 0x66
encoder errorCData section not finished
Discussion on ¿The measurement of noise, with s
Premature end of data in tag title line 3081
Premature end of data in tag document line 3077
Premature end of data in tag root line 3
Error: 1: input conversion failed due to input error, bytes 0x20 0x62 0x65 0x66
2: encoder error3: CData section not finished
Discussion on ¿The measurement of noise, with s
4: Premature end of data in tag title line 3081
5: Premature end of data in tag document line 3077
6: Premature end of data in tag root line 3

I ran into the same error with this data set, but managed to parse it by reading smaller sets of records at a time (now hc = 100 instead of hc = 1000).
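Paging through the results in smaller requests might look something like the sketch below. It assumes that `hc` is the number of records returned per request and `rs` is the 1-based index of the first record, which is how I read the gateway documentation. `build_page_urls` and its arguments are hypothetical names used only for illustration.

```r
# Hypothetical helper: build one gateway URL per page of results.
# Assumes hc = records per request and rs = 1-based start record.
build_page_urls <- function(year, page_size = 100L, n_pages = 3L) {
  starts <- as.integer(seq(from = 1, by = page_size, length.out = n_pages))
  sprintf(
    "http://ieeexplore.ieee.org/gateway/ipsSearch.jsp?py=%d&hc=%d&rs=%d",
    as.integer(year), as.integer(page_size), starts
  )
}

urls <- build_page_urls(1934, page_size = 100L, n_pages = 3L)
```

The first element of `urls` is the same link used in the failing call above; each URL could then be passed to `xmlParse` in turn.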

The gateway query parameters are listed here: http://ieeexplore.ieee.org/gateway/

Why does this error occur, and what can I do to fix it?
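One plausible reading of the error, consistent with the edit note at the top, is that the response contains single-byte Latin-1 characters that are not valid UTF-8, so the parser's input conversion stops partway through a CDATA section. The base R sketch below illustrates that idea on a few bytes. The byte values are an assumption chosen to match the character printed in the error message, not data captured from the gateway.

```r
# Assumed stand-in for the offending bytes: 0xBF followed by ASCII "The".
bytes <- as.raw(c(0xBF, 0x54, 0x68, 0x65))
as_read <- rawToChar(bytes)

# A lone 0xBF is a UTF-8 continuation byte, so this string is not valid UTF-8.
is_valid_before <- validUTF8(as_read)

# Treating the same bytes as ISO-8859-1 and converting to UTF-8 succeeds.
converted <- iconv(as_read, from = "ISO-8859-1", to = "UTF-8")
is_valid_after <- validUTF8(converted)

# Code point of the first converted character (U+00BF if the reading is right).
first_code_point <- utf8ToInt(converted)[1]
```

If that reading is right, declaring the input as ISO-8859-1 lets every byte map to a character, which would explain why the change of encoding makes the parse go through.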

Session info:

R version 3.2.1 (2015-06-18)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X 10.10.5 (Yosemite)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] plyr_1.8.3   XML_3.98-1.3

loaded via a namespace (and not attached):
[1] slidify_0.4.5  markdown_0.7.7 tools_3.2.1    whisker_0.3-2  yaml_2.1.13    Rcpp_0.12.1   
[7] knitr_1.11     stringr_1.0.0 

Thanks for your help!

0 answers:

No answers yet