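I don't understand why this error occurs. I am trying to parse news content into fields such as title, link, description and date with the xmlParse function and save the result in a data frame, but it throws errors like the following...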
site = "http://www.federalreserve.gov/feeds/prates.xml"
doc <- tryCatch(xmlParse(site), error=function(e) e);
Unknown IO errorfailed to load external entity
"http://www.federalreserve.gov/feeds/prates.xml"
src <- xpathApply(xmlRoot(doc), "//item")
Error in UseMethod("xmlRoot") : no applicable method for 'xmlRoot' applied to an object of class "c('XMLParserErrorList', 'simpleError', 'error',
'condition')"
for (i in 1:length(src)) {
if (i==1) {
foo<-xmlSApply(src[[i]], xmlValue)
temp<-data.frame(t(foo), stringsAsFactors=FALSE)
DATA=data.frame(title=temp$title,link=temp$link,description=temp$description,pubDate=temp$pubDate)
}
else {
foo<-xmlSApply(src[[i]], xmlValue)
temp<-data.frame(t(foo), stringsAsFactors=FALSE)
temp1=data.frame(title=temp$title,link=temp$link,description=temp$description,pubDate=temp$pubDate)
DATA<-rbind(DATA, temp1)
}
}
Error: object 'src' not found
Answer 0 (score: 0)
The error indicates that the URL redirects to HTTPS, as noted in my comment...
library(XML)  # provides xmlParse
site <- "http://www.federalreserve.gov/feeds/prates.xml"
correct_site <- "https://www.federalreserve.gov/feeds/prates.xml"
curlGetHeaders(site)
[1] "HTTP/1.1 301 Moved Permanently\r\n"
[2] "Location: https://www.federalreserve.gov/feeds/prates.xml\r\n"
...
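If you want to pick up the redirect target in code rather than by eye, the header lines can be searched for the Location field. This is only a minimal sketch: find_redirect() is a hypothetical helper (not part of base R or any package), and the header vector below is copied from the output above instead of being fetched live.

```r
# find_redirect() is a hypothetical helper: it takes header lines in the
# form returned by curlGetHeaders() and returns the last Location value,
# or NA if the response did not redirect.
find_redirect <- function(headers) {
  loc <- grep("^location:", headers, ignore.case = TRUE, value = TRUE)
  if (length(loc) == 0) {
    return(NA_character_)
  }
  # Drop the field name, then trim spaces and the trailing "\r\n"
  trimws(sub("^location:", "", loc[length(loc)], ignore.case = TRUE))
}

# Header lines copied from the curlGetHeaders(site) output above
headers <- c("HTTP/1.1 301 Moved Permanently\r\n",
             "Location: https://www.federalreserve.gov/feeds/prates.xml\r\n")

target <- find_redirect(headers)
print(target)
```

With a live connection, find_redirect(curlGetHeaders(site)) should behave the same way, assuming the server sends an absolute URL in Location.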
xmlParse(site)
Unknown IO errorfailed to load external entity "http://www.federalreserve.gov/feeds/prates.xml"
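The later errors in the question (no applicable method for 'xmlRoot', then object 'src' not found) follow from this first failure: tryCatch(..., error = function(e) e) returns the error object itself, so doc is a condition and not a parsed document. One way to stop early is to test the result before using it. The sketch below uses only base R; parse_or_null() and failing_parser() are hypothetical names, and failing_parser() merely stands in for a parser call that fails.

```r
# parse_or_null() is a hypothetical wrapper: it calls parse_fun(source) and
# returns NULL (with a message) instead of passing an error object along.
parse_or_null <- function(parse_fun, source) {
  result <- tryCatch(parse_fun(source), error = function(e) e)
  if (inherits(result, "error")) {
    message("Parsing failed: ", conditionMessage(result))
    return(NULL)
  }
  result
}

# Stand-in for a parser that cannot load its input (illustration only)
failing_parser <- function(source) {
  stop("failed to load external entity \"", source, "\"")
}

doc <- parse_or_null(failing_parser, "http://www.federalreserve.gov/feeds/prates.xml")
if (is.null(doc)) {
  message("Skipping item extraction because no document was parsed.")
}
```

In the question's code, the equivalent call would be parse_or_null(xmlParse, site), followed by the xpathApply step only when the result is not NULL.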
xmlParse cannot read from https, so use readLines (ignore the warning), the xml2 package, or one of the many other options for reading over secure HTTP.
xmlParse( correct_site)
Error: XML content does not seem to be XML: 'https://www.federalreserve.gov/feeds/prates.xml'
x <- readLines(correct_site)
Warning message:
In readLines(correct_site) :
incomplete final line found on 'https://www.federalreserve.gov/feeds/prates.xml'
xmlParse(x)
<?xml version="1.0" encoding="utf-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns="http://purl.org/rss/1.0/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:dcterms="http://purl.org/dc/terms/" xmlns:cb="http://www.cbwiki.net/wiki/index.php/Specification_1.1" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.w3.org/1999/02/22-rdf-syntax-ns# rdf.xsd">
<channel rdf:about="http://www.federalreserve.gov/feeds/">
<title>FRB: DDP: Policy Rates</title>
...
library(xml2)
read_xml( correct_site)
{xml_document}
<RDF schemaLocation="http://www.w3.org/1999/02/22-rdf-syntax-ns# rdf.xsd" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns="http://purl.org/rss/1.0/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:dcterms="http://purl.org/dc/terms/" xmlns:cb="http://www.cbwiki.net/wiki/index.php/Specification_1.1" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
[1] <channel rdf:about="http://www.federalreserve.gov/feeds/">\n <title>FRB: DDP: Policy Rates</title>\n ...
[2] <item rdf:about="http://www.federalreserve.gov/feeds/PRATES.html#1765">\n <title>Change to the Publica ...
[3] <item rdf:about="http://www.federalreserve.gov/feeds/PRATES.html#953">\n <title>Change to the Payment .
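To get from the parsed document to the data frame the question asks for, something along these lines may work with xml2. It is a sketch under assumptions: the miniature feed below is invented to mimic the RSS 1.0 layout of the real one so the example runs offline, and it assumes each item has title, link and description children. xml_ns_strip() removes the default namespace so that plain XPath names such as //item match.

```r
library(xml2)

# Hypothetical miniature feed with the same overall layout as the real one;
# the channel and item contents are invented for illustration.
sample_feed <- '<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns="http://purl.org/rss/1.0/">
  <channel rdf:about="http://example.com/feeds/">
    <title>Example channel</title>
  </channel>
  <item rdf:about="http://example.com/a">
    <title>First item</title>
    <link>http://example.com/a</link>
    <description>First description</description>
  </item>
  <item rdf:about="http://example.com/b">
    <title>Second item</title>
    <link>http://example.com/b</link>
    <description>Second description</description>
  </item>
</rdf:RDF>'

doc <- read_xml(sample_feed)
xml_ns_strip(doc)                      # drop the default namespace in place
items <- xml_find_all(doc, "//item")   # one node per news item

# Text of the named child element for every item
field <- function(name) {
  xml_text(xml_find_first(items, paste0("./", name)))
}

DATA <- data.frame(
  title = field("title"),
  link = field("link"),
  description = field("description"),
  stringsAsFactors = FALSE
)
print(DATA)
```

For the live feed, replacing read_xml(sample_feed) with read_xml(correct_site) should be the only change needed, provided the items carry the same child elements. This also avoids growing the data frame with rbind inside a loop.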