Using xpathSApply in R, I am trying to retrieve the URL held in the edgar:url attribute:
<edgar:xbrlFile edgar:sequence="3" edgar:file="edgr-2004_10k.xml" edgar:type="EX-100.INS" edgar:size="25257" edgar:description="" edgar:url="http://www.sec.gov/Archives/edgar/data/1080224/000127528705001434/edgr-2004_10k.xml" />
I have tried several variations of the following:
library(RCurl)  # getURL
library(XML)    # xmlParse, xpathSApply
url <- "http://www.sec.gov/Archives/edgar/monthly/xbrlrss-2005-04.xml"
data <- getURL(url)
doc <- xmlParse(data)
url <- xpathSApply(doc, "//item/*[name()='edgar:xbrlFiling']", xmlValue)
Here is a sample item from the URL shown in the code above:
<item>
<title>EDGAR ONLINE INC (0001080224) (Filer)</title>
<link>http://www.sec.gov/Archives/edgar/data/1080224/000127528705001434/0001275287-05-001434-index.htm</link>
<description>8-K</description>
<pubDate>Mon, 25 Apr 2005 15:15:09 EDT</pubDate>
<edgar:xbrlFiling xmlns:edgar="http://www.sec.gov/Archives/edgar">
<edgar:companyName>EDGAR ONLINE INC</edgar:companyName>
<edgar:formType>8-K</edgar:formType>
<edgar:filingDate>04/25/2005</edgar:filingDate>
<edgar:cikNumber>0001080224</edgar:cikNumber>
<edgar:accessionNumber>0001275287-05-001434</edgar:accessionNumber>
<edgar:fileNumber>001-32194</edgar:fileNumber>
<edgar:acceptanceDatetime>20050425151509</edgar:acceptanceDatetime>
<edgar:period>20050425</edgar:period>
<edgar:assistantDirector>2 &amp; 3</edgar:assistantDirector>
<edgar:assignedSic>7389</edgar:assignedSic>
<edgar:fiscalYearEnd>1204</edgar:fiscalYearEnd>
<edgar:xbrlFiles>
<edgar:xbrlFile edgar:sequence="1" edgar:file="eo2425.txt" edgar:type="8-K" edgar:size="5282" edgar:description="" edgar:url="http://www.sec.gov/Archives/edgar/data/1080224/000127528705001434/eo2425.txt" />
<edgar:xbrlFile edgar:sequence="2" edgar:file="eo2425ex991.txt" edgar:type="EX-99.1" edgar:size="4469" edgar:description="" edgar:url="http://www.sec.gov/Archives/edgar/data/1080224/000127528705001434/eo2425ex991.txt" />
<edgar:xbrlFile edgar:sequence="3" edgar:file="edgr-2004_10k.xml" edgar:type="EX-100.INS" edgar:size="25257" edgar:description="" edgar:url="http://www.sec.gov/Archives/edgar/data/1080224/000127528705001434/edgr-2004_10k.xml" />
<edgar:xbrlFile edgar:sequence="4" edgar:file="edgr-20050228.xsd" edgar:type="EX-100.SCH" edgar:size="12111" edgar:description="" edgar:url="http://www.sec.gov/Archives/edgar/data/1080224/000127528705001434/edgr-20050228.xsd" />
<edgar:xbrlFile edgar:sequence="5" edgar:file="edgr-20050228_cal.xml" edgar:type="EX-100.CAL" edgar:size="18069" edgar:description="" edgar:url="http://www.sec.gov/Archives/edgar/data/1080224/000127528705001434/edgr-20050228_cal.xml" />
<edgar:xbrlFile edgar:sequence="6" edgar:file="edgr-20050228_lab.xml" edgar:type="EX-100.LAB" edgar:size="51434" edgar:description="" edgar:url="http://www.sec.gov/Archives/edgar/data/1080224/000127528705001434/edgr-20050228_lab.xml" />
<edgar:xbrlFile edgar:sequence="7" edgar:file="edgr-20050228_pre.xml" edgar:type="EX-100.PRE" edgar:size="27275" edgar:description="" edgar:url="http://www.sec.gov/Archives/edgar/data/1080224/000127528705001434/edgr-20050228_pre.xml" />
</edgar:xbrlFiles>
</edgar:xbrlFiling>
</item>
<item>
Answer 0 (score: 2)
This is pretty straightforward with XML, and also with xml2 (which, when this answer was written, could only be installed from GitHub).
XML:
xpathSApply(doc, "//edgar:xbrlFile", xmlGetAttr, "edgar:url", namespaces="edgar")
xml2:
library(xml2)
library(magrittr)  # provides the %>% pipe
# url is the feed URL from the question
dat <- read_xml(url)
dat %>%
  xml_find_all("//edgar:xbrlFile", ns = xml_ns(dat)) %>%
  xml_attr("edgar:url", ns = xml_ns(dat))
Both give the same result:
## [1] "http://www.sec.gov/Archives/edgar/data/1080224/000127528705001434/eo2425.txt"
## [2] "http://www.sec.gov/Archives/edgar/data/1080224/000127528705001434/eo2425ex991.txt"
## [3] "http://www.sec.gov/Archives/edgar/data/1080224/000127528705001434/edgr-2004_10k.xml"
## [4] "http://www.sec.gov/Archives/edgar/data/1080224/000127528705001434/edgr-20050228.xsd"
## [5] "http://www.sec.gov/Archives/edgar/data/1080224/000127528705001434/edgr-20050228_cal.xml"
## [6] "http://www.sec.gov/Archives/edgar/data/1080224/000127528705001434/edgr-20050228_lab.xml"
## [7] "http://www.sec.gov/Archives/edgar/data/1080224/000127528705001434/edgr-20050228_pre.xml"
## [8] "http://www.sec.gov/Archives/edgar/data/29669/000119312505068717/d8k.htm"
## [9] "http://www.sec.gov/Archives/edgar/data/29669/000119312505068717/xrrd-20050331.xml"
## [10] "http://www.sec.gov/Archives/edgar/data/29669/000119312505068717/xrrd-20050331.xsd"
## [11] "http://www.sec.gov/Archives/edgar/data/29669/000119312505068717/xrrd-20050331_cal.xml"
## [12] "http://www.sec.gov/Archives/edgar/data/29669/000119312505068717/xrrd-20050331_lab.xml"
## [13] "http://www.sec.gov/Archives/edgar/data/29669/000119312505068717/xrrd-20050331_pre.xml"
## [14] "http://www.sec.gov/Archives/edgar/data/13610/000095012305004029/bne-20050404_8kfinal.htm"
## [15] "http://www.sec.gov/Archives/edgar/data/13610/000095012305004029/bne-20041231er.xml"
## [16] "http://www.sec.gov/Archives/edgar/data/13610/000095012305004029/bne-20050307er.xsd"
## [17] "http://www.sec.gov/Archives/edgar/data/13610/000095012305004029/bne-20050307er_pre.xml"
## [18] "http://www.sec.gov/Archives/edgar/data/13610/000095012305004029/bne-20050307er_lab.xml"
## [19] "http://www.sec.gov/Archives/edgar/data/13610/000095012305004029/bne-20050307er_cal.xml"
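For anyone who wants to try this without fetching the feed, here is a minimal offline sketch with the XML package. The inline `feed` string is a hand-trimmed stand-in for one item of the real feed (an assumption, not the full data), and the names `feed`, `ns`, `filing_text`, `urls` and `instance_url` are made up for illustration. It also shows why the attempt in the question returns no URLs: xmlValue only gives the text inside an element, while the URLs live in attributes. The last query, which filters on edgar:type to keep only the EX-100.INS file, is an extra and assumes that is the file of interest.

```r
library(XML)

# Hand-trimmed stand-in for one <item> of the feed (two files only).
feed <- '<rss version="2.0"><channel><item>
  <title>EDGAR ONLINE INC (0001080224) (Filer)</title>
  <edgar:xbrlFiling xmlns:edgar="http://www.sec.gov/Archives/edgar">
    <edgar:companyName>EDGAR ONLINE INC</edgar:companyName>
    <edgar:xbrlFiles>
      <edgar:xbrlFile edgar:sequence="1" edgar:type="8-K" edgar:url="http://www.sec.gov/Archives/edgar/data/1080224/000127528705001434/eo2425.txt" />
      <edgar:xbrlFile edgar:sequence="3" edgar:type="EX-100.INS" edgar:url="http://www.sec.gov/Archives/edgar/data/1080224/000127528705001434/edgr-2004_10k.xml" />
    </edgar:xbrlFiles>
  </edgar:xbrlFiling>
</item></channel></rss>'

doc <- xmlParse(feed, asText = TRUE)

# Bind the prefix to its namespace URI explicitly (prefix = URI).
ns <- c(edgar = "http://www.sec.gov/Archives/edgar")

# The query from the question: xmlValue returns the element text,
# so the attribute values never show up in the result.
filing_text <- xpathSApply(doc, "//item/*[name()='edgar:xbrlFiling']", xmlValue)

# Select the xbrlFile elements and read the attribute from each one.
urls <- xpathSApply(doc, "//edgar:xbrlFile", xmlGetAttr, "edgar:url",
                    namespaces = ns)

# Optional: keep only the instance document by filtering on edgar:type.
instance_url <- xpathSApply(doc, "//edgar:xbrlFile[@edgar:type='EX-100.INS']",
                            xmlGetAttr, "edgar:url", namespaces = ns)

print(urls)
print(instance_url)
```

Passing the namespace as a named vector avoids relying on the prefix being looked up in the document, which can be handy when the declaration sits on a nested element, as it does here.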