Using XPath to retrieve attributes of nodes when element and attribute names contain colons

Asked: 2015-06-02 13:06:26

Tags: xml r xpath

Using xpathSApply in R, I am trying to retrieve the URL stored in the edgar:url attribute:

<edgar:xbrlFile edgar:sequence="3" edgar:file="edgr-2004_10k.xml" edgar:type="EX-100.INS" edgar:size="25257" edgar:description="" edgar:url="http://www.sec.gov/Archives/edgar/data/1080224/000127528705001434/edgr-2004_10k.xml" />

I have tried several variations of the following:

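library(RCurl)  # getURL
library(XML)    # xmlParse, xpathSApply, xmlValue

url <- "http://www.sec.gov/Archives/edgar/monthly/xbrlrss-2005-04.xml"
data <- getURL(url)
doc <- xmlParse(data)
# Attempt: xmlValue returns the text content of each edgar:xbrlFiling, not the edgar:url attribute
url <- xpathSApply(doc, "//item/*[name()='edgar:xbrlFiling']", xmlValue)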

Here is a sample item from the URL shown in the code above:

<item>
  <title>EDGAR ONLINE INC (0001080224) (Filer)</title>
  <link>http://www.sec.gov/Archives/edgar/data/1080224/000127528705001434/0001275287-05-001434-index.htm</link>
  <description>8-K</description>
  <pubDate>Mon, 25 Apr 2005 15:15:09 EDT</pubDate>
  <edgar:xbrlFiling xmlns:edgar="http://www.sec.gov/Archives/edgar">
    <edgar:companyName>EDGAR ONLINE INC</edgar:companyName>
    <edgar:formType>8-K</edgar:formType>
    <edgar:filingDate>04/25/2005</edgar:filingDate>
    <edgar:cikNumber>0001080224</edgar:cikNumber>
    <edgar:accessionNumber>0001275287-05-001434</edgar:accessionNumber>
    <edgar:fileNumber>001-32194</edgar:fileNumber>
    <edgar:acceptanceDatetime>20050425151509</edgar:acceptanceDatetime>
    <edgar:period>20050425</edgar:period>
    <edgar:assistantDirector>2 &amp; 3</edgar:assistantDirector>
    <edgar:assignedSic>7389</edgar:assignedSic>
    <edgar:fiscalYearEnd>1204</edgar:fiscalYearEnd>
    <edgar:xbrlFiles>
      <edgar:xbrlFile edgar:sequence="1" edgar:file="eo2425.txt" edgar:type="8-K" edgar:size="5282" edgar:description="" edgar:url="http://www.sec.gov/Archives/edgar/data/1080224/000127528705001434/eo2425.txt" />
      <edgar:xbrlFile edgar:sequence="2" edgar:file="eo2425ex991.txt" edgar:type="EX-99.1" edgar:size="4469" edgar:description="" edgar:url="http://www.sec.gov/Archives/edgar/data/1080224/000127528705001434/eo2425ex991.txt" />
      <edgar:xbrlFile edgar:sequence="3" edgar:file="edgr-2004_10k.xml" edgar:type="EX-100.INS" edgar:size="25257" edgar:description="" edgar:url="http://www.sec.gov/Archives/edgar/data/1080224/000127528705001434/edgr-2004_10k.xml" />
      <edgar:xbrlFile edgar:sequence="4" edgar:file="edgr-20050228.xsd" edgar:type="EX-100.SCH" edgar:size="12111" edgar:description="" edgar:url="http://www.sec.gov/Archives/edgar/data/1080224/000127528705001434/edgr-20050228.xsd" />
      <edgar:xbrlFile edgar:sequence="5" edgar:file="edgr-20050228_cal.xml" edgar:type="EX-100.CAL" edgar:size="18069" edgar:description="" edgar:url="http://www.sec.gov/Archives/edgar/data/1080224/000127528705001434/edgr-20050228_cal.xml" />
      <edgar:xbrlFile edgar:sequence="6" edgar:file="edgr-20050228_lab.xml" edgar:type="EX-100.LAB" edgar:size="51434" edgar:description="" edgar:url="http://www.sec.gov/Archives/edgar/data/1080224/000127528705001434/edgr-20050228_lab.xml" />
      <edgar:xbrlFile edgar:sequence="7" edgar:file="edgr-20050228_pre.xml" edgar:type="EX-100.PRE" edgar:size="27275" edgar:description="" edgar:url="http://www.sec.gov/Archives/edgar/data/1080224/000127528705001434/edgr-20050228_pre.xml" />
    </edgar:xbrlFiles>
  </edgar:xbrlFiling>
</item>
<item>

1 Answer:

Answer 0 (score: 2)

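This is straightforward with both the XML package and xml2 (which, at the time of this answer, could only be installed from GitHub). In both cases the key is to tell the XPath engine about the edgar namespace prefix.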

XML

xpathSApply(doc, "//edgar:xbrlFile", xmlGetAttr, "edgar:url", namespaces="edgar")
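
The same idea can be tried without network access. This is only a minimal sketch, assuming the XML package is installed. The inline `feed` string is a trimmed copy of the sample item from the question, and the names `feed`, `ns`, and `urls` are illustrative. Instead of passing just the prefix, it spells out the prefix-to-URI mapping, which may be easier to reason about when the namespace is declared on a nested element rather than on the root.

```r
library(XML)

# Trimmed copy of the sample item; the rss/channel wrapper only gives the snippet a root element.
feed <- '<rss><channel><item>
  <edgar:xbrlFiling xmlns:edgar="http://www.sec.gov/Archives/edgar">
    <edgar:xbrlFiles>
      <edgar:xbrlFile edgar:sequence="1" edgar:type="8-K" edgar:url="http://www.sec.gov/Archives/edgar/data/1080224/000127528705001434/eo2425.txt" />
      <edgar:xbrlFile edgar:sequence="3" edgar:type="EX-100.INS" edgar:url="http://www.sec.gov/Archives/edgar/data/1080224/000127528705001434/edgr-2004_10k.xml" />
    </edgar:xbrlFiles>
  </edgar:xbrlFiling>
</item></channel></rss>'

doc <- xmlParse(feed, asText = TRUE)

# Explicit prefix -> URI mapping; the URI must match the xmlns:edgar declaration.
ns <- c(edgar = "http://www.sec.gov/Archives/edgar")

# Select every edgar:xbrlFile element and read its edgar:url attribute.
urls <- xpathSApply(doc, "//edgar:xbrlFile", xmlGetAttr, "edgar:url", namespaces = ns)
print(urls)
```

The prefix used in the XPath expression only has to match the name in `ns`; what the XPath engine actually compares is the namespace URI.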

xml2

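library(xml2)
library(magrittr)  # provides the %>% pipe

# url must hold the feed address here; the question's last line overwrites it
dat <- read_xml(url)

dat %>% 
  xml_find_all("//edgar:xbrlFile", ns=xml_ns(dat)) %>% 
  xml_attr("edgar:url", ns=xml_ns(dat))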

Both approaches give the same result for the feed:

##  [1] "http://www.sec.gov/Archives/edgar/data/1080224/000127528705001434/eo2425.txt"            
##  [2] "http://www.sec.gov/Archives/edgar/data/1080224/000127528705001434/eo2425ex991.txt"       
##  [3] "http://www.sec.gov/Archives/edgar/data/1080224/000127528705001434/edgr-2004_10k.xml"     
##  [4] "http://www.sec.gov/Archives/edgar/data/1080224/000127528705001434/edgr-20050228.xsd"     
##  [5] "http://www.sec.gov/Archives/edgar/data/1080224/000127528705001434/edgr-20050228_cal.xml" 
##  [6] "http://www.sec.gov/Archives/edgar/data/1080224/000127528705001434/edgr-20050228_lab.xml" 
##  [7] "http://www.sec.gov/Archives/edgar/data/1080224/000127528705001434/edgr-20050228_pre.xml" 
##  [8] "http://www.sec.gov/Archives/edgar/data/29669/000119312505068717/d8k.htm"                 
##  [9] "http://www.sec.gov/Archives/edgar/data/29669/000119312505068717/xrrd-20050331.xml"       
## [10] "http://www.sec.gov/Archives/edgar/data/29669/000119312505068717/xrrd-20050331.xsd"       
## [11] "http://www.sec.gov/Archives/edgar/data/29669/000119312505068717/xrrd-20050331_cal.xml"   
## [12] "http://www.sec.gov/Archives/edgar/data/29669/000119312505068717/xrrd-20050331_lab.xml"   
## [13] "http://www.sec.gov/Archives/edgar/data/29669/000119312505068717/xrrd-20050331_pre.xml"   
## [14] "http://www.sec.gov/Archives/edgar/data/13610/000095012305004029/bne-20050404_8kfinal.htm"
## [15] "http://www.sec.gov/Archives/edgar/data/13610/000095012305004029/bne-20041231er.xml"      
## [16] "http://www.sec.gov/Archives/edgar/data/13610/000095012305004029/bne-20050307er.xsd"      
## [17] "http://www.sec.gov/Archives/edgar/data/13610/000095012305004029/bne-20050307er_pre.xml"  
## [18] "http://www.sec.gov/Archives/edgar/data/13610/000095012305004029/bne-20050307er_lab.xml"  
## [19] "http://www.sec.gov/Archives/edgar/data/13610/000095012305004029/bne-20050307er_cal.xml"
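
To pick out only the file that the question singles out (the one with edgar:type="EX-100.INS"), an attribute predicate can be added to the XPath. The following is a hedged sketch using xml2 on an inline snippet rather than the live feed. `feed`, `doc`, `ns`, `nodes`, and `instance_url` are illustrative names, and the assumption that EX-100.INS marks the file of interest comes only from the sample in the question.

```r
library(xml2)

# Trimmed copy of the sample item; the rss/channel wrapper only gives the snippet a root element.
feed <- '<rss><channel><item>
  <edgar:xbrlFiling xmlns:edgar="http://www.sec.gov/Archives/edgar">
    <edgar:xbrlFiles>
      <edgar:xbrlFile edgar:sequence="1" edgar:type="8-K" edgar:url="http://www.sec.gov/Archives/edgar/data/1080224/000127528705001434/eo2425.txt" />
      <edgar:xbrlFile edgar:sequence="3" edgar:type="EX-100.INS" edgar:url="http://www.sec.gov/Archives/edgar/data/1080224/000127528705001434/edgr-2004_10k.xml" />
    </edgar:xbrlFiles>
  </edgar:xbrlFiling>
</item></channel></rss>'

doc <- read_xml(feed)

# xml_ns() gathers the prefix -> URI pairs declared in the document.
ns <- xml_ns(doc)

# Keep only xbrlFile elements whose edgar:type attribute equals EX-100.INS.
nodes <- xml_find_all(doc, "//edgar:xbrlFile[@edgar:type='EX-100.INS']", ns = ns)

# Passing ns makes xml_attr match the prefixed attribute name.
instance_url <- xml_attr(nodes, "edgar:url", ns = ns)
print(instance_url)
```

Attributes need the prefix inside the predicate as well, because edgar:type is namespace-qualified just like the element names.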