R's XML package raises an error on a valid XML document

Time: 2013-12-17 08:43:17

Tags: xml r cran

I need to parse many XML documents in R using the XML package (Duncan Temple Lang, 2013). Here is one example: http://musicbrainz.org/ws/2/release?query=%22A%20Is%20for%20Alpine%22%20AND%20artist:%22Alpine%22

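If the link is pasted into a browser's address bar, an XML page is displayed. I checked it with one of the many online validators, http://validator.w3.org, and the document's markup appears to be valid.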

However, with this code:

library(XML)
url = "http://musicbrainz.org/ws/2/release?query=%22A%20Is%20for%20Alpine%22%20AND%20artist:%22Alpine%22"
data = xmlTreeParse(url, asTree = TRUE)

the following error is reported:

Blank needed here
Error: 1: Blank needed here

The error resembles the one discussed in "Validation problem with XML declaration", but I cannot see how it applies to the XML document I am trying to parse.

Software: R version 3.0.2 (2013-09-25) -- "Frisbee Sailing"

Platform: x86_64-unknown-linux-gnu (64-bit)

XML package version 3.98-1.1

1 answer:

Answer 0 (score: 1)

Download the document with RCurl first and pass the resulting text to the parser; then you should have no problem:

library(RCurl)
u <- getURL(url)

> xmlTreeParse(u, asTree=TRUE)
$doc
$file
[1] "<buffer>"

$version
[1] "1.0"

$children
$children$metadata
<metadata created="2013-12-17T04:49:41.807Z" xmlns="http://musicbrainz.org/ns/mmd-2.0#" xmlns:ext="http://musicbrainz.org/ns/ext#-2.0">
 <release-list count="1" offset="0">
  <release id="d1e75e7b-fe4a-4cd6-b0d9-8ccf04a62406" score="100">
   <title>A Is for Alpine by Alpine</title>
   <status>Official</status>
   <text-representation>
    <language>eng</language>
    <script>Latn</script>
   </text-representation>
   <artist-credit>
    <name-credit>
     <artist id="d7f0c2fe-00fb-4248-995a-dbfd5a87331a">
      <name>Alpine</name>
      <sort-name>Alpine</sort-name>
     </artist>
    </name-credit>
   </artist-credit>
   <release-group id="7ea67d40-8819-4059-a9be-e1115cdf0ddb" type="Album">
    <primary-type>Album</primary-type>
   </release-group>
   <date>2012-08-10</date>
   <country>AU</country>
   <release-event-list>
    <release-event>
     <date>2012-08-10</date>
     <area id="106e0bec-b638-3b37-b731-f53d507dc00e">
      <name>Australia</name>
      <sort-name>Australia</sort-name>
      <iso-3166-1-code-list>
       <iso-3166-1-code>AU</iso-3166-1-code>
      </iso-3166-1-code-list>
     </area>
    </release-event>
   </release-event-list>
   <label-info-list>
    <label-info>
     <catalog-number>IVY166</catalog-number>
     <label id="96e57a7b-c481-41e5-a0d4-111604210207">
      <name>Ivy League Records</name>
     </label>
    </label-info>
   </label-info-list>
   <medium-list count="1">
    <track-count>12</track-count>
    <medium>
     <format>CD</format>
     <disc-list count="1"/>
     <track-list count="12"/>
    </medium>
   </medium-list>
  </release>
 </release-list>
</metadata>


attr(,"class")
[1] "XMLDocumentContent"

$dtd
$external
NULL

$internal
NULL

attr(,"class")
[1] "DTDList"

attr(,"class")
[1] "XMLDocument"         "XMLAbstractDocument"
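
Once the document parses, a common next step is pulling individual fields out of it. The following is only a minimal sketch, not part of the original answer: it uses a trimmed copy of the response above held in a string (so it runs without network access), parses it with xmlParse and asText = TRUE, and reads the title and date with XPath. Because the document declares a default namespace, the XPath expressions need a prefix bound to that namespace URI; the prefix "mb" and the variable names are arbitrary choices made for this illustration.

```r
library(XML)

# Trimmed copy of the response shown above, kept inline so the sketch
# does not depend on the web service being reachable.
xml_text <- paste0(
  '<metadata xmlns="http://musicbrainz.org/ns/mmd-2.0#">',
  '<release-list count="1" offset="0">',
  '<release id="d1e75e7b-fe4a-4cd6-b0d9-8ccf04a62406">',
  '<title>A Is for Alpine by Alpine</title>',
  '<date>2012-08-10</date>',
  '<country>AU</country>',
  '</release>',
  '</release-list>',
  '</metadata>'
)

# asText = TRUE states explicitly that the argument is XML content,
# not a file name or URL.
doc <- xmlParse(xml_text, asText = TRUE)

# Bind a prefix to the document's default namespace.
# "mb" is an arbitrary prefix chosen for this example.
ns <- c(mb = "http://musicbrainz.org/ns/mmd-2.0#")

# Extract the text content of the matching nodes as character vectors.
titles <- xpathSApply(doc, "//mb:release/mb:title", xmlValue, namespaces = ns)
dates  <- xpathSApply(doc, "//mb:release/mb:date", xmlValue, namespaces = ns)

print(titles)
print(dates)
```

With the real service, the string returned by getURL(url) would take the place of xml_text, assuming the response has the same structure as the output above.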