Error when trying to parse XML in R

Time: 2017-01-04 09:50:16

Tags: r xml data-science

I keep getting an error when I try to parse an XML file in R.

Here is what I am trying to do:

library(XML)
fileUrl <- "http://www.w3schools.com/xml/simple.xml"
doc <- xmlTreeParse(fileUrl, useInternal=TRUE)

I get the following errors:

Opening and ending tag mismatch: meta line 4 and head
StartTag: invalid element name
Opening and ending tag mismatch: br line 73 and p
Opening and ending tag mismatch: br line 94 and body
Opening and ending tag mismatch: br line 93 and html
Premature end of data in tag br line 92
Premature end of data in tag br line 78
Premature end of data in tag br line 77
...
...
16: Premature end of data in tag br line 64
17: Premature end of data in tag body line 63
18: Premature end of data in tag meta line 3
19: Premature end of data in tag head line 2
20: Premature end of data in tag html line 2


I am using R version 3.3.2 on Windows 7. The XML package version is 3.98.1.5.

Any help would be appreciated. This should be a simple parse, but I am stuck.

1 answer:

Answer 0: (score: 0)

With the XML package, the same code works for me. The output below is a data.frame (dplyr is loaded only for the %>% pipe):

library(XML)
library(dplyr)

fileUrl <- "http://www.w3schools.com/xml/simple.xml"
# useInternalNodes = TRUE returns a C-level document that xmlToDataFrame() can read
doc <- xmlTreeParse(fileUrl, useInternalNodes = TRUE) %>%
  xmlToDataFrame()
doc

                         name price                                                                         description calories
1             Belgian Waffles $5.95                   Two of our famous Belgian Waffles with plenty of real maple syrup      650
2  Strawberry Belgian Waffles $7.95                   Light Belgian waffles covered with strawberries and whipped cream      900
3 Berry-Berry Belgian Waffles $8.95 Light Belgian waffles covered with an assortment of fresh berries and whipped cream      900
4                French Toast $4.50                                 Thick slices made from our homemade sourdough bread      600
5         Homestyle Breakfast $6.95                 Two eggs, bacon or sausage, toast, and our ever-popular hash browns      950
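If the call still fails on your machine, note that the tag names in the error messages (html, head, meta, body, br) suggest the parser was handed an HTML page rather than the XML file. One way to narrow this down is to parse a small inline document first. The following is a minimal sketch, not a confirmed fix: `sample_xml` is a made-up sample that imitates the structure of simple.xml, and `parse_downloaded_xml` is a hypothetical helper that assumes the problem lies in how the URL is fetched (for example a redirect), which the question does not confirm.

```r
library(XML)

# Hypothetical inline sample imitating the structure of simple.xml
sample_xml <- paste0(
  "<breakfast_menu>",
  "<food><name>Belgian Waffles</name><price>$5.95</price><calories>650</calories></food>",
  "<food><name>French Toast</name><price>$4.50</price><calories>600</calories></food>",
  "</breakfast_menu>"
)

# asText = TRUE tells the parser that the argument is XML content,
# not a file name or URL
doc <- xmlParse(sample_xml, asText = TRUE)
menu <- xmlToDataFrame(doc, stringsAsFactors = FALSE)

# Hypothetical helper (defined here but not called): download the file
# to disk first, then parse the local copy instead of the URL
parse_downloaded_xml <- function(url) {
  tmp <- tempfile(fileext = ".xml")
  download.file(url, destfile = tmp, mode = "wb", quiet = TRUE)
  xmlParse(tmp)
}
```

If the inline parse succeeds, the XML package itself is working, and the thing to inspect is what the URL actually returns (for example by opening the downloaded file in a text editor).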

Another option is to try the xml2 package:

library(xml2)
library(dplyr)

fileUrl <- "http://www.w3schools.com/xml/simple.xml"

# As a list
doc <- read_xml(fileUrl) %>%
  as_list()
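as_list() gives a nested list. If a data.frame like the one from xmlToDataFrame() is wanted instead, one possible approach with xml2 is to select the nodes with XPath and read their text. This is only a sketch under assumptions: `sample_xml` is a made-up inline sample standing in for the real file, and every <food> element is assumed to contain name, price and calories children.

```r
library(xml2)

# Hypothetical inline sample standing in for simple.xml
sample_xml <- paste0(
  "<breakfast_menu>",
  "<food><name>Belgian Waffles</name><price>$5.95</price><calories>650</calories></food>",
  "<food><name>French Toast</name><price>$4.50</price><calories>600</calories></food>",
  "</breakfast_menu>"
)

# read_xml() accepts literal XML as well as a path or URL
doc <- read_xml(sample_xml)

# One node per <food> element
foods <- xml_find_all(doc, ".//food")

# Build one column per child element
menu <- data.frame(
  name     = xml_text(xml_find_first(foods, "./name")),
  price    = xml_text(xml_find_first(foods, "./price")),
  calories = as.integer(xml_text(xml_find_first(foods, "./calories"))),
  stringsAsFactors = FALSE
)
```

To use it on the real file, replace sample_xml with fileUrl in the read_xml() call.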