I keep getting errors when I try to parse an XML file in R.
Here is what I am doing:
library(XML)
fileUrl <- "http://www.w3schools.com/xml/simple.xml"
doc <- xmlTreeParse(fileUrl, useInternalNodes = TRUE)
I get the following errors:
Opening and ending tag mismatch: meta line 4 and head
StartTag: invalid element name
Opening and ending tag mismatch: br line 73 and p
Opening and ending tag mismatch: br line 94 and body
Opening and ending tag mismatch: br line 93 and html
Premature end of data in tag br line 92
Premature end of data in tag br line 78
Premature end of data in tag br line 77
...
16: Premature end of data in tag br line 64
17: Premature end of data in tag body line 63
18: Premature end of data in tag meta line 3
19: Premature end of data in tag head line 2
20: Premature end of data in tag html line 2
I am using R version 3.3.2 on Windows 7. The XML package version is 3.98.1.5.
Any help would be appreciated. This should be a simple parse, but I am stuck.
Answer 0 (score: 0)
With the XML package it works for me. Piping the parsed document (using dplyr) into xmlToDataFrame() gives a data.frame as output:
library(XML)
library(dplyr)
fileUrl <- "http://www.w3schools.com/xml/simple.xml"
doc <- xmlTreeParse(fileUrl, useInternalNodes = TRUE) %>%
  xmlToDataFrame()
doc
name price description calories
1 Belgian Waffles $5.95 Two of our famous Belgian Waffles with plenty of real maple syrup 650
2 Strawberry Belgian Waffles $7.95 Light Belgian waffles covered with strawberries and whipped cream 900
3 Berry-Berry Belgian Waffles $8.95 Light Belgian waffles covered with an assortment of fresh berries and whipped cream 900
4 French Toast $4.50 Thick slices made from our homemade sourdough bread 600
5 Homestyle Breakfast $6.95 Two eggs, bacon or sausage, toast, and our ever-popular hash browns 950
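If parsing straight from the URL fails as in the question, the tag names in the errors (html, head, body, br) suggest that the parser received an HTML page rather than the XML file. One possible workaround, sketched here under that assumption, is to obtain the XML text separately and parse it with asText = TRUE. The inline string below is a shortened, made-up stand-in for the downloaded content, not the real file.

```r
library(XML)

# Stand-in for the downloaded XML text (assumed structure, shortened for illustration)
xml_string <- '<breakfast_menu>
  <food><name>Belgian Waffles</name><price>$5.95</price><calories>650</calories></food>
  <food><name>French Toast</name><price>$4.50</price><calories>600</calories></food>
</breakfast_menu>'

# Parse from text instead of letting the parser fetch the URL itself
doc <- xmlParse(xml_string, asText = TRUE)

# One row per <food> element, one column per child element
menu <- xmlToDataFrame(doc, stringsAsFactors = FALSE)
print(menu)
```

Parsing from text keeps the download and the parse as separate steps, so the fetched content can be inspected before it reaches the parser.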
Another option is to try the xml2 library:
library(xml2)
library(dplyr)
fileUrl <- "http://www.w3schools.com/xml/simple.xml"
# As a list
doc <- read_xml(fileUrl) %>%
  as_list()
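If a data.frame is preferred over a list, xml2 can also build one by selecting nodes with XPath. This is only a sketch: the inline XML is a shortened, made-up sample that assumes the same food/name/price/calories layout as the output shown earlier, so it runs without network access.

```r
library(xml2)

# Made-up sample with the assumed layout; read_xml() also accepts literal XML
xml_input <- '<breakfast_menu>
  <food><name>Belgian Waffles</name><price>$5.95</price><calories>650</calories></food>
  <food><name>French Toast</name><price>$4.50</price><calories>600</calories></food>
</breakfast_menu>'

doc <- read_xml(xml_input)

# Select every <food> node, then pull one child value per node
foods <- xml_find_all(doc, ".//food")

menu <- data.frame(
  name     = xml_text(xml_find_first(foods, "./name")),
  price    = xml_text(xml_find_first(foods, "./price")),
  calories = as.integer(xml_text(xml_find_first(foods, "./calories"))),
  stringsAsFactors = FALSE
)
print(menu)
```

xml_find_first() returns one result per input node, so the columns stay aligned even if a child element is missing from some entry.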