I need a list in which each element is a character vector of the author names for one book, taken from the XML data pasted below, for example:
[[1]]
"Giada De Laurentiis"
[[2]]
"J K. Rowling"
[[3]]
"James McGovern", "Giada De Laurentiis", ...
and so on.
I started with this:
my_titles_nodeset <- xpathSApply(doc = my_dom, path = "//book")
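I assumed this gives me a separate DOM for each book, and I want to do the following with every book (I am showing it on the third book and skipping the apply function):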
> (title <- my_titles_nodeset[[3]])
<book category="WEB">
<title lang="en">XQuery Kick Start</title>
<author>James McGovern</author>
<author>Per Bothner</author>
<author>Kurt Cagle</author>
<author>James Linn</author>
<author>Vaidyanathan Nagarajan</author>
<year>2003</year>
<price>49.99</price>
</book>
It looks like I got what I wanted: the third book only. So I tried to extract its authors:
> (author_group <- xpathSApply(title, path = "//book/author", xmlValue))
But once again I get all the authors of all the books in one pile! See below:
> (author_group <- xpathSApply(title, path = "//book/author", xmlValue))
[1] "Giada De Laurentiis" "J K. Rowling" "James McGovern"
[4] "Per Bothner" "Kurt Cagle" "James Linn"
[7] "Vaidyanathan Nagarajan" "Erik T. Ray"
This is my first time using XPath, and I can only code in R, so please do not explain using other programming languages.
<?xml version="1.0" encoding="UTF-8"?>
<bookstore>
<book category="COOKING">
<title lang="en">Everyday Italian</title>
<author>Giada De Laurentiis</author>
<year>2005</year>
<price>30.00</price>
</book>
<book category="CHILDREN">
<title lang="en">Harry Potter</title>
<author>J K. Rowling</author>
<year>2005</year>
<price>29.99</price>
</book>
<book category="WEB">
<title lang="en">XQuery Kick Start</title>
<author>James McGovern</author>
<author>Per Bothner</author>
<author>Kurt Cagle</author>
<author>James Linn</author>
<author>Vaidyanathan Nagarajan</author>
<year>2003</year>
<price>49.99</price>
</book>
<book category="WEB">
<title lang="en">Learning XML</title>
<author>Erik T. Ray</author>
<year>2003</year>
<price>39.95</price>
</book>
</bookstore>
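Answer 0 (score: 0)
You can retrieve the author information in two steps: first at the book level, then at the author level. Each book is serialized to text and re-parsed as its own document, so "//author" can only match authors of that one book.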
listBooks <- xpathApply(my_dom, "//book", saveXML)
listAuthors <- lapply(listBooks, function(book) unlist(xpathSApply(xmlInternalTreeParse(book), "//author/text()", saveXML)))
listAuthors
[[1]]
[1] "Giada De Laurentiis"
[[2]]
[1] "J K. Rowling"
[[3]]
[1] "James McGovern" "Per Bothner" "Kurt Cagle" "James Linn" "Vaidyanathan Nagarajan"
[[4]]
[1] "Erik T. Ray"
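As a possible alternative, here is a minimal sketch that avoids re-parsing each book. It rests on the observation that, in the XML package, a path starting with "//" is evaluated from the root of the whole document even when a single node is passed in, which is why "//book/author" returned every author. A relative path such as "./author" is evaluated from the node itself. The name xml_text and the shortened sample data are assumptions made for illustration; my_dom and listAuthors follow the names used above.

```r
library(XML)

# Shortened copy of the bookstore data (illustrative; the real file has more fields)
xml_text <- '<bookstore>
  <book category="COOKING">
    <title lang="en">Everyday Italian</title>
    <author>Giada De Laurentiis</author>
  </book>
  <book category="WEB">
    <title lang="en">XQuery Kick Start</title>
    <author>James McGovern</author>
    <author>Per Bothner</author>
  </book>
  <book category="WEB">
    <title lang="en">Learning XML</title>
    <author>Erik T. Ray</author>
  </book>
</bookstore>'

my_dom <- xmlParse(xml_text, asText = TRUE)

# One node per <book>, all still belonging to the same document
books <- getNodeSet(my_dom, "//book")

# "./author" is relative to the current book node, so only its own
# <author> children are matched
listAuthors <- lapply(books, function(book) {
  xpathSApply(book, "./author", xmlValue)
})

print(listAuthors)
```

With the full data from the question, the same lapply call should give one character vector per book, in document order.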