[1] "<item xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\" xsi:type=\"itemWithRetweets\" link=\"http://twitter.com/charliekirk11/statuses/1002221842894012416\" id=\"1002221842894012416\">\n <author>Charlie Kirk</author>\n <date>2018-05-31T12:14:42-04:00</date>\n <attachments/>\n <estimated_retweets>30</estimated_retweets>\n <screenName>charliekirk11</screenName>\n <avatarUrl>http://pbs.twimg.com/profile_images/993982887635685377/4CEEsYDS_normal.jpg</avatarUrl>\n <language>en</language>\n <location>\n <country>US</country>\n <locationString>Chicago, Illinois</locationString>\n </location>\n</item>"
[2] "<item xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\" xsi:type=\"itemWithRetweets\" link=\"http://twitter.com/charliekirk11/statuses/1002221842894012416\" id=\"1002221842894012416\">\n <author>Charlie Kirk</author>\n <date>2018-05-31T12:14:42-04:00</date>\n <attachments/>\n <estimated_retweets>30</estimated_retweets>\n <screenName>charliekirk11</screenName>\n <avatarUrl>http://pbs.twimg.com/profile_images/993982887635685377/4CEEsYDS_normal.jpg</avatarUrl>\n <language>en</language>\n <location>\n <country>US</country>\n <locationString>Chicago, Illinois</locationString>\n </location>\n</item>"
[3] "<item xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\" xsi:type=\"itemWithRetweets\" link=\"http://twitter.com/charliekirk11/statuses/1002221842894012416\" id=\"1002221842894012416\">\n <author>Charlie Kirk</author>\n <date>2018-05-31T12:14:42-04:00</date>\n <attachments/>\n <estimated_retweets>30</estimated_retweets>\n <screenName>charliekirk11</screenName>\n <avatarUrl>http://pbs.twimg.com/profile_images/993982887635685377/4CEEsYDS_normal.jpg</avatarUrl>\n <language>en</language>\n <location>\n <country>US</country>\n <locationString>Chicago, Illinois</locationString>\n </location>\n</item>"
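I have a list of XML items like the ones above, and I only want to extract the "link" and "id" attributes. I tried xml_attr(x, "link") and xml_attr(x, "id"), but they do not seem to work on a list.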
Answer 0 (score: 0)
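One solution is to use the purrr package from the tidyverse.
Basically, the map function lets you iterate over a list, apply a function to each element, and return a list. Here, I parse each XML string and collect the values into a tibble (a data.frame-like format). map_dfr row-binds the resulting list directly into a single table.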
xml_text <- list("<item xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\" xsi:type=\"itemWithRetweets\" link=\"http://twitter.com/charliekirk11/statuses/1002221842894012416\" id=\"1002221842894012416\">\n <author>Charlie Kirk</author>\n <date>2018-05-31T12:14:42-04:00</date>\n <attachments/>\n <estimated_retweets>30</estimated_retweets>\n <screenName>charliekirk11</screenName>\n <avatarUrl>http://pbs.twimg.com/profile_images/993982887635685377/4CEEsYDS_normal.jpg</avatarUrl>\n <language>en</language>\n <location>\n <country>US</country>\n <locationString>Chicago, Illinois</locationString>\n </location>\n</item>",
"<item xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\" xsi:type=\"itemWithRetweets\" link=\"http://twitter.com/charliekirk11/statuses/1002221842894012416\" id=\"1002221842894012416\">\n <author>Charlie Kirk</author>\n <date>2018-05-31T12:14:42-04:00</date>\n <attachments/>\n <estimated_retweets>30</estimated_retweets>\n <screenName>charliekirk11</screenName>\n <avatarUrl>http://pbs.twimg.com/profile_images/993982887635685377/4CEEsYDS_normal.jpg</avatarUrl>\n <language>en</language>\n <location>\n <country>US</country>\n <locationString>Chicago, Illinois</locationString>\n </location>\n</item>",
"<item xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\" xsi:type=\"itemWithRetweets\" link=\"http://twitter.com/charliekirk11/statuses/1002221842894012416\" id=\"1002221842894012416\">\n <author>Charlie Kirk</author>\n <date>2018-05-31T12:14:42-04:00</date>\n <attachments/>\n <estimated_retweets>30</estimated_retweets>\n <screenName>charliekirk11</screenName>\n <avatarUrl>http://pbs.twimg.com/profile_images/993982887635685377/4CEEsYDS_normal.jpg</avatarUrl>\n <language>en</language>\n <location>\n <country>US</country>\n <locationString>Chicago, Illinois</locationString>\n </location>\n</item>")
library(xml2)
library(tibble)
library(purrr)
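# Parse each XML string and collect its attributes into a one-row tibble;
# map_dfr() row-binds those rows into one table (it needs dplyr installed).
xml_text %>%
  map_dfr(~ {
    content <- read_xml(.x)
    tibble(
      link = xml_attr(content, "link"),
      id = xml_attr(content, "id")
    )
  })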
#> # A tibble: 3 x 2
#>   link                                                          id
#>   <chr>                                                         <chr>
#> 1 http://twitter.com/charliekirk11/statuses/1002221842894012416 100222184~
#> 2 http://twitter.com/charliekirk11/statuses/1002221842894012416 100222184~
#> 3 http://twitter.com/charliekirk11/statuses/1002221842894012416 100222184~
Created on 2018-06-06 by the reprex package (v0.2.0).
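For readers who would rather not depend on purrr and dplyr, the same idea can be sketched with xml2 and base R alone. This is a minimal sketch, not part of the original answer: the two shortened items are hypothetical stand-ins for the real data, and the names docs and result are illustrative. The point is that xml_attr() works on a parsed node, so each string is passed through read_xml() first and the attribute is then pulled out element by element.

```r
library(xml2)

# Shortened stand-in items (hypothetical; same attributes as the data above)
xml_text <- list(
  "<item link=\"http://twitter.com/charliekirk11/statuses/1002221842894012416\" id=\"1002221842894012416\"><author>Charlie Kirk</author></item>",
  "<item link=\"http://twitter.com/charliekirk11/statuses/1002221842894012416\" id=\"1002221842894012416\"><author>Charlie Kirk</author></item>"
)

# Parse each string once; xml_attr() expects a parsed node, not a character string
docs <- lapply(xml_text, read_xml)

# Extract one attribute per document; vapply() guarantees a character vector
result <- data.frame(
  link = vapply(docs, xml_attr, character(1), attr = "link"),
  id   = vapply(docs, xml_attr, character(1), attr = "id"),
  stringsAsFactors = FALSE
)

print(result)
```

Using vapply() with character(1) makes the call fail loudly if any element returns something other than a single string, which can help catch malformed items early. A missing attribute comes back as NA rather than an error.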