<item xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:type="itemWithRetweets" link="http://twitter.com/MEDClementz/statuses/1001775473305817090" id="1001775473305817090">
How can I extract only the link and id from the line above? Desired output:
link
[1] http://twitter.com/MEDClementz/statuses/1001775473305817090
id
[1] 1001775473305817090
Answer 0 (score: 2)
It is better to use an XML parser than regular expressions here.
library(xml2)
x <- read_xml('<item xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:type="itemWithRetweets" link="http://twitter.com/MEDClementz/statuses/1001775473305817090" id="1001775473305817090"></item>')
xml_attr(x,"link")
xml_attr(x,"id")
Result:
> xml_attr(x,"link")
[1] "http://twitter.com/MEDClementz/statuses/1001775473305817090"
> xml_attr(x,"id")
[1] "1001775473305817090"
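If the real input holds several such elements, the same calls can be applied to a whole node set. The following is a minimal sketch under assumptions: the `<items>` wrapper and the second `<item>` (with its example.com link) are hypothetical and only there to show the vectorised behaviour of `xml_attr()`.

```r
library(xml2)

# Hypothetical document: the <items> wrapper and the second <item> are made up
doc <- read_xml('<items xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <item xsi:type="itemWithRetweets" link="http://twitter.com/MEDClementz/statuses/1001775473305817090" id="1001775473305817090"/>
  <item xsi:type="itemWithRetweets" link="http://example.com/statuses/2" id="2"/>
</items>')

# Select every <item> node, then read each attribute across the node set
items <- xml_find_all(doc, ".//item")
result <- data.frame(
  link = xml_attr(items, "link"),
  id   = xml_attr(items, "id"),  # kept as character on purpose
  stringsAsFactors = FALSE
)
result
```

The id is left as a character string because a value of this size is larger than 2^53 and would lose digits if converted with `as.numeric()`.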
Answer 1 (score: 0)
Here is an option using the stringr package.
library(stringr)
# Create the example string
string <- '<item xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:type="itemWithRetweets" link="http://twitter.com/MEDClementz/statuses/1001775473305817090" id="1001775473305817090">'
# Split the string
string2 <- str_split(string, pattern = " ")[[1]]
# Get the link
link <- str_subset(string2, "link")
link2 <- str_extract(link, "http://.*[0-9]+")
link2
# [1] "http://twitter.com/MEDClementz/statuses/1001775473305817090"
# Get the id
id <- str_subset(string2, "^id=")  # anchored so that only the id attribute token matches
id2 <- str_extract(id, "[0-9]+")
id2
# [1] "1001775473305817090"
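As a possible variation on the stringr approach, the split step can be skipped by capturing the quoted attribute values directly with `str_match()`. This is only a sketch and assumes the attributes are always written with double quotes, as in the example string; for anything less regular, the XML parser from the other answer is the safer choice.

```r
library(stringr)

string <- '<item xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:type="itemWithRetweets" link="http://twitter.com/MEDClementz/statuses/1001775473305817090" id="1001775473305817090">'

# str_match() returns a matrix: column 1 is the full match, column 2 the first capture group.
# \\b keeps the pattern from matching inside a longer attribute name.
link <- str_match(string, '\\blink="([^"]*)"')[, 2]
id   <- str_match(string, '\\bid="([^"]*)"')[, 2]

link
id
```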