R: gsub does not work when trying to extract part of a hyperlink

Time: 2016-04-06 11:40:10

Tags: r gsub

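I tried running the code below. I would like to know why the gsub function has no effect on this input. Does anyone know why, and how to handle this case?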

> text

[1] <a href="https://itunes.apple.com/us/app/ubersocial-for-twitter/id396050119 mt=8&uo=4" rel="nofollow">UberSocial for Twitter on iOS</a>
65 Levels: <a href="http://aktualpost.com" rel="nofollow">Aktualpost</a> ...
> start = as.numeric(regexpr(">",text)[[1]])+1
> start
[1] 103
> to_cut = substr(text,1,start-1)
> to_cut
[1] "<a href=\"https://itunes.apple.com/us/app/ubersocial-for-twitter/id396050119?mt=8&uo=4\" rel=\"nofollow\">"
> new_text = gsub(to_cut,"",as.character(text))
> new_text
[1] "<a href=\"https://itunes.apple.com/us/app/ubersocial-for-twitter/id396050119?mt=8&uo=4\" rel=\"nofollow\">UberSocial for Twitter on iOS</a>"

1 answer:

Answer 0: (score: 1)

There is a ? in "to_cut" that is not found in "text". If we fix that mismatch, it should work, i.e. compare the ?mt in "to_cut" with the mt (no ?) in "text".

gsub("^<a href=\"https://itunes.apple.com/us/app/ubersocial-for-twitter/id396050119 mt=8&uo=4\" rel=\"nofollow\">(.*)", "\\1", text)
#[1] "UberSocial for Twitter on iOS</a>"
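
The pattern above hard-codes the whole URL. As a sketch that is not part of the original answer, a more general pattern can strip the opening tag without knowing its contents. The input string and the names no_open and link_text are assumptions made for illustration; the approach relies on the string starting with a single opening tag.

```r
# Hypothetical input mirroring the link from the question
text <- "<a href=\"https://itunes.apple.com/us/app/ubersocial-for-twitter/id396050119?mt=8&uo=4\" rel=\"nofollow\">UberSocial for Twitter on iOS</a>"

# Drop the opening tag: "<", then any run of characters that are not ">", then ">"
no_open <- sub("^<[^>]*>", "", text)
no_open
# [1] "UberSocial for Twitter on iOS</a>"

# Optionally drop the closing tag as well to keep only the link text
link_text <- sub("</a>$", "", no_open)
link_text
# [1] "UberSocial for Twitter on iOS"
```

Because the character class [^>] contains no quantifier characters from the URL, a ? inside the href does not affect the match.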

It is not clear how the OP obtained a "to_cut" containing the ?. Running the same steps on "text" gives:
start = as.numeric(regexpr(">",text)[[1]])+1
to_cut <-substr(text,1,start-1)
to_cut
#[1] "<a href=\"https://itunes.apple.com/us/app/ubersocial-for-twitter/id396050119 mt=8&uo=4\" rel=\"nofollow\">"
gsub(to_cut, "", text)
#[1] "UberSocial for Twitter on iOS</a>"
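
If the real "text" does contain a literal ? (an assumption; the printed factor in the question shows a space there), there is a second possible cause worth checking: gsub treats its pattern as a regular expression by default, so the ? in "to_cut" acts as a quantifier instead of matching itself. A minimal sketch of that case, with as_regex and as_literal as illustrative names, uses the fixed = TRUE argument of gsub to match the pattern character for character:

```r
# Assumed input: the same link, with a literal "?" in the URL
text <- "<a href=\"https://itunes.apple.com/us/app/ubersocial-for-twitter/id396050119?mt=8&uo=4\" rel=\"nofollow\">UberSocial for Twitter on iOS</a>"

# Position just past the first ">", as in the question
start <- as.numeric(regexpr(">", text)[[1]]) + 1
to_cut <- substr(text, 1, start - 1)

# Default: to_cut is a regular expression, so "9?" means "an optional 9"
# and the literal "?" in text is never matched; text comes back unchanged
as_regex <- gsub(to_cut, "", text)
identical(as_regex, text)
# [1] TRUE

# fixed = TRUE: to_cut is matched as a literal string
as_literal <- gsub(to_cut, "", text, fixed = TRUE)
as_literal
# [1] "UberSocial for Twitter on iOS</a>"
```

This reproduces the unchanged output shown in the question and removes the opening tag once the pattern is matched literally.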