Extracting a substring with R

Time: 2017-07-17 20:04:36

Tags: r regex substring

I want to extract a substring (the description details) from the following strings:

string1 <- "@{self=https://somesite.atlassian.net/rest/api/2/status/1; description=The issue is open and ready for the assignee to start work on it.; iconUrl=https://somesite.atlassian.net/images/icons/statuses/open.png; name=Open; id=1; statusCategory=}"
string2 <- "@{self=https://somesite.atlassian.net/rest/api/2/status/10203; description=; iconUrl=https://somesite.atlassian.net/images/icons/statuses/generic.png; name=Full Curation; id=10203; statusCategory=}"

I would like to get the following:

ExtractedSubString1 = "The issue is open and ready for the assignee to start work on it."
ExtractedSubString2 = ""

I tried:

library(stringr)    
ExtractedSubString1 <- substr(string1, str_locate(string1, "description=")+12, str_locate(string1, "; iconUrl")-1)
ExtractedSubString2 <- substr(string2, str_locate(string2, "description=")+12, str_locate(string2, "; iconUrl")-1)

I am looking for a better way to achieve this.

2 Answers:

Answer 0: (score: 2)

Using only base R's sub with a backreference, you can do:

sub(".*description=(.*?);.*", "\\1", c(string1, string2))
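[1] "The issue is open and ready for the assignee to start work on it." ""

".*" matches any sequence of characters, "description=" is a literal match, and ".*?" again matches any sequence of characters, but the ? forces lazy rather than greedy matching, so the match stops at the first ";" that follows. ";" is a literal, and "()" captures the lazily matched subexpression. The backreference "\\1" returns the subexpression captured in the parentheses. Because the pattern spans the whole string, sub replaces the entire string with just the captured part.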

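As a minimal sketch of the same idea, the lazy ".*?" can be swapped for a negated character class "[^;]*", which stops at the first ";" without relying on lazy matching. Note that sub returns its input unchanged when the pattern does not match, so the hypothetical helper extract_description below (not part of the original answer) marks such inputs as NA. The third test input is made up for illustration.

```r
string1 <- "@{self=https://somesite.atlassian.net/rest/api/2/status/1; description=The issue is open and ready for the assignee to start work on it.; iconUrl=https://somesite.atlassian.net/images/icons/statuses/open.png; name=Open; id=1; statusCategory=}"
string2 <- "@{self=https://somesite.atlassian.net/rest/api/2/status/10203; description=; iconUrl=https://somesite.atlassian.net/images/icons/statuses/generic.png; name=Full Curation; id=10203; statusCategory=}"

# Hypothetical helper: returns the description field, or NA when
# the input contains no "description=...;" segment.
extract_description <- function(x) {
  # "[^;]*" captures everything up to (not including) the next ";"
  pattern <- ".*description=([^;]*);.*"
  out <- sub(pattern, "\\1", x)
  # sub() leaves non-matching inputs untouched, so flag them explicitly
  out[!grepl(pattern, x)] <- NA_character_
  out
}

# The last element is an invented input with no description field
descriptions <- extract_description(c(string1, string2, "name=Open; id=1"))
print(descriptions)
```

This assumes the description itself never contains a ";". If it can, the pattern would need a more specific terminator such as "; iconUrl=".
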
Using the base R functions regexec and regmatches is closer to the approach in the OP. sapply with "[" then extracts the desired result.

sapply(regmatches(c(string1, string2),
                  regexec(".*description=(.*?);.*", c(string1, string2))),
       "[", 2)
[1] "The issue is open and ready for the assignee to start work on it." ""
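
To see why "[" is called with index 2, it may help to look at what regmatches returns for a regexec result: a list with one character vector per input, holding the full match first and the captured groups after it. The sketch below uses short made-up strings (not the OP's data) and a negated character class in place of the lazy quantifier. It also uses vapply instead of sapply, which is an optional choice that guarantees a character vector.

```r
# Made-up inputs for illustration: a filled field, an empty field,
# and a string without the key at all.
x <- c("a=1; description=hello; b=2",
       "a=1; description=; b=2",
       "a=1; b=2")

m <- regmatches(x, regexec("description=([^;]*);", x))

# Each list element holds the full match followed by capture group 1;
# a non-matching input yields character(0).
str(m)

# Picking element 2 selects the capture group; indexing character(0)
# at position 2 gives NA, so missing keys come back as NA.
res <- vapply(m, function(v) v[2], character(1))
print(res)
```
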

Answer 1: (score: 1)

You could try:

test.1 <- gsub("description=", "", strsplit(string1, "; ")[[1]][2])

test.2 <- gsub("description=", "", strsplit(string2, "; ")[[1]][2])

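This simply splits the string on "; ", which breaks each string into 6 elements. The square brackets select the 2nd element, and gsub removes description= by replacing it with an empty string.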
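
The splitting approach can be taken one step further so that fields are looked up by name rather than by position. The following is only a sketch: parse_fields is a hypothetical helper, and it assumes that every string is wrapped in "@{" and "}" and that no value contains "; ".

```r
string1 <- "@{self=https://somesite.atlassian.net/rest/api/2/status/1; description=The issue is open and ready for the assignee to start work on it.; iconUrl=https://somesite.atlassian.net/images/icons/statuses/open.png; name=Open; id=1; statusCategory=}"
string2 <- "@{self=https://somesite.atlassian.net/rest/api/2/status/10203; description=; iconUrl=https://somesite.atlassian.net/images/icons/statuses/generic.png; name=Full Curation; id=10203; statusCategory=}"

# Hypothetical helper: turns one "@{key=value; ...}" string into a
# named character vector of values.
parse_fields <- function(x) {
  # Drop the leading "@{" and the trailing "}"
  body  <- sub("^@[{]", "", sub("[}]$", "", x))
  # Split into "key=value" pieces
  parts <- strsplit(body, "; ", fixed = TRUE)[[1]]
  # Split each piece at its first "=" only
  eq    <- regexpr("=", parts, fixed = TRUE)
  vals  <- substr(parts, eq + 1, nchar(parts))
  names(vals) <- substr(parts, 1, eq - 1)
  vals
}

fields1 <- parse_fields(string1)
fields2 <- parse_fields(string2)

print(fields1[["description"]])
print(fields2[["description"]])
```

Looking fields up by name keeps the code working if the order of the fields changes, which positional indexing with [2] would not.
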