I want to extract a substring (the description text) from the following strings:
string1 <- "@{self=https://somesite.atlassian.net/rest/api/2/status/1; description=The issue is open and ready for the assignee to start work on it.; iconUrl=https://somesite.atlassian.net/images/icons/statuses/open.png; name=Open; id=1; statusCategory=}"
string2 <- "@{self=https://somesite.atlassian.net/rest/api/2/status/10203; description=; iconUrl=https://somesite.atlassian.net/images/icons/statuses/generic.png; name=Full Curation; id=10203; statusCategory=}"
I would like to get the following:
ExtractedSubString1 = "The issue is open and ready for the assignee to start work on it."
ExtractedSubString2 = ""
I tried:
library(stringr)
ExtractedSubString1 <- substr(string1, str_locate(string1, "description=")+12, str_locate(string1, "; iconUrl")-1)
ExtractedSubString2 <- substr(string2, str_locate(string2, "description=")+12, str_locate(string2, "; iconUrl")-1)
I am looking for a better way to achieve this.
Answer 0 (score: 2)
Using only base R's sub with a backreference, you can do:
sub(".*description=(.*?);.*", "\\1", c(string1, string2))
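[1] "The issue is open and ready for the assignee to start work on it." ""
The first ".*" matches any run of characters, "description=" is a literal match, and ".*?" again matches any run of characters, but the ? forces a lazy match rather than a greedy one, so it stops at the first ";" that follows. ";" is a literal, and "()" captures the lazily matched subexpression. The backreference "\\1" returns the subexpression captured in the parentheses. Because the pattern begins and ends with ".*", it matches the whole string, so sub replaces the entire string with just the captured text.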
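To make the lazy versus greedy distinction concrete, here is a minimal sketch. The string x is a shortened stand-in for the strings in the question (an assumption for illustration), and the only difference between the two calls is the ? after the capture group's .* quantifier.

```r
# Shortened stand-in for the question's strings (assumed for illustration).
x <- "@{self=u; description=Open and ready.; iconUrl=i; name=Open; id=1; statusCategory=}"

# Lazy capture: stops at the first ";" after "description=".
lazy <- sub(".*description=(.*?);.*", "\\1", x)

# Greedy capture: runs on to the last ";" in the string.
greedy <- sub(".*description=(.*);.*", "\\1", x)

print(lazy)
print(greedy)
```

The greedy version swallows the fields that follow the description, which is why the lazy quantifier matters here. Note that even the lazy version would cut a description short if the description text itself contained a ";".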
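Using the base R functions regexec and regmatches is closer to the approach in the OP. For each string, regmatches returns the full match followed by the captured group, so sapply with "[" and index 2 extracts the desired result.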
sapply(regmatches(c(string1, string2),
regexec(".*description=(.*?);.*", c(string1, string2))),
"[", 2)
[1] "The issue is open and ready for the assignee to start work on it." ""
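If several fields need to be pulled out the same way, the regexec/regmatches approach could be wrapped in a small function. The sketch below is not from the original answer: extract_field is a hypothetical helper name, the input strings are shortened stand-ins, and it swaps the lazy ".*?" for the negated character class "[^;]*", which also stops at the first ";". It assumes key contains no regex metacharacters and that the field is terminated by ";".

```r
# Hypothetical helper (not from the original answer): return the value of one
# "key=value;" field from strings shaped like those in the question.
extract_field <- function(x, key) {
  # "[^;]*" matches everything up to, but not including, the next ";".
  pattern <- paste0(key, "=([^;]*);")
  m <- regmatches(x, regexec(pattern, x))
  # Element 1 is the full match, element 2 the captured group;
  # strings without a match yield NA.
  vapply(m, function(hit) if (length(hit) >= 2) hit[2] else NA_character_,
         character(1))
}

# Shortened stand-ins for string1 and string2 (assumed for illustration).
x <- c("@{self=u1; description=Open and ready.; iconUrl=i1; name=Open; id=1; statusCategory=}",
       "@{self=u2; description=; iconUrl=i2; name=Full Curation; id=10203; statusCategory=}")

descriptions <- extract_field(x, "description")
status_names <- extract_field(x, "name")
print(descriptions)
print(status_names)
```

Using vapply instead of sapply keeps the return type a character vector even when some strings have no match.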
Answer 1 (score: 1)
You could try:
test.1 <- gsub("description=", "", strsplit(string1, "; ")[[1]][2])
test.2 <- gsub("description=", "", strsplit(string2, "; ")[[1]][2])
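This simply splits each string on "; ", which breaks it into 6 elements; the square brackets select the 2nd element, and gsub removes description= by replacing it with nothing.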
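The same splitting idea can be taken one step further so that a field is looked up by name rather than by position. This is a sketch under assumptions, not part of the original answer: x is a shortened stand-in string, and it assumes no value contains the separator "; ".

```r
# Shortened stand-in for the question's strings (assumed for illustration).
x <- "@{self=u; description=Open and ready.; iconUrl=i; name=Open; id=1; statusCategory=}"

# Drop the leading "@{" and trailing "}" wrappers, then split into fields.
body <- sub("^@\\{", "", sub("\\}$", "", x))
fields <- strsplit(body, "; ", fixed = TRUE)[[1]]

# Split each field at its first "=" only, since values may contain "=".
keys <- sub("=.*", "", fields)
values <- sub("^[^=]*=", "", fields)
parsed <- setNames(values, keys)

print(parsed[["description"]])
```

Looking fields up by name avoids relying on description always being the 2nd element, at the cost of a little more parsing.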