Question

I want to extract a substring (the description details) from each of the following strings:

string1 <- "@{self=https://somesite.atlassian.net/rest/api/2/status/1; description=The issue is open and ready for the assignee to start work on it.; iconUrl=https://somesite.atlassian.net/images/icons/statuses/open.png; name=Open; id=1; statusCategory=}"
string2 <- "@{self=https://somesite.atlassian.net/rest/api/2/status/10203; description=; iconUrl=https://somesite.atlassian.net/images/icons/statuses/generic.png; name=Full Curation; id=10203; statusCategory=}"

I want to obtain the following:

ExtractedSubString1 = "The issue is open and ready for the assignee to start work on it."
ExtractedSubString2 = ""

I have tried:

library(stringr)
ExtractedSubString1 <- substr(string1, str_locate(string1, "description=")+12, str_locate(string1, "; iconUrl")-1)
ExtractedSubString2 <- substr(string2, str_locate(string2, "description=")+12, str_locate(string2, "; iconUrl")-1)

I am looking for a better way to achieve this.

Answer 1

Using only base R's sub with a backreference, you can do:

sub(".*description=(.*?);.*", "\\1", c(string1, string2))
[1] "The issue is open and ready for the assignee to start work on it." ""

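".*" matches any run of characters, "description=" is a literal match, and ".*?" also matches any run of characters, but the ? forces lazy rather than greedy matching. ";" is a literal, and "()" captures the lazily matched subexpression. The backreference "\\1" returns the subexpression captured in the parentheses.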
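To make the lazy versus greedy distinction concrete, here is a minimal sketch on a shortened, hypothetical input with the same "key=value; " layout. It passes perl = TRUE so that both calls follow PCRE matching rules; that is a choice made for this illustration, whereas the call above uses R's default engine.

```r
# Hypothetical shortened input with the same "key=value; " layout as string1.
x <- "@{self=u1; description=Ready to start.; iconUrl=u2; name=Open; id=1; statusCategory=}"

# Lazy (.*?): the capture stops at the first ";" after "description=".
lazy <- sub(".*description=(.*?);.*", "\\1", x, perl = TRUE)

# Greedy (.*): the capture runs on to the last ";" in the string.
greedy <- sub(".*description=(.*);.*", "\\1", x, perl = TRUE)

print(lazy)    # "Ready to start."
print(greedy)  # "Ready to start.; iconUrl=u2; name=Open; id=1"
```

The greedy version swallows the fields that follow the description, which is why the lazy quantifier matters here.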

Using the base R functions regexec and regmatches is closer to the approach in the OP. Then use sapply with "[" to extract the desired result.

sapply(regmatches(c(string1, string2),
                  regexec(".*description=(.*?);.*", c(string1, string2))),
       "[", 2)
[1] "The issue is open and ready for the assignee to start work on it." ""
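If other fields are needed as well, the regexec/regmatches idea could be wrapped in a small helper. The sketch below is only an illustration: extract_field is a hypothetical name, the input strings are shortened stand-ins, and it assumes that the key contains no regex metacharacters and that values never contain ";". It uses a negated character class, "[^;]*", in place of the lazy quantifier.

```r
# Hypothetical helper: pull the value of one "key=value" field from each string.
# Assumes `key` has no regex metacharacters and values contain no ";".
extract_field <- function(x, key) {
  # The key must follow "{" or a space, so "id" does not match inside another key.
  pattern <- paste0("[{ ]", key, "=([^;]*)")
  matches <- regmatches(x, regexec(pattern, x))
  # Each list element is c(full match, capture); character(0) means no match.
  vapply(matches,
         function(m) if (length(m) >= 2) m[2] else NA_character_,
         character(1), USE.NAMES = FALSE)
}

# Shortened stand-ins for string1 and string2.
strings <- c(
  "@{self=u1; description=Ready to start.; name=Open; id=1; statusCategory=}",
  "@{self=u2; description=; name=Full Curation; id=10203; statusCategory=}"
)

descriptions <- extract_field(strings, "description")
status_names <- extract_field(strings, "name")
missing_key  <- extract_field(strings, "priority")  # key absent from both strings
```

Returning NA for a missing key, rather than an empty string, keeps "field absent" distinguishable from "field present but empty", as in string2.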

Answer 2

You could try:

test.1 <- gsub("description=", "", strsplit(string1, "; ")[[1]][2])

test.2 <- gsub("description=", "", strsplit(string2, "; ")[[1]][2])

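This simply splits each string on "; " into 6 elements, the square brackets select the 2nd element, and gsub removes description= by replacing it with an empty string.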
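Selecting the 2nd element relies on description always being in that position. As a hedged variation on the same split-based idea, the sketch below parses every field into a named vector so the value can be looked up by key. parse_fields is a hypothetical helper, the inputs are shortened stand-ins, and it assumes fields are separated by "; " and that no value contains "; ".

```r
# Hypothetical helper: turn one "@{k1=v1; k2=v2; ...}" string into a named vector.
# Assumes fields are separated by "; " and that no value contains "; ".
parse_fields <- function(x) {
  body   <- sub("^@\\{", "", sub("\\}$", "", x))  # drop the "@{" and "}" wrapper
  parts  <- strsplit(body, "; ", fixed = TRUE)[[1]]
  keys   <- sub("=.*$", "", parts)                # text before the first "="
  values <- sub("^[^=]*=", "", parts)             # text after the first "="
  setNames(values, keys)
}

# Shortened stand-ins for string1 and string2.
open_status  <- parse_fields("@{self=u1; description=Ready to start.; name=Open; id=1; statusCategory=}")
empty_status <- parse_fields("@{self=u2; description=; name=Full Curation; id=10203; statusCategory=}")

print(open_status[["description"]])   # "Ready to start."
print(empty_status[["description"]])  # ""
```

Looking the value up by name means the result does not change if the fields appear in a different order.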

Extract a substring using R
