Regular expressions in R: str_extract_all

Time: 2018-06-06 17:33:07

Tags: r regex stringr

I need help with a regular expression in R.

library(stringr)
text <- "Detailed Description, {type:status-update,activityText:Closed,date:2018-06-01T12:00:15+0200,status:Closed}, {type:status-update,activityText:Inprogress,date:2018-06-01T12:00:15+0200,status:Inprogress}, Responsible:ABC"

str_extract_all(text, "status-update.a")

The result is:

[[1]]
[1] "status-update,a" "status-update,a"

In the same way, I tried the following code:

str_extract_all(text, "status-update[[:print:]]+}")

to get the following, which is my expected output:

[[1]]
[1] "type:status-update,activityText:Closed,date:2018-06-01T12:00:15+0200,status:Closed"
[2] "type:status-update,activityText:Inprogress,date:2018-06-01T12:00:15+0200,status:Inprogress"

I only want to extract the bits inside the curly braces, but I get the following error:

Error in stri_extract_all_regex(string, pattern, simplify = simplify,  : 
Syntax error in regexp pattern. (U_REGEX_RULE_SYNTAX)

1 answer:

Answer 0: (score: 5)

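Curly braces are part of regular expression syntax, so if you want to match them literally, put an escape character (a double backslash in an R string) in front of them.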

str_extract_all(text, "\\{.+?\\}")
#[[1]]
#[1] "{type:status-update,activityText:Closed,date:2018-06-01T12:00:15+0200,status:Closed}"        
#[2] "{type:status-update,activityText:Inprogress,date:2018-06-01T12:00:15+0200,status:Inprogress}"
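A possible variation, offered only as a sketch: instead of the lazy quantifier, a negated character class can stop the match at the first closing brace. The variable name with_braces is made up for illustration, and the pattern assumes the braces are never nested.

```r
library(stringr)

text <- "Detailed Description, {type:status-update,activityText:Closed,date:2018-06-01T12:00:15+0200,status:Closed}, {type:status-update,activityText:Inprogress,date:2018-06-01T12:00:15+0200,status:Inprogress}, Responsible:ABC"

# \\{       a literal opening brace
# [^\\}]+   one or more characters that are not a closing brace
# \\}       a literal closing brace
# Assumption: no nested braces occur in the input.
with_braces <- str_extract_all(text, "\\{[^\\}]+\\}")[[1]]

length(with_braces)
```

Because the character class cannot cross a closing brace, each match ends at the nearest }, which is the same effect the lazy .+? achieves above.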

To capture only the text inside the {}, you need to use the regular expression lookbehind and lookahead options.

 str_extract_all(text, "(?<=(\\{)).+?(?=\\})")

What the pattern means:

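(?<=   ) lookbehind: the text just before the match must be
\\{  the left curly bracket (escaped, not included in the match)
 .+   match at least 1 character (any character)
 ?    do not perform a greedy match (without it, .+ grabs everything up to the last right bracket)
\\}  the right curly bracket (escaped, not included in the match)
(?=   ) lookahead: the text just after the match must be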
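To make the effect of the lookarounds concrete, here is a minimal sketch that runs the pattern on the text from the question and compares it with a capture-group approach using str_match_all. The names inner_lookaround and inner_capture are made up for illustration, and the redundant inner parentheses of the lookbehind are dropped here.

```r
library(stringr)

text <- "Detailed Description, {type:status-update,activityText:Closed,date:2018-06-01T12:00:15+0200,status:Closed}, {type:status-update,activityText:Inprogress,date:2018-06-01T12:00:15+0200,status:Inprogress}, Responsible:ABC"

# Lookaround version: the braces are checked but not returned.
inner_lookaround <- str_extract_all(text, "(?<=\\{).+?(?=\\})")[[1]]

# Capture-group version: match the braces, then keep only group 1.
# str_match_all returns one matrix per input string; column 1 is the
# full match and column 2 is the first capture group.
inner_capture <- str_match_all(text, "\\{(.+?)\\}")[[1]][, 2]

inner_lookaround
# [1] "type:status-update,activityText:Closed,date:2018-06-01T12:00:15+0200,status:Closed"
# [2] "type:status-update,activityText:Inprogress,date:2018-06-01T12:00:15+0200,status:Inprogress"

identical(inner_lookaround, inner_capture)
```

Both approaches should return the same two strings; the capture-group form may be easier to read when the pattern grows.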