Question

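I am parsing free-text strings such as "ABC1: staining present in tissue" to identify whether staining is present or absent, regardless of the whitespace in the string.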

Despite trying many approaches, my current method still returns the leading and trailing text along with the word I want.

test<-c("ABC1: staining present in tissue", "ABC1:  staining absent in tissue", "ABC1:staining present  in tissue")

unlist(regmatches(test, gregexpr("ABC1:\\s*staining\\s* (.*) \\s*in tissue.*", test, perl=TRUE)))

The desired output is just the status word for each string: present absent present

Answer 1

Since you are using a PCRE regex, you can use a solution based on lookarounds and \K:

test<-c("ABC1: staining present in tissue", "ABC1:  staining absent in tissue", "ABC1:staining present  in tissue")

unlist(regmatches(test, gregexpr("ABC1:\\s*staining\\s*\\K.*?(?=\\s*in\\s+tissue)", test, perl=TRUE)))
## => [1] "present" "absent"  "present"

Or a similar stringr approach:

library(stringr)
str_match(test, "ABC1:\\s*staining\\s*(.*?)\\s*in\\s+tissue")[,2]
## => [1] "present" "absent"  "present"

See the R demo online.
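A minimal base-R sketch of the same capture-group idea as the str_match call, for when stringr is not available. It swaps in sub() with a backreference; the ^.*? prefix and .*$ suffix are additions (not part of the answer's pattern) so that the whole string is replaced by group 1.

```r
# Same three inputs as in the question
test <- c("ABC1: staining present in tissue",
          "ABC1:  staining absent in tissue",
          "ABC1:staining present  in tissue")

# Match the entire string and replace it with capture group 1 (the status word);
# perl = TRUE so the lazy .*? quantifiers behave as in PCRE
status <- sub("^.*?ABC1:\\s*staining\\s*(.*?)\\s*in\\s+tissue.*$", "\\1",
              test, perl = TRUE)
print(status)
## [1] "present" "absent"  "present"
```

One difference from str_match: if a string does not match, sub() returns it unchanged rather than NA.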

Details

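ABC1:\\s*staining\\s* - matches ABC1:, then any 0+ whitespace chars, then staining, then any 0+ whitespace chars
\\K - the match reset operator, which discards the text matched so far from the match memory buffer
.*? - any 0+ chars other than line break chars, as few as possible (.* would match as many as possible)
(?=\\s*in\\s+tissue) - a positive lookahead that requires 0+ whitespace chars, in, 1+ whitespace chars and tissue immediately to the right of the current location.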
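To see what \K contributes, here is a small sketch that runs the pattern above with and without it on the same inputs (base R only). Without \K the ABC1:...staining prefix stays in the reported match, while the lookahead part is excluded either way because lookarounds do not consume text.

```r
test <- c("ABC1: staining present in tissue",
          "ABC1:  staining absent in tissue",
          "ABC1:staining present  in tissue")

# With \K: everything matched before \K is dropped from the reported match
with_k <- unlist(regmatches(test, gregexpr(
  "ABC1:\\s*staining\\s*\\K.*?(?=\\s*in\\s+tissue)", test, perl = TRUE)))

# Without \K: the prefix is kept; the lookahead still is not part of the match
without_k <- unlist(regmatches(test, gregexpr(
  "ABC1:\\s*staining\\s*.*?(?=\\s*in\\s+tissue)", test, perl = TRUE)))

print(with_k)
print(without_k)
```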

Answer 2

A simpler approach: use str_extract_all

> library(stringr)
> unlist(str_extract_all(test, "present|absent"))
[1] "present" "absent"  "present"
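One caveat with unlist(str_extract_all(...)): unlist flattens the result, so a string with no match (or with both words) shifts the alignment between inputs and outputs. A minimal base-R sketch that keeps exactly one value per input, with NA where nothing matched. The fourth input string is hypothetical, added only to show the NA case, and the \\b word boundaries are an addition so that only whole words match.

```r
test <- c("ABC1: staining present in tissue",
          "ABC1:  staining absent in tissue",
          "ABC1:staining present  in tissue",
          "ABC1: staining not reported")  # hypothetical input with no status word

# regexpr() finds the first match in each string (-1 where there is none)
m <- regexpr("\\b(?:present|absent)\\b", test, perl = TRUE)

# Start with NA for every input, then fill in the strings that matched;
# regmatches() with a regexpr() result returns only the matched substrings
status <- rep(NA_character_, length(test))
status[m != -1] <- regmatches(test, m)
print(status)
```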

Extract words and/or strings between two strings in an R list
