Regex - capture text between matches, or capture everything if there is no match

Asked: 2017-07-18 21:07:16

Tags: r regex

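I'm fairly new to regular expressions, and I'm trying to capture the text between two strings. If either string is missing, though, I want to capture all of the text on that side.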

Here is an example:

  

Report #1: Observations: A cat is stretching. Conclusions: This cat is flexible.

I can capture the text between "Observations" and "Conclusions" with the following pattern:

(?:(?i)observations)(.*)(?:(?i)conclusions)

But if another text reads:

  

Report #1: Observations: A cat is stretching. This cat is flexible.

I want to capture everything after "Observations".

Or, if the starting string "Observations" is absent:

  

Report #1: A cat is stretching. Conclusions: This cat is flexible.

I want to capture everything from the beginning up to the ending string "Conclusions".

I'm guessing a conditional regex might help?

Thanks!

2 Answers:

Answer 0: (score: 1)

One way to do this is with two regular expressions, each removing one of the delimiters and everything outside it:

reports = c(
"Report #1: Observations: A cat is stretching. Conclusions: This cat is flexible.", 
"Report #1: Observations: A cat is stretching. This cat is flexible.")

reports = sub(".*Observations:\\s*", "", reports)
reports = sub("\\s*Conclusions:.*", "", reports)

reports
# [1] "A cat is stretching."
# [2] "A cat is stretching. This cat is flexible."
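
The two-step approach also seems to cover the third case from the question, where "Observations" is missing, because `sub()` leaves a string untouched when its pattern does not match. A minimal sketch, assuming a hypothetical helper name `extract_body` and case-insensitive matching (neither is part of the original answer):

```r
# Hypothetical helper (not from the original answer): strip the optional
# leading and trailing delimiters in two passes.
extract_body <- function(x) {
  # Drop everything up to and including "Observations:" when it is present.
  x <- sub("^.*Observations:\\s*", "", x, ignore.case = TRUE)
  # Drop "Conclusions:" and everything after it when it is present.
  sub("\\s*Conclusions:.*$", "", x, ignore.case = TRUE)
}

reports <- c(
  "Report #1: Observations: A cat is stretching. Conclusions: This cat is flexible.",
  "Report #1: Observations: A cat is stretching. This cat is flexible.",
  "Report #1: A cat is stretching. Conclusions: This cat is flexible."
)

result <- extract_body(reports)
result
# [1] "A cat is stretching."
# [2] "A cat is stretching. This cat is flexible."
# [3] "Report #1: A cat is stretching."
```

Note that in the third case the "Report #1:" prefix is kept, since nothing marks where the header ends.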

Answer 1: (score: 1)

A one-liner:

ex <- c(
  "Report #1: Observations: A cat is stretching. Conclusions: This cat is flexible.",
  "Report #1: Observations: A cat is stretching. This cat is flexible.",
  "Report #1: A cat is stretching. Conclusions: This cat is flexible."
)

gsub("(^.*observations|conclusions.*$)", "", ex, ignore.case = TRUE)
# [1] ": A cat is stretching. "                     
# [2] ": A cat is stretching. This cat is flexible."
# [3] "Report #1: A cat is stretching. "            

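You may want word boundaries, or spaces before/after the words as G5W suggested. With word boundaries it looks like this, and it gives the same output for the example text: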

gsub("(^.*\\bobservations\\b|\\bconclusions\\b.*$)", "", ex, ignore.case = TRUE)
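
If the goal is to capture the text rather than delete what surrounds it, one possible alternative is a single anchored pattern with optional delimiter groups and a lazy capture group in between. The sketch below is an assumption on my part, not something taken from either answer: it relies on `perl = TRUE`, assumes each report is a single line, and also consumes the colon and whitespace around the delimiters.

```r
ex <- c(
  "Report #1: Observations: A cat is stretching. Conclusions: This cat is flexible.",
  "Report #1: Observations: A cat is stretching. This cat is flexible.",
  "Report #1: A cat is stretching. Conclusions: This cat is flexible."
)

# Optional prefix up to "observations", a lazy capture group, then an
# optional suffix starting at "conclusions". Both delimiters may be absent.
pattern <- "^(?:.*?\\bobservations\\b:?\\s*)?(.*?)(?:\\s*\\bconclusions\\b.*)?$"

# The pattern always matches the whole line, so replacing the match with
# the first capture group keeps only the text between the delimiters.
captured <- sub(pattern, "\\1", ex, perl = TRUE, ignore.case = TRUE)
captured
# [1] "A cat is stretching."
# [2] "A cat is stretching. This cat is flexible."
# [3] "Report #1: A cat is stretching."
```

This uses optional groups rather than a true conditional regex, which appears to be enough for the three cases in the question.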