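I'm fairly new to regular expressions, and I'm trying to capture the text between two strings. If one of those strings is missing, though, I want to capture all of the text on that side.
Here is an example:
Report #1: Observations: A cat is stretching. Conclusions: This cat is flexible.
I can capture the text between "Observations" and "Conclusions" with the following: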
(?:(?i)observations)(.*)(?:(?i)conclusions)
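But if another text reads:
Report #1: Observations: A cat is stretching. This cat is flexible.
I want to capture everything after "Observations".
Or, if the starting string "Observations" is missing:
Report #1: A cat is stretching. Conclusions: This cat is flexible.
I want to capture everything from the beginning up to the ending string "Conclusions".
I'm guessing a conditional regular expression might help?
Thanks!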
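Answer 0 (score: 1)
One way to do this is with two regular expressions, one to strip everything up to the start marker and one to strip everything from the end marker onward: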
reports = c(
"Report #1: Observations: A cat is stretching. Conclusions: This cat is flexible.",
"Report #1: Observations: A cat is stretching. This cat is flexible.")
reports = sub(".*Observations:\\s*", "", reports)
reports = sub("\\s*Conclusions:.*", "", reports)
reports
[1] "A cat is stretching."
[2] "A cat is stretching. This cat is flexible."
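The same two-step idea can be wrapped in a small function and checked against the third case from the question, where "Observations" is missing. This is only a sketch: the name extract_body is hypothetical, and ignore.case = TRUE is an added assumption to mirror the (?i) flag in the question's pattern.

```r
# Hypothetical helper (not from the original answer): apply the two
# substitutions in sequence. A sub() call whose pattern does not match
# leaves the string unchanged, which is what handles a missing marker.
extract_body <- function(x) {
  x <- sub(".*Observations:\\s*", "", x, ignore.case = TRUE)  # drop text up to the start marker
  x <- sub("\\s*Conclusions:.*", "", x, ignore.case = TRUE)   # drop the end marker and what follows
  x
}

reports <- c(
  "Report #1: Observations: A cat is stretching. Conclusions: This cat is flexible.",
  "Report #1: Observations: A cat is stretching. This cat is flexible.",
  "Report #1: A cat is stretching. Conclusions: This cat is flexible."
)

result <- extract_body(reports)
print(result)
# [1] "A cat is stretching."
# [2] "A cat is stretching. This cat is flexible."
# [3] "Report #1: A cat is stretching."
```

Because each substitution is independent, neither marker has to be present for the other to be handled.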
Answer 1 (score: 1)
As a one-liner:
ex <- c(
"Report #1: Observations: A cat is stretching. Conclusions: This cat is flexible.",
"Report #1: Observations: A cat is stretching. This cat is flexible.",
"Report #1: A cat is stretching. Conclusions: This cat is flexible."
)
gsub("(^.*observations|conclusions.*$)", "", ex, ignore.case = TRUE)
# [1] ": A cat is stretching. "
# [2] ": A cat is stretching. This cat is flexible."
# [3] "Report #1: A cat is stretching. "
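You may want word boundaries, or spaces before/after the words as G5W suggested. With word boundaries it looks like this, and it gives the same output for the example text: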
gsub("(^.*\\bobservations\\b|\\bconclusions\\b.*$)", "", ex, ignore.case = TRUE)
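The outputs above still carry the leading ": " and a trailing space. If that matters, one possible variant also consumes an optional colon and the surrounding whitespace. This is a sketch rather than part of the original answer: the variable name cleaned is hypothetical, and perl = TRUE is an assumption made so that \\b and \\s follow PCRE rules.

```r
ex <- c(
  "Report #1: Observations: A cat is stretching. Conclusions: This cat is flexible.",
  "Report #1: Observations: A cat is stretching. This cat is flexible.",
  "Report #1: A cat is stretching. Conclusions: This cat is flexible."
)

# First alternative: everything up to "observations", plus an optional
# colon and any whitespace after it.
# Second alternative: any whitespace before "conclusions", then the rest
# of the string.
cleaned <- gsub(
  "^.*\\bobservations\\b:?\\s*|\\s*\\bconclusions\\b.*$",
  "", ex, ignore.case = TRUE, perl = TRUE
)
print(cleaned)
# [1] "A cat is stretching."
# [2] "A cat is stretching. This cat is flexible."
# [3] "Report #1: A cat is stretching."
```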