Replace a matched pattern in a string based on a condition

Time: 2020-03-02 12:32:22

Tags: r regex

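I have a text string containing digits, letters and spaces. Some of its substrings are month abbreviations. I want to perform a condition-based pattern replacement: surround a month abbreviation with spaces if and only if a given condition is met. As an example, let the condition be "preceded by a digit and followed by a letter".

I tried the stringr package, but I could not work out how to combine the functions str_replace_all() and str_locate_all():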

# Input:
txt = "START1SEP2 1DECX JANEND"
# Desired output:
# "START1SEP2 1 DEC X JANEND"

# (A) What I could do without checking the condition:
library(stringr)
patt_month = paste("(", paste(toupper(month.abb), collapse = "|"), ")", sep='')
str_replace_all(string = txt, pattern = patt_month, replacement = " \\1 ")
# "START1 SEP 2 1 DEC X  JAN END"

# (B) But I actually only need replacements inside the condition-based bounds:
str_locate_all(string = txt, pattern = paste("[0-9]", patt_month, "[A-Z]", sep=''))[[1]]
#      start end
# [1,]    12  16

# To combine (A) and (B), I'm currently using an ugly for() loop not shown here and want to get rid of it

2 answers:

Answer 0 (score: 5)

You are looking for lookarounds:

(?<=\d)DEC(?=[A-Z])

See a demo on regex101.com.


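Lookarounds assert that a certain position matches without consuming any characters. They can check what comes before a position (a lookbehind) or what follows it (a lookahead). Each comes in a positive and a negative form, so there are four types in total (positive/negative lookbehind/lookahead).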
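To illustrate what "without consuming any characters" means, here is a small base R sketch comparing a plain pattern with the lookaround version. The variable names `consumed` and `zero_width` are made up for this illustration, and `[0-9]` is used in place of `\d` only to keep the escaping simple.

```r
txt <- "START1SEP2 1DECX JANEND"

# Plain pattern: the digit and the letter are part of the match.
consumed <- regmatches(txt, regexpr("[0-9]DEC[A-Z]", txt))

# Lookaround pattern: the digit and the letter are only checked, not matched.
# perl = TRUE is needed because R's default regex engine has no lookarounds.
zero_width <- regmatches(txt, regexpr("(?<=[0-9])DEC(?=[A-Z])", txt, perl = TRUE))

print(consumed)    # "1DECX"
print(zero_width)  # "DEC"
```

Because only "DEC" is matched in the second case, a replacement touches the month alone and leaves its neighbours in place.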

A short cheat sheet:

  • (?=...) is a positive lookahead
  • (?!...) is a negative lookahead
  • (?<=...) is a positive lookbehind
  • (?<!...) is a negative lookbehind
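The pattern above only covers DEC. A minimal sketch of how it might be extended to every month abbreviation from the question, using base R's gsub() with perl = TRUE, follows; the names `months_alt`, `patt` and `result` are illustrative and not part of the original answer.

```r
txt <- "START1SEP2 1DECX JANEND"

# Alternation of all upper-case month abbreviations: "JAN|FEB|...|DEC"
months_alt <- paste(toupper(month.abb), collapse = "|")

# A month preceded by a digit (lookbehind) and followed by a capital letter (lookahead)
patt <- paste0("(?<=[0-9])(", months_alt, ")(?=[A-Z])")

# Surround only the qualifying months with spaces
result <- gsub(patt, " \\1 ", txt, perl = TRUE)
print(result)  # "START1SEP2 1 DEC X JANEND"
```

With stringr, str_replace_all(txt, patt, " \\1 ") should give the same result, assuming the ICU engine behind stringr handles these lookarounds the way PCRE does.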

Answer 1 (score: 0)

Base R version

patt_month <- paste(toupper(month.abb), collapse = "|")  # join all month abbreviations with OR
pat <- paste0("(\\s\\d)(", patt_month, ")([A-Z]\\s)")  # three groups: space + digit, month, letter + space
gsub(pattern = pat, replacement = "\\1 \\2 \\3", txt, perl = TRUE)  # same result as above

It also works directly for txt2 <- "START1SEP2 1JANY JANEND":

[1] "START1SEP2 1 JAN Y JANEND"
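Note that the three-group pattern in this answer also requires whitespace before the digit and after the letter, which is stricter than the condition stated in the question. A hedged sketch of a capture-group variant without that requirement is shown below; `txt3` is a hypothetical input and `pat_ws`, `pat_plain` are illustrative names.

```r
months_alt <- paste(toupper(month.abb), collapse = "|")

# Pattern with the whitespace requirement, as in this answer
pat_ws <- paste0("(\\s\\d)(", months_alt, ")([A-Z]\\s)")
# Variant that only asks for a digit before and a capital letter after the month
pat_plain <- paste0("([0-9])(", months_alt, ")([A-Z])")

txt3 <- "A1DECX"  # hypothetical input with no surrounding spaces

with_ws <- gsub(pat_ws, "\\1 \\2 \\3", txt3, perl = TRUE)
without_ws <- gsub(pat_plain, "\\1 \\2 \\3", txt3)

print(with_ws)     # "A1DECX" (unchanged, no whitespace to match)
print(without_ws)  # "A1 DEC X"
```

On the question's own input, "START1SEP2 1DECX JANEND", both patterns give the desired output, so the difference only shows up when the digit or letter is not next to a space.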