Question

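I am using a regular expression to replace some substrings. The replacement value reuses part of the match. I want the match to be case-insensitive, but in the replacement I want a lowercase version of what was matched.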

library(stringi)
x <- "CatCATdog"
rx <- "(?i)(cat)(?-i)"
stri_replace_all_regex(x, rx, "{$1}")
# [1] "{Cat}{CAT}dog"

This is close to what I want, except that the "cat"s should be lowercase. That is, the output string should be "{cat}{cat}dog".

The following code does not work, but it shows my intention.

stri_replace_all_regex(x, rx, "{tolower($1)}")

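The following technique does work, but it is ugly, not very general, and not very efficient. The idea is to swap the regex for one that matches what I want but not the already-replaced values (i.e. "cat" but not "{cat}"). Then, in each input string, I search for the first match, find its position, do a substring replacement, and look for the next match until none remain. It's awful.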

x <- "CatCATdog"
rx <- "(?i)((?<!\\{)cat(?!\\}))(?-i)"
repeat{
  detected <- stri_detect_regex(x, rx)
  if(!any(detected))
  {
    break
  }
  index <- stri_locate_first_regex(x[detected], rx)
  match <- tolower(stri_match_first_regex(x[detected], rx)[, 2])
  # match is already restricted to the detected elements, so it is not subset again
  stri_sub(x[detected], index[, 1], index[, 2]) <- paste0("{", match, "}")
}

I feel there must be a better way.

How do I replace case-insensitive matches with lowercase values?

Thanks to inspiration from the comments, I found that what I was looking for is called "replacement text case conversion".

Answer 1

If you need to perform any kind of string manipulation on the match, you can use gsubfn:

> library(gsubfn)
> rx <- "(?i)cat"
> s = "CatCATdog"
> gsubfn(rx, ~ paste0("{",tolower(x),"}"), s, backref=0)
[1] "{cat}{cat}dog"

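You can use gsubfn much as you would use an anonymous callback inside String#replace in JavaScript (you can name the arguments for the capture groups with function(args) and do more complex manipulation inside the function).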
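If an extra package is not an option, the same callback idea can be approximated in base R with gregexpr() and the regmatches<- replacement function. This is a different technique from gsubfn, shown only as a rough sketch; the helper name brace_lower is made up for illustration.

```r
# Hypothetical helper: wrap each case-insensitive match of `pattern`
# in braces and lowercase it, using only base R.
brace_lower <- function(x, pattern) {
  # Locate every match in every element of x, ignoring case
  m <- gregexpr(pattern, x, ignore.case = TRUE)
  # Transform the matched substrings; sprintf() returns character(0)
  # when an element has no matches, so such elements stay unchanged
  regmatches(x, m) <- lapply(
    regmatches(x, m),
    function(hit) sprintf("{%s}", tolower(hit))
  )
  x
}

result <- brace_lower(c("CatCATdog", "bird"), "cat")
print(result)
```

Because the function passed to lapply() is ordinary R code, any transformation of the matched text can go there, not just tolower().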

Answer 2

You can use \\L in the replacement to convert the match to lowercase:

gsub(rx, "{\\L\\1}", x, perl=TRUE)
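For completeness, here is a self-contained sketch of the case-conversion escapes that base R's gsub() accepts in the replacement when perl = TRUE: \\L lowercases the rest of the replacement, \\U uppercases it, and \\E ends the conversion. The variable names and the second input string "CATDog" are only illustrative.

```r
x <- "CatCATdog"
rx <- "(?i)(cat)"  # case-insensitive match, group 1 holds the matched text

# \\L lowercases everything after it in the replacement
lower <- gsub(rx, "{\\L\\1}", x, perl = TRUE)

# \\U uppercases everything after it in the replacement
upper <- gsub(rx, "{\\U\\1}", x, perl = TRUE)

# \\E ends the conversion, so group 2 keeps its original case
mixed <- gsub("(?i)(cat)(dog)", "\\L\\1\\E\\2", "CATDog", perl = TRUE)

print(c(lower, upper, mixed))
```

These escapes only work with perl = TRUE; with the default regex engine they are not interpreted as case conversions.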
