Question

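I would like to use gsub to get the trailing substring of a string that ends in at least 2 consecutive capital letters, dropping the first letter of that run. It is probably easier to show with some examples: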

"my_BONNIE" -> "ONNIE"
"Billing_ID" -> "D"
"OPT" -> "PT"

Here is my attempt, which does not work:

> gsub("[^(A-Z)][A-Z]([A-Z]+)", "\\1", "BillingTableIDE")
[1] "BillingTablDE"

Edit:

I got this working for a single capitalized "subword", as follows:

gsub("([A-Z])([A-Z]+)", paste0("\\1", tolower(
     gsub(".*(?<![A-Z])[A-Z]([A-Z]+)", "\\1", x, perl = TRUE))), x)

However, this does not work when there are multiple uppercase "subwords":

# Does
"ABC_DEF" -> "Aef_Def"
# When it should be doing
"ABC_DEF" -> "Abc_Def"

Answer 1

If you want to replace the whole string with a part of it, the pattern needs to match the whole string:

gsub(".*?[A-Z]([A-Z]+)$", "\\1", "BillingTableIDE")
# [1] "DE"

Your pattern, by contrast, only matches this part:

BillingTableIDE
#          ^^^^  and this gets replaced with DE
# so you have BillingTablDE
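
As a quick check, the whole-string pattern can be run over the examples from the question. This is only a minimal sketch: the names `x` and `tails` are illustrative, `sub` is used instead of `gsub` because the anchored pattern can match at most once per string, and `perl = TRUE` is an assumption added here so that the lazy `.*?` follows PCRE semantics.

```r
# Example inputs from the question, plus the string from the failed attempt
x <- c("my_BONNIE", "Billing_ID", "OPT", "BillingTableIDE")

# The pattern consumes the whole string, so only the captured tail is kept
tails <- sub(".*?[A-Z]([A-Z]+)$", "\\1", x, perl = TRUE)
tails
# [1] "ONNIE" "D"     "PT"    "DE"
```

Strings that do not end in a run of two or more capitals do not match at all, so `sub` returns them unchanged.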

Answer 2

A solution using stringr.

library(stringr)

test <- c("my_BONNIE", "Billing_ID", "OPT")

test2 <- str_extract(test, "[A-Z]+$")

test3 <- str_sub(test2, 2)

test3
# [1] "ONNIE" "D"     "PT"
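
For the follow-up in the question's edit (lower-casing the tail of every uppercase "subword"), one possible base-R approach is the Perl-style `\\L` case conversion that `gsub` accepts in the replacement when `perl = TRUE`. This is a hedged sketch rather than something taken from the answers above; the names `x` and `fixed` are illustrative.

```r
x <- c("ABC_DEF", "my_BONNIE", "Billing_ID")

# \\1 keeps the first capital of each run as-is;
# \\L lower-cases the rest of that replacement (here, group 2)
fixed <- gsub("([A-Z])([A-Z]+)", "\\1\\L\\2", x, perl = TRUE)
fixed
# [1] "Abc_Def"    "my_Bonnie"  "Billing_Id"
```

Because the replacement is built separately for each match, every uppercase run is converted using its own letters, which avoids the "Aef_Def" result shown in the question.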

R regex to get a substring of uppercase letters