(In R) How to split a string like "WeLiveInCA" into "We Live In CA" at title-case boundaries while preserving abbreviations?

Time: 2019-04-26 18:31:35

Tags: r regex string split

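(In R) How do I split a string such as "WeLiveInCA" into "We Live In CA" at title-case boundaries without splitting abbreviations?

I know how to split a string at every uppercase letter, but doing so also splits initialisms/abbreviations such as CA, USSR, or even U.S.A., and I need those preserved.

So I'm thinking of logic along the lines of: if a word in a string isn't an initialism, then split the word with a space where a lowercase character is followed by an uppercase character.

My snippet below puts a space before every uppercase letter, but it breaks initialisms: CA becomes C A.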

s <- "WeLiveInCA"
trimws(gsub('([[:upper:]])', ' \\1', s))
# "We Live In C A"

Or another example...

s <- c("IDon'tEatKittensFYI", "YouKnowYourABCs")
trimws(gsub('([[:upper:]])', ' \\1', s))
# "I Don't Eat Kittens F Y I" "You Know Your A B Cs"

The result I want is:

"We Live In CA"
#
"I Don't Eat Kittens FYI" "You Know Your ABCs"

But this needs to be widely applicable (not just to my examples).

1 Answer:

Answer 0 (score: 2)

Try base R's gregexpr/regmatches.

s <- c("WeLiveInCA", "IDon'tEatKittensFYI", "YouKnowYourABCs")
regmatches(s, gregexpr('[[:upper:]]+[^[:upper:]]*', s))
#[[1]]
#[1] "We"   "Live" "In"   "CA"  
#
#[[2]]
#[1] "IDon't"  "Eat"     "Kittens" "FYI"    
#
#[[3]]
#[1] "You"  "Know" "Your" "ABCs"
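
The call above returns a list of pieces rather than the space-separated strings the question asks for. A minimal sketch of one way to join them back together, assuming a single space is the wanted separator (the variable names `pieces` and `joined` are only illustrative):

```r
s <- c("WeLiveInCA", "IDon'tEatKittensFYI", "YouKnowYourABCs")

# Extract the pieces exactly as in the answer above
pieces <- regmatches(s, gregexpr("[[:upper:]]+[^[:upper:]]*", s))

# Collapse each character vector of pieces into one space-separated string
joined <- vapply(pieces, paste, character(1), collapse = " ")
joined
# "We Live In CA" "IDon't Eat Kittens FYI" "You Know Your ABCs"
```

Note that "IDon't" stays in one piece, because the leading "ID" is treated as a run of uppercase letters just like "CA" or "FYI".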

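Explanation.

  1. [[:upper:]]+ matches one or more uppercase letters;
  2. [^[:upper:]]* matches zero or more occurrences of anything other than an uppercase letter.
  3. Taken in sequence, the two pieces match a run of uppercase letters followed by everything up to the next uppercase letter, so a run such as CA or FYI stays together.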
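
As an alternative sketch (not the method used in this answer), the rule described in the question, a space wherever a lowercase character is followed by an uppercase one, can be written as a single gsub call. The helper name `split_camel` is hypothetical. Because this version only inserts spaces at lowercase-to-uppercase boundaries, dotted abbreviations such as U.S.A. are left untouched, whereas the pattern explained above would start a new piece at each of their uppercase letters.

```r
# split_camel() is a hypothetical helper name, not part of base R
split_camel <- function(x) {
  # Insert a space wherever a lowercase letter is directly followed by an uppercase one
  gsub("([[:lower:]])([[:upper:]])", "\\1 \\2", x)
}

s <- c("WeLiveInCA", "IDon'tEatKittensFYI", "YouKnowYourABCs", "WeLiveInTheU.S.A.")
split_camel(s)
# "We Live In CA" "IDon't Eat Kittens FYI" "You Know Your ABCs" "We Live In The U.S.A."
```

Like the gregexpr approach, this still leaves "IDon't" unsplit. A rule based only on letter case cannot tell "ID" in "IDon't" apart from "AB" in "ABCs", so separating such cases would probably need something extra, such as a list of known abbreviations.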