(In R) How can I split a title-case string such as "WeLiveInCA" into "We Live In CA" without splitting abbreviations?
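I know how to split a string at every uppercase letter, but doing so also splits initialisms/abbreviations such as CA, USSR, or even U.S.A., and I need to keep those intact.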
So I'm thinking of logic along the lines of: if a word in a string isn't an initialism, then split the word with a space where a lowercase character is followed by an uppercase character.
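The rule described above can be sketched with a single gsub() call that inserts a space only at lowercase-to-uppercase boundaries. This is a minimal sketch, assuming that ordinary words are title-cased and that abbreviations are runs of capitals (optionally with dots); the helper name split_title_case is hypothetical. Because two adjacent capitals are never separated, CA, FYI, USSR and U.S.A. stay intact, but a boundary such as "IDon't" (the one-letter word I followed directly by a capitalized word) also stays fused.

```r
# Hypothetical helper: insert a space wherever a lowercase letter is
# immediately followed by an uppercase letter. Runs of capitals
# (CA, FYI, USSR, U.S.A.) are never split by this rule.
split_title_case <- function(x) {
  gsub("([[:lower:]])([[:upper:]])", "\\1 \\2", x)
}

s <- c("WeLiveInCA", "IDon'tEatKittensFYI", "YouKnowYourABCs")
split_title_case(s)
# returns "We Live In CA" "IDon't Eat Kittens FYI" "You Know Your ABCs"

split_title_case("WeLoveTheU.S.A.")
# returns "We Love The U.S.A."
```

Separating "I Don't" would need an extra uppercase-to-uppercase-plus-lowercase rule, which would in turn split "ABCs" into "AB Cs", so no single pattern covers every example here.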
My snippet below separates words with a space at each uppercase letter, but it breaks the initialism CA into C A.
s <- "WeLiveInCA"
trimws(gsub('([[:upper:]])', ' \\1', s))
# "We Live In C A"
Or another example...
s <- c("IDon'tEatKittensFYI", "YouKnowYourABCs")
trimws(gsub('([[:upper:]])', ' \\1', s))
# "I Don't Eat Kittens F Y I" "You Know Your A B Cs"
The result I want is:
"We Live In CA"
"I Don't Eat Kittens FYI" "You Know Your ABCs"
But this needs to apply generally (not just to my examples).
Answer (score: 2)
Try base R gregexpr/regmatches.
s <- c("WeLiveInCA", "IDon'tEatKittensFYI", "YouKnowYourABCs")
regmatches(s, gregexpr('[[:upper:]]+[^[:upper:]]*', s))
#[[1]]
#[1] "We" "Live" "In" "CA"
#
#[[2]]
#[1] "IDon't" "Eat" "Kittens" "FYI"
#
#[[3]]
#[1] "You" "Know" "Your" "ABCs"
Explanation:
[[:upper:]]+ matches one or more uppercase letters; [^[:upper:]]* matches zero or more characters that are not uppercase letters.
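Since the question asks for strings rather than lists of pieces, the chunks returned by regmatches() can be rejoined with paste(). This is a sketch of one way to do that; it keeps the answer's pattern unchanged, so, as the output above shows, "IDon't" remains a single piece.

```r
s <- c("WeLiveInCA", "IDon'tEatKittensFYI", "YouKnowYourABCs")

# Extract "run of capitals + following non-capitals" chunks, as in the answer
pieces <- regmatches(s, gregexpr("[[:upper:]]+[^[:upper:]]*", s))

# Rejoin each string's chunks with single spaces: one string per input
joined <- vapply(pieces, paste, character(1), collapse = " ")
joined
# returns "We Live In CA" "IDon't Eat Kittens FYI" "You Know Your ABCs"
```

Two caveats follow from the pattern itself: any lowercase text before the first capital is dropped, and a dotted abbreviation is split at each capital ("U.S.A." becomes "U. S. A."). A lowercase-to-uppercase substitution such as gsub("([[:lower:]])([[:upper:]])", "\\1 \\2", s) avoids both, at the cost of the same "IDon't" limitation.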