Split a string where an uppercase letter follows a lowercase letter in stringr

Asked: 2015-02-21 22:34:36

Tags: regex r stringr

I have a vector of strings that looks like this, and I want to split it:

str <- c("Fruit LoopsJalapeno Sandwich", "Red Bagel", "Basil LeafBarbeque SauceFried Beef")

library(stringr)
str_split(str, '[a-z][A-Z]', n = 3)

[[1]]
[1] "Fruit Loop"       "alapeno Sandwich"

[[2]]
[1] "Red Bagel"

[[3]]
[1] "Basil Lea"    "arbeque Sauc" "ried Beef"

But I need to keep those letters at the end and start of each piece.

2 Answers:

Answer 0 (score: 5)

Here are two approaches in base R (they can be generalized to stringr if needed).

This one inserts a placeholder at each split point and then splits on it.

strsplit(gsub("([a-z])([A-Z])", "\\1SPLITHERE\\2", str), "SPLITHERE")

## [[1]]
## [1] "Fruit Loops"       "Jalapeno Sandwich"
## 
## [[2]]
## [1] "Red Bagel"
## 
## [[3]]
## [1] "Basil Leaf"     "Barbeque Sauce" "Fried Beef"  
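
One caveat with the placeholder idea is that a literal marker such as SPLITHERE could, in principle, already occur in the data. The following is a minimal sketch of the same technique wrapped in a function; the name split_lower_upper and the choice of a control character as the marker are assumptions made for illustration, not part of the answer above.

```r
# Hypothetical helper (not from the original answer): split wherever a
# lowercase letter is immediately followed by an uppercase letter.
split_lower_upper <- function(x, marker = "\001") {
  # Assumption: the marker must not already occur in the input,
  # otherwise the split would also happen at those places.
  stopifnot(!any(grepl(marker, x, fixed = TRUE)))
  # Put the marker between the two captured letters so neither is lost.
  marked <- gsub("([a-z])([A-Z])", paste0("\\1", marker, "\\2"), x)
  # Split on the marker as a literal string, not as a regex.
  strsplit(marked, marker, fixed = TRUE)
}

x <- c("Fruit LoopsJalapeno Sandwich", "Red Bagel",
       "Basil LeafBarbeque SauceFried Beef")
pieces <- split_lower_upper(x)
print(pieces)
```

Using fixed = TRUE for the final split means the marker never has to be regex-escaped, whichever marker is chosen.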

This approach uses lookahead and lookbehind:

strsplit(str, "(?<=[a-z])(?=[A-Z])", perl=TRUE)

## [[1]]
## [1] "Fruit Loops"       "Jalapeno Sandwich"
## 
## [[2]]
## [1] "Red Bagel"
## 
## [[3]]
## [1] "Basil Leaf"     "Barbeque Sauce" "Fried Beef"  
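
The lookaround pattern matches a position rather than any characters, which is why nothing is dropped. A small sketch that tries to make this visible by gluing the pieces back together and comparing with the input; the variable names and the rejoin helper are illustrative assumptions.

```r
x <- c("Fruit LoopsJalapeno Sandwich", "Red Bagel",
       "Basil LeafBarbeque SauceFried Beef")

# The pattern from the question consumes the two letters it matches.
consuming <- strsplit(x, "[a-z][A-Z]")

# The lookaround pattern only asserts what sits on either side of the
# split point, so the match itself is empty.
zero_width <- strsplit(x, "(?<=[a-z])(?=[A-Z])", perl = TRUE)

# Hypothetical helper: paste the pieces of each element back together.
rejoin <- function(parts) vapply(parts, paste, character(1), collapse = "")

# TRUE for every element if the split lost no characters.
lossless <- rejoin(zero_width) == x

# Number of characters swallowed by the consuming pattern, per element.
lost_chars <- nchar(x) - nchar(rejoin(consuming))

print(lossless)
print(lost_chars)
```

Each split made by the consuming pattern costs two characters, one on each side of the boundary, which is the behaviour the question is asking to avoid.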

Edit: generalized to stringr, so you can limit the result to 3 pieces if needed.

stringr::str_split(gsub("([a-z])([A-Z])", "\\1SPLITHERE\\2", str), "SPLITHERE", 3)

Answer 1 (score: 3)

You can also extract the pieces by matching rather than splitting.

unlist(regmatches(str, gregexpr('[A-Z][a-z]+ [A-Z][a-z]+', str)))
# [1] "Fruit Loops"       "Jalapeno Sandwich" "Red Bagel"        
# [4] "Basil Leaf"        "Barbeque Sauce"    "Fried Beef"
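
The pattern above assumes every item consists of exactly two capitalized words. If that does not hold for your data, a looser variant might help. This is a sketch under the assumption that items are one or more capitalized words separated by single spaces; the fourth input string is a made-up example, not from the question.

```r
x <- c("Fruit LoopsJalapeno Sandwich", "Red Bagel",
       "Basil LeafBarbeque SauceFried Beef",
       "CheeseFried Beef")  # hypothetical input with a one-word item

# One capitalized word, optionally followed by more capitalized words,
# each preceded by a single space.
pattern <- "[A-Z][a-z]+( [A-Z][a-z]+)*"

# Keep the list structure so each input string keeps its own pieces.
matches <- regmatches(x, gregexpr(pattern, x))
print(matches)
```

Leaving out unlist() keeps the result in the same per-string shape that strsplit returns; wrap it in unlist() if a flat vector is what you need.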