How can I split the following string?
"Wes Anderson – The Grand Budapest HotelRichard Linklater – BoyhoodBennett Miller – FoxcatcherMorten Tyldum – The Imitation Game"
into:
"Wes Anderson – The Grand Budapest Hotel"
"Richard Linklater – Boyhood"
"Bennett Miller – Foxcatcher"
"Morten Tyldum – The Imitation Game"
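The first split point is "HotelRichard", so I thought the pattern [a-z][A-Z] could be used to find the boundaries. But when I split on that pattern, the two matched characters are removed as well:
strsplit("HotelRichard", "[a-z][A-Z]") returns "Hote" "ichard".
Any good ideas?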
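Answer 0 (score: 3)
You can try this code. It uses a workaround: insert a § symbol at each boundary (hoping that symbol does not otherwise occur in your input), then split on it: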
x <- "Wes Anderson – The Grand Budapest HotelRichard Linklater – BoyhoodBennett Miller – FoxcatcherMorten Tyldum – The Imitation Game"
x <- gsub("([a-z])([A-Z])","\\1§\\2",x)
strsplit(x,"§")
[[1]]
[1] "Wes Anderson \342\200\223 The Grand Budapest Hotel"
[2] "Richard Linklater \342\200\223 Boyhood"
[3] "Bennett Miller \342\200\223 Foxcatcher"
[4] "Morten Tyldum \342\200\223 The Imitation Game"
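The workaround above depends on § not already being in the input. As a minimal sketch of how that assumption could be checked before splitting, the helper below (the name `split_at_case_boundary` and its `sep` argument are hypothetical, not part of the answer) stops with an error if the marker is already present. It uses a shortened ASCII sample string for illustration.

```r
# Hypothetical helper (not from the answer above): split wherever a lowercase
# letter is directly followed by an uppercase one, via a temporary marker.
split_at_case_boundary <- function(s, sep = "\u00a7") {  # "\u00a7" is the section sign
  # Refuse to proceed if the marker already occurs in the input,
  # since that would produce spurious splits.
  if (grepl(sep, s, fixed = TRUE)) {
    stop("marker already present in input; choose another 'sep'")
  }
  # Insert the marker between the two captured letters, keeping both.
  marked <- gsub("([a-z])([A-Z])", paste0("\\1", sep, "\\2"), s)
  # Split on the literal marker and return a character vector.
  strsplit(marked, sep, fixed = TRUE)[[1]]
}

parts <- split_at_case_boundary(
  "Richard Linklater - BoyhoodBennett Miller - Foxcatcher"
)
print(parts)
```

Splitting with `fixed = TRUE` treats the marker as a literal string, so a marker that happens to be a regex metacharacter would still work. The marker should not contain a backslash, because it is pasted into the `gsub` replacement string.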
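Answer 1 (score: 0)
First mark the places where a film title runs into the next director's name, then split the string on the inserted "xxx". The first step captures two groups (the lowercase letter and the uppercase letter that follows it) and puts three x's between them. str_replace_all and str_split come from the stringr package.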
library(stringr)  # provides str_replace_all() and str_split()
text <- "Wes Anderson – The Grand Budapest HotelRichard Linklater – BoyhoodBennett Miller – FoxcatcherMorten Tyldum – The Imitation Game"
text.split <- str_replace_all(text, "([a-z])([A-Z])", "\\1xxx\\2")
text.final <- str_split(text.split, "xxx")
text.final
[[1]]
[1] "Wes Anderson – The Grand Budapest Hotel" "Richard Linklater – Boyhood"
[3] "Bennett Miller – Foxcatcher" "Morten Tyldum – The Imitation Game"
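Answer 2 (score: 0)
Here is an approach using a single regular expression (lookbehind and lookahead), which matches the empty position between a lowercase and an uppercase letter, so no characters are consumed: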
strsplit(x, "(?<=[a-z])(?=[A-Z])", perl = TRUE)
## [[1]]
## [1] "Wes Anderson – The Grand Budapest Hotel"
## [2] "Richard Linklater – Boyhood"
## [3] "Bennett Miller – Foxcatcher"
## [4] "Morten Tyldum – The Imitation Game"
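To see that the lookaround pattern really consumes nothing, here is a small sketch on a shortened ASCII sample (the sample string and variable names are only for illustration). Pasting the pieces back together should reproduce the input exactly, which is not the case for the plain "[a-z][A-Z]" split from the question.

```r
s <- "Richard Linklater - BoyhoodBennett Miller - Foxcatcher"

# Zero-width split: lookbehind for a lowercase letter, lookahead for an uppercase one.
kept <- strsplit(s, "(?<=[a-z])(?=[A-Z])", perl = TRUE)[[1]]

# Consuming split from the question: the two matched letters are dropped.
lost <- strsplit(s, "[a-z][A-Z]")[[1]]

print(kept)
print(lost)

# TRUE only when no characters were removed by the split.
kept_is_lossless <- identical(paste0(kept, collapse = ""), s)
lost_is_lossless <- identical(paste0(lost, collapse = ""), s)
print(c(kept_is_lossless, lost_is_lossless))
```

Note that the lookbehind matters here: base R's strsplit handles a bare lookahead such as "(?=[A-Z])" differently, because a zero-length match at the very start of the remaining string is treated as a special case.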