Recursively splitting a string

Asked: 2014-09-02 10:13:16

Tags: regex r string mapreduce

Say I have text like this:

pattern = "This_is some word/expression I'd like to parse:intelligently(using special symbols-like '.')"

The challenge is to split it into words on the word separators in

c(" ","-","/","\\","_",":","(",")",".",",")

using the apply family of functions.

Expected result:

"This" "is" "some" "word" "expression" "I'd" "like" "to" "parse" "intelligently" "using" "special" "symbols" "like"

Approach

I can do this with sapply or a for loop:
 keywords = unlist(strsplit(pattern," "))
 keywords = unlist(strsplit(keywords,"-"))

# etc.

Question:

But what would a solution using Reduce(f, x, init, accumulate = TRUE) look like?

3 Answers:

Answer 0 (score: 5)

You shouldn't need Reduce here. You should be able to do the following:

splitters <- c(" ","/","\\","_",":","(",")",".",",","-") # dash should come last
pattern <- paste0("[", paste(splitters, collapse = ""), "]")
string <- "This_is some word/expression I'd like to parse:intelligently(using special symbols-like '.')"
strsplit(string, pattern)[[1]]
#  [1] "This"          "is"            "some"          "word"         
#  [5] "expression"    "I'd"           "like"          "to"           
#  [9] "parse"         "intelligently" "using"         "special"      
# [13] "symbols"       "like"          "'"             "'"  

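Note that inside a regular expression character class the - should come first or last, so I have reordered "splitters" accordingly. You may also want to add a + at the end of "pattern" if you want a run of several separators (for example multiple spaces) to be treated as a single split point.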
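To make the effect of the trailing + concrete, here is a minimal sketch. The input `demo` and the names `pattern_single`, `pattern_plus`, `res_single` and `res_plus` are made up for illustration; the splitters are the ones from the answer above.

```r
# Same splitters as in the answer; the dash stays last inside the class.
splitters <- c(" ", "/", "\\", "_", ":", "(", ")", ".", ",", "-")
class_body <- paste(splitters, collapse = "")

pattern_single <- paste0("[", class_body, "]")   # one separator per split
pattern_plus   <- paste0("[", class_body, "]+")  # a run of separators is one split

# Hypothetical input with doubled separators.
demo <- "some  word//expression"

res_single <- strsplit(demo, pattern_single)[[1]]
res_plus   <- strsplit(demo, pattern_plus)[[1]]

print(res_single)  # "some" "" "word" "" "expression"
print(res_plus)    # "some" "word" "expression"
```

Without the +, each extra separator leaves an empty string between two split points; with it, the whole run is consumed at once.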

Answer 1 (score: 4)

You can use the option perl = TRUE and then split on punctuation or whitespace:

> strsplit(pattern, '[[:punct:]]|[[:space:]]', perl = TRUE)
[[1]]
 [1] "This"          "is"            "some"          "word"          "expression"   
 [6] "I"             "d"             "like"          "to"            "parse"        
[11] "intelligently" "using"         "special"       "symbols"       "like"         
[16] ""    
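Because this pattern splits on every single punctuation or whitespace character, adjacent separators leave empty strings in the result. One possible way to drop them afterwards is to filter with nzchar(); this is only a sketch, and the names `string`, `pieces` and `words` are chosen here for illustration.

```r
string <- "This_is some word/expression I'd like to parse:intelligently(using special symbols-like '.')"

# Split on any single punctuation or whitespace character, as in the answer.
pieces <- strsplit(string, "[[:punct:]]|[[:space:]]", perl = TRUE)[[1]]

# Keep only the non-empty pieces.
words <- pieces[nzchar(pieces)]
print(words)
```

Note that the apostrophe counts as punctuation here, so "I'd" still comes out as "I" and "d".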

Answer 2 (score: 2)

I would go with this (it keeps "I'd" together):

strsplit(pattern, "[^[:alnum:][:digit:]']")
## [[1]]
##  [1] "This"          "is"            "some"          "word"          "expression"    "I'd"           "like"          "to"            "parse"        
## [10] "intelligently" "using"         "special"       "symbols"       "like"          "'"             "'"
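For completeness, the fold that the question asks about could be sketched roughly as follows. This is not taken from any of the answers: the helper `split_once` and the variable names are hypothetical, and fixed = TRUE is an assumption made here so that separators such as "." and "(" are treated literally rather than as regex syntax.

```r
splitters <- c(" ", "-", "/", "\\", "_", ":", "(", ")", ".", ",")
string <- "This_is some word/expression I'd like to parse:intelligently(using special symbols-like '.')"

# One fold step: split every piece produced so far on the next separator.
# fixed = TRUE treats the separator as a literal string, not a regex.
split_once <- function(pieces, sep) unlist(strsplit(pieces, sep, fixed = TRUE))

# Reduce(f, x, init): start from the whole string and apply each separator in turn.
words <- Reduce(split_once, splitters, string)

# Drop any empty pieces left by adjacent separators.
words <- words[nzchar(words)]
print(words)
```

Passing accumulate = TRUE to Reduce would instead return the list of intermediate results after each separator, which is mostly useful for inspecting the steps. The single character-class regex from the answers does the same job in one strsplit call.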