Say I have text like this:
pattern = "This_is some word/expression I'd like to parse:intelligently(using special symbols-like '.')"
The challenge is how to split it into words using the word separators in c(" ","-","/","\\","_",":","(",")",".",","), with the *apply family.
Expected result:
"This" "is" "some" "word" "expression" "I'd" "like" "to" "parse" "intelligently" "using" "special" "symbols" "like"
Approach:
I could use sapply or a for loop:
keywords = unlist(strsplit(pattern," "))
keywords = unlist(strsplit(keywords,"-"))
# etc.
Question:
But what would a solution using Reduce(f, x, init, accumulate=TRUE) look like?
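For illustration, here is a minimal sketch of what folding the repeated strsplit calls with Reduce might look like. The helper name split_on and the variable names text and separators are made up for this example, and fixed = TRUE is an assumption so that "(", ")" and "." are treated as literal characters rather than regex syntax.

```r
# Hypothetical helper: split every element of `words` on one literal separator.
split_on <- function(words, sep) {
  unlist(strsplit(words, sep, fixed = TRUE))
}

text <- "This_is some word/expression I'd like to parse:intelligently(using special symbols-like '.')"
separators <- c(" ", "-", "/", "\\", "_", ":", "(", ")", ".", ",")

# Left fold: start from the whole string and apply one separator at a time.
keywords <- Reduce(split_on, separators, text)

# Drop any empty strings left behind by adjacent separators.
keywords <- keywords[nzchar(keywords)]
print(keywords)
```

With these inputs the quotes around '.' survive as two separate "'" tokens, because the apostrophe is not in the separator list. Passing accumulate = TRUE would instead return a list of the intermediate results after each separator, which is probably not what is wanted here.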
Answer 0 (score: 5)
You shouldn't need Reduce here. You should be able to do the following:
splitters <- c(" ","/","\\","_",":","(",")",".",",","-") # dash should come last
pattern <- paste0("[", paste(splitters, collapse = ""), "]")
string <- "This_is some word/expression I'd like to parse:intelligently(using special symbols-like '.')"
strsplit(string, pattern)[[1]]
# [1] "This" "is" "some" "word"
# [5] "expression" "I'd" "like" "to"
# [9] "parse" "intelligently" "using" "special"
# [13] "symbols" "like" "'" "'"
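Note that a - inside a regular-expression character class should come first or last, so I have reordered "splitters" accordingly. Also, you may want to add a + at the end of "pattern" if you want runs of several separators, such as multiple spaces, to be collapsed into one.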
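As a small sketch of the effect of the trailing +, the following compares the two patterns on a made-up input with doubled separators. The names char_class, demo, single and collapsed are illustrative only.

```r
splitters <- c(" ", "/", "\\", "_", ":", "(", ")", ".", ",", "-")  # dash kept last
char_class <- paste0("[", paste(splitters, collapse = ""), "]")

demo <- "a  b--c"  # made-up input: two spaces, then two dashes

# Without "+", each separator is its own match, so empty strings appear.
single <- strsplit(demo, char_class)[[1]]
print(single)     # "a" "" "b" "" "c"

# With "+", a run of separators is treated as one split point.
collapsed <- strsplit(demo, paste0(char_class, "+"))[[1]]
print(collapsed)  # "a" "b" "c"
```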
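Answer 1 (score: 4)
You can use the option perl = TRUE and then split on punctuation or whitespace (note that the apostrophe counts as punctuation, so "I'd" is split as well):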
> strsplit(pattern, '[[:punct:]]|[[:space:]]', perl = TRUE)
[[1]]
[1] "This" "is" "some" "word" "expression"
[6] "I" "d" "like" "to" "parse"
[11] "intelligently" "using" "special" "symbols" "like"
[16] ""
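If the empty string at the end of that output is unwanted, one possible variation is to merge the two POSIX classes into a single bracket expression, add a +, and filter out any remaining empty strings. This is only a sketch; the variable names text and words are chosen for this example, and it still splits "I'd" at the apostrophe.

```r
text <- "This_is some word/expression I'd like to parse:intelligently(using special symbols-like '.')"

# One character class covering punctuation and whitespace; "+" merges runs.
words <- strsplit(text, "[[:punct:][:space:]]+")[[1]]

# Guard against an empty first element if the text starts with a separator.
words <- words[nzchar(words)]
print(words)
```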
Answer 2 (score: 2)
I would go with this (it keeps "I'd" together):
strsplit(pattern, "[^[:alnum:][:digit:]']")
## [[1]]
## [1] "This" "is" "some" "word" "expression" "I'd" "like" "to" "parse"
## [10] "intelligently" "using" "special" "symbols" "like" "'" "'"