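I have already asked related questions HERE and HERE. I tried to generalize those answers, but failed.
Basically, I have a string that I want to split into words, numbers, and any kind of punctuation, except that I want to keep apostrophes attached to their words. Here is what I tried, and I'm very close (I think):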
x <- "Raptors don't like robots! I'd pay $500.00 to rid them."
strsplit(x, "(\\s+)|(?=[[:punct:]])", perl = TRUE)
## [[1]]
## [1] "Raptors" "don" "'" "t" "like" "robots" "!"
## [8] "" "I" "'" "d" "pay" "$" "500" "." "00" "to"
## [20] "rid" "them" "."
This is what I'm after:
## [[1]]
## [1] "Raptors" "don't" "like" "robots" "!" "" "I'd"
## [8] "pay" "$" "500" "." "00" "to" "rid" "them" "."
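Although I want a base R solution, I'd also like to see other approaches (I'm sure someone has a solution using a string package), which would make this question more general and useful to others.
Note: R has its own flavor of regular expressions. You need to be familiar with R to answer this question.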
Answer 0 (score: 5)
You can add a negative lookahead, (?!'), so that the split never happens in front of an apostrophe:
strsplit(x, "(\\s+)|(?!')(?=[[:punct:]])", perl = TRUE)
# [1] "Raptors" "don't" "like" "robots" "!" "" "I'd" "pay" "$" "500" "." "00" "to" "rid" "them" "."
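To spell out why this works: (?=[[:punct:]]) matches the zero-width position in front of any punctuation character, and putting (?!') before it rejects that position whenever the next character is an apostrophe. Here is a minimal sketch that wraps the same pattern in a function. The name split_keep_apostrophes is made up for illustration, and dropping the empty string with nzchar() is an optional extra step, not something the question asked for.

```r
# Hypothetical helper name; the pattern is the one from the answer above.
split_keep_apostrophes <- function(x) {
  # Split on runs of whitespace, or at the zero-width position in front of
  # a punctuation character that is not an apostrophe.
  strsplit(x, "(\\s+)|(?!')(?=[[:punct:]])", perl = TRUE)
}

x <- "Raptors don't like robots! I'd pay $500.00 to rid them."
tokens <- split_keep_apostrophes(x)[[1]]

# Optional: drop the empty string produced by the space that follows "!".
tokens_clean <- tokens[nzchar(tokens)]
print(tokens_clean)
```

The empty string shows up because, after "!" is emitted as its own token, the remaining text starts with a space, so the whitespace branch matches at position zero and leaves an empty piece in front of it. Keep tokens as is if you need output identical to the one in the question.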