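I am trying to craft a regex that removes everything except letters, apostrophes, and single spaces.
I tried [^\\p{L} ']+ to match everything that is not a letter, apostrophe, or space,
and a lookbehind, (?<=\\s)\\s+, to match extra whitespace.
Each works in isolation: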
gsub("(?<=\\s)\\s+", "", "I like 56 dogs that's him55.", perl = TRUE)
## [1] "I like 56 dogs that's him55."
gsub("[^\\p{L} ']+", "", "I like 56 dogs that's him55.", perl = TRUE)
## [1] "I like  dogs that's him"
But when I join them with an OR (|):
gsub("((?<=\\s)\\s+)|([^\\p{L} ']+)", "", "I like 56 dogs that's him55.", perl = TRUE)
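it returns:
## [1] "I like  dogs that's him"
I would like the extra spaces (between "like" and "dogs") removed as well, as in:
## [1] "I like dogs that's him"
How can I use a single regex to remove extra spaces along with everything that is not a letter, an apostrophe, or a space?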
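Answer 0: (score: 2)
The problem seems to come from the space inside your character class. The version below drops that space and instead replaces each run of non-letter, non-apostrophe characters (digits, punctuation, and spaces alike) with a single space. It works fine for me: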
gsub("[^\\p{L}']+", " ", "I like 56 dogs that's him55.", perl = TRUE)
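For reference, here is a quick check of the call above on the question's string. It is only a sketch: the variable name out is mine, and trimws() assumes base R 3.2.0 or later. Because the trailing "55." is also replaced by a space, the result ends with one extra space, which trimws() removes.

```r
x <- "I like 56 dogs that's him55."

# Replace every run of characters that are neither letters nor apostrophes
# (spaces, digits, punctuation) with a single space.
out <- gsub("[^\\p{L}']+", " ", x, perl = TRUE)
out
# [1] "I like dogs that's him "

# Strip the trailing space left behind where "55." used to be.
trimws(out)
# [1] "I like dogs that's him"
```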
Answer 1: (score: 2)
If you are trying to do this in a single call, you could try the following:
x <- "I like 56 dogs that's him55."
gsub("[^\\pL' ]+\\h+(?=\\h)|\\h+(?=[^\\pL' ]+)|[^\\pL' ]+", "", x, perl = TRUE)
# [1] "I like dogs that's him"
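To see why this differs from the alternation in the question, it may help to run both side by side. The sketch below uses the names q and a purely for illustration. The key point is that gsub() matches against the original string, not the partially replaced one, so the space after "56" is still preceded by a digit when the lookbehind is tested.

```r
x <- "I like 56 dogs that's him55."

# Pattern from the question: the space after "56" is preceded by "6" in the
# original string, so (?<=\\s) fails there and that space is kept.
q <- gsub("((?<=\\s)\\s+)|([^\\p{L} ']+)", "", x, perl = TRUE)
q
# [1] "I like  dogs that's him"   (two spaces after "like")

# Pattern from this answer: \\h+(?=[^\\pL' ]+) removes the space *before* "56",
# so only the space after it remains.
a <- gsub("[^\\pL' ]+\\h+(?=\\h)|\\h+(?=[^\\pL' ]+)|[^\\pL' ]+", "", x, perl = TRUE)
a
# [1] "I like dogs that's him"
```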
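If you want something more efficient (IMO), here is another approach you could take: strip the unwanted characters first, then split on whitespace and rejoin with single spaces.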
x <- "I like 56 dogs that's him55."
r <- gsub("[^\\pL' ]+", '', x, perl = TRUE)
paste(strsplit(r, '\\s+')[[1]], collapse = ' ')
# [1] "I like dogs that's him"
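A variation on the same two-step idea, in case it is useful: collapse whitespace with a second gsub() instead of strsplit() and paste(). The function name clean_text is hypothetical, and trimws() assumes base R 3.2.0 or later. One difference worth noting is that strsplit() yields a leading empty string when the input starts with whitespace, so the paste() version keeps a leading space, whereas this sketch trims both ends.

```r
# Hypothetical helper: keep letters, apostrophes and single spaces only.
clean_text <- function(x) {
  # Step 1: drop everything that is not a letter, apostrophe, or space.
  r <- gsub("[^\\pL' ]+", "", x, perl = TRUE)
  # Step 2: collapse each run of whitespace to one space, then trim the ends.
  trimws(gsub("\\s+", " ", r))
}

clean_text("I like 56 dogs that's him55.")
# [1] "I like dogs that's him"
```

Since gsub() is vectorised, clean_text() can be applied to a whole character vector at once, which the strsplit(...)[[1]] form cannot do without a loop.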