Suppose I have this test string:
test.string <- c("This is just a <test> string. I'm trying to see, if a FN will remove certain things like </HTML tags>, periods; but not the one in ASP.net, for example.")
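I would like to (1) remove the HTML-style tags, (2) remove certain punctuation characters, and (3) remove sentence-ending periods, but not a period inside a token such as ASP.net.
So the string above should become: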
c("This is just a string I'm trying to see if a FN will remove certain things like periods but not the one in ASP.net for example")
For #1, I tried the following:
gsub("<.*?>", "", x, perl = FALSE)
This seems to work fine.
For #2, I think it's simply:
gsub("[:@$%&*:,;^():]", "", x, perl = FALSE)
which works.
For #3, I tried:
gsub("+[:alpha:]?[.]+[:space:]", "", test.string, perl = FALSE)
but that doesn't work...
Any ideas about where I'm going wrong? I'm really struggling with RegExp, so any help would be greatly appreciated!!
Answer 0 (score: 4)
Based on the input you provided and your rules for what should be removed, the following should work.
gsub('\\s*<.*?>|[:;,]|(?<=[a-zA-Z])\\.(?=\\s|$)', '', test.string, perl = TRUE)
See Working Demo
Answer 1 (score: 1)
Try this:
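To make the three alternatives in that pattern easier to follow, here is a minimal sketch that applies them one at a time and compares the result with the single combined call. The names `step1`, `step2`, `step3`, `combined` and `expected` are only illustrative, and the check covers only the test string from the question.

```r
test.string <- "This is just a <test> string. I'm trying to see, if a FN will remove certain things like </HTML tags>, periods; but not the one in ASP.net, for example."
expected <- "This is just a string I'm trying to see if a FN will remove certain things like periods but not the one in ASP.net for example"

# Rule 1: remove each tag together with any whitespace directly before it
step1 <- gsub("\\s*<.*?>", "", test.string, perl = TRUE)

# Rule 2: remove the listed punctuation characters
step2 <- gsub("[:;,]", "", step1, perl = TRUE)

# Rule 3: remove a period only if a letter precedes it and
# whitespace or the end of the string follows it (lookarounds need perl = TRUE)
step3 <- gsub("(?<=[a-zA-Z])\\.(?=\\s|$)", "", step2, perl = TRUE)

# The same three rules joined with | in a single pass
combined <- gsub("\\s*<.*?>|[:;,]|(?<=[a-zA-Z])\\.(?=\\s|$)", "", test.string, perl = TRUE)

print(identical(step3, combined))
print(identical(combined, expected))
```

As for the attempt at #3 in the question: POSIX classes such as `[:alpha:]` and `[:space:]` only act as classes inside a bracket expression, so they have to be written `[[:alpha:]]` and `[[:space:]]`, and the leading `+` has nothing to repeat.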
test.string <- "There is a natural aristocracy among men. The grounds of this are virtue and talents. "
gsub("\\.\\s*", "", gsub("([a-zA-Z0-9])\\. ([A-Z])", "\\1 \\2", test.string))
# "There is a natural aristocracy among men The grounds of this are virtue and talents"
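One caveat is worth checking against the question's own input. The sketch below runs the same two passes on the original string and suggests that the outer pattern, which removes every remaining period, also strips the one inside ASP.net. The names `original`, `inner` and `result` are illustrative, and the inner pattern escapes the period, which is an assumption about the intent.

```r
original <- "This is just a <test> string. I'm trying to see, if a FN will remove certain things like </HTML tags>, periods; but not the one in ASP.net, for example."

# Inner pass: replace "<alnum>. <Capital>" with "<alnum> <Capital>"
# (period escaped here; assumption about what the pattern was meant to do)
inner <- gsub("([a-zA-Z0-9])\\. ([A-Z])", "\\1 \\2", original)

# Outer pass: remove every remaining period plus any whitespace after it
result <- gsub("\\.\\s*", "", inner)

# Check whether the period inside the token survived
print(grepl("ASP.net", result, fixed = TRUE))
print(grepl("ASPnet", result, fixed = TRUE))
```

If periods inside tokens need to survive, the second pass would have to be restricted, for example to periods followed by whitespace or the end of the string.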