Remove the text between two words in a string

Date: 2018-08-09 08:56:24

Tags: r regex

My data looks like this:

x = "Unable to load the file //xxxx/yyy/abc.pdf onto the RAM" 

I need to remove the text between "file" and "onto" and get output like this:

"Unable to load the file onto the RAM" 

I tried the rm_between function from the qdapRegex package, but when I try something like the following, it also removes the words "file" and "onto":

rm_between(x,"file","onto",replacement = "")

I couldn't find an option that keeps the boundary words.

1 answer:

Answer 0 (score: 4)

A regular expression with base R's gsub() does the job:

gsub("(?<=file).*(?=onto)", " ", x, perl = TRUE)
[1] "Unable to load the file onto the RAM"

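The regex trick used here is positive lookbehind, (?<=file), and positive lookahead, (?=onto). Both are zero-width assertions: they require the match to be preceded by "file" and followed by "onto" without including those words in the match, so only the text in between is replaced.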
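To see the zero-width behaviour directly, here is a small check (an illustration, not part of the original answer) that prints what each pattern actually matches, using base R's regexpr() and regmatches(). With the lookarounds the match is only the text between the words; without them the boundary words are part of the match, which is why they disappear when it is replaced.

```r
x <- "Unable to load the file //xxxx/yyy/abc.pdf onto the RAM"

# Lookarounds assert context but consume no characters,
# so "file" and "onto" are not part of the match.
inner <- regmatches(x, regexpr("(?<=file).*(?=onto)", x, perl = TRUE))
print(inner)
# [1] " //xxxx/yyy/abc.pdf "

# Without lookarounds the boundary words are consumed, so replacing
# this match would delete them as well.
whole <- regmatches(x, regexpr("file.*onto", x, perl = TRUE))
print(whole)
# [1] "file //xxxx/yyy/abc.pdf onto"
```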

Alternative: capture the boundary words in groups and put them back with backreferences:

gsub("(file).*(onto)", "\\1 \\2", x, perl = TRUE)
[1] "Unable to load the file onto the RAM"
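Both patterns above use a greedy .*, which runs to the last "onto" in the string. If the boundary words can occur more than once, a lazy .*? stops at the nearest one instead. The sketch below wraps the backreference approach in a helper; the name remove_between() and its lazy argument are hypothetical, and quoting the words with \Q...\E (PCRE literal quoting, hence perl = TRUE) assumes the words themselves never contain \E.

```r
# Hypothetical helper: keep `left` and `right`, replace the text between them with one space.
remove_between <- function(text, left, right, lazy = TRUE) {
  quant <- if (lazy) ".*?" else ".*"  # lazy stops at the nearest `right`
  # \Q...\E makes PCRE match the words literally (e.g. a "." inside a marker)
  pattern <- paste0("(\\Q", left, "\\E)", quant, "(\\Q", right, "\\E)")
  gsub(pattern, "\\1 \\2", text, perl = TRUE)
}

x <- "Unable to load the file //xxxx/yyy/abc.pdf onto the RAM"
y <- "file a.pdf onto disk; file b.pdf onto RAM"

remove_between(x, "file", "onto")
# [1] "Unable to load the file onto the RAM"
remove_between(y, "file", "onto")
# [1] "file onto disk; file onto RAM"
remove_between(y, "file", "onto", lazy = FALSE)
# [1] "file onto RAM"
```

The greedy version removes everything from the first "file" to the last "onto", which is usually not what you want when the markers repeat.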

To keep using the function you've been using, a simple trick is to put the boundary words back through the replacement:

qdapRegex::rm_between(x, "file", "onto", replacement = "file onto")
[1] "Unable to load the file onto the RAM"

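Looking at the documentation, there is also an argument, include.markers, that keeps the boundaries (markers) instead of removing them, which leads to the simplest solution: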

qdapRegex::rm_between(x, "file", "onto", replacement = " ", include.markers = FALSE)
[1] "Unable to load the file onto the RAM"