In R, using gsub and a regex lookahead or lookbehind to remove everything before a string pattern?

Time: 2015-11-16 07:20:11

Tags: regex r gsub regex-lookarounds

In R I have a data frame with a column in which every row contains repeated text. I want to remove that text by matching a specific pattern:

x <- c("DOI: 10.5256/f1000research.6541.r7660 The revised article answers most of my remarks and questions in a ... Continue reading The revised article answers most of my remarks and questions in a satisfactory way.", 
"DOI: 10.5256/f1000research.6601.r7701 The revision ... Continue reading The revision is approved I have read this", 
"DOI: 10.5256/f1000research.6599.r7859 I have read the revised article by Horrell and D'Orazio. They have responded appropriately to ... Continue reading I have read the revised article by Horrell and D'Orazio. They have responded appropriately to the concerns/questions raised")

What function can I use to remove everything before ... Continue reading, including the ... Continue reading marker itself?

2 answers:

Answer 0: (score: 1)

Use sub.

To remove Continue reading along with everything before it:

sub(".*Continue reading", "", x)

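To remove everything before Continue reading but keep Continue reading itself (the first call needs perl=TRUE for the lookahead; the second gets the same result with a capture group):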

sub(".*(?=\\bContinue reading)", "", x, perl=TRUE)

sub(".*\\b(Continue reading)", "\\1", x)
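As a quick check of the calls above, here is a minimal sketch that runs them on one shortened string. The vector y is made up for illustration and is not part of the question's data. The first call leaves behind the space that followed Continue reading, so trimws() is added here as an optional extra step; it is not part of the original answer.

```r
# Hypothetical, shortened sample string (not from the question's data).
y <- "DOI: 10.1/abc The revision ... Continue reading The revision is approved"

# Remove everything up to and including "Continue reading".
# A leading space remains, so trimws() (R >= 3.2.0) strips it.
without_marker <- trimws(sub(".*Continue reading", "", y))

# Remove everything before "Continue reading" but keep the marker.
# The lookahead (?=...) requires perl = TRUE.
with_marker_perl <- sub(".*(?=\\bContinue reading)", "", y, perl = TRUE)

# Same result with a capture group and a backreference.
with_marker_group <- sub(".*\\b(Continue reading)", "\\1", y)

print(without_marker)
print(with_marker_perl)
print(with_marker_group)
```

Because .* is greedy, each call cuts at the last occurrence of Continue reading in a string. That only matters if the marker can appear more than once.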

Answer 1: (score: 1)

This should remove everything before Continue reading, including the preceding ellipsis:
sub('.*\\.{3}\\s*(Continue reading.*)$', '\\1', x)

If you need to remove only the characters before ... Continue reading, keeping the ellipsis:
sub('.*(\\.{3}\\s*Continue reading.*)$', '\\1', x)
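Since the question is about a data frame column, the two patterns above could be applied roughly as follows. This is only a sketch: the data frame df, its column name review, and the shortened sample text are assumptions made for illustration.

```r
# Hypothetical data frame; the name `df`, the column `review`, and the
# shortened text are assumptions, not taken from the question.
df <- data.frame(
  review = "DOI: 10.1/abc The revision ... Continue reading The revision is approved",
  stringsAsFactors = FALSE
)

# Drop the "..." and everything before it; keep "Continue reading" onward.
df$from_marker <- sub('.*\\.{3}\\s*(Continue reading.*)$', '\\1', df$review)

# Keep the "..." as well; drop only what precedes it.
df$from_dots <- sub('.*(\\.{3}\\s*Continue reading.*)$', '\\1', df$review)

print(df$from_marker)
print(df$from_dots)
```

sub() is vectorised over its input, so the same calls work unchanged on a column with many rows. Rows that do not contain the pattern are returned as they are.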