Question

In R, I have a data frame with a column in which every row contains repeated text, and I want to remove it by matching a specific pattern:

x <- c("DOI: 10.5256/f1000research.6541.r7660 The revised article answers most of my remarks and questions in a ... Continue reading The revised article answers most of my remarks and questions in a satisfactory way.", 
"DOI: 10.5256/f1000research.6601.r7701 The revision ... Continue reading The revision is approved I have read this", 
"DOI: 10.5256/f1000research.6599.r7859 I have read the revised article by Horrell and D'Orazio. They have responded appropriately to ... Continue reading I have read the revised article by Horrell and D'Orazio. They have responded appropriately to the concerns/questions raised")

What function can I use to remove everything before "... Continue reading" or "Continue reading", including the "... Continue reading" or "Continue reading" itself?

Answer 1

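Use sub.

To remove everything up to and including "Continue reading" (note that the space after it is left at the start of the result):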

sub(".*Continue reading", "", x)

To remove everything before "Continue reading" while keeping "Continue reading" itself:

sub(".*(?=\\bContinue reading)", "", x, perl=TRUE)

or

sub(".*\\b(Continue reading)", "\\1", x)
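As a rough check of how the three calls above differ, here is a minimal sketch on a short made-up vector in the same shape as the question's x. The sample strings and the variable names incl, excl_lookahead and excl_capture are assumptions for illustration, not part of the original answer.

```r
# Made-up sample strings, shaped like the question's x
x <- c("DOI: 10.1/abc Some text ... Continue reading Some text in full.",
       "DOI: 10.2/def Other ... Continue reading Other remarks here")

# 1) Remove the prefix and the marker; a leading space is left behind
incl <- sub(".*Continue reading", "", x)

# 2) Lookahead: remove the prefix only (lookarounds need perl = TRUE)
excl_lookahead <- sub(".*(?=\\bContinue reading)", "", x, perl = TRUE)

# 3) Capture group: put the marker back with \\1, default regex engine
excl_capture <- sub(".*\\b(Continue reading)", "\\1", x)

# trimws() strips the leading space left by variant 1
print(trimws(incl))

# Variants 2 and 3 should give the same strings on this input
print(identical(excl_lookahead, excl_capture))
```

The capture-group form avoids perl = TRUE, which may be convenient if the rest of the code relies on the default regex engine.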

Answer 2

This should remove everything before "Continue reading" (the "..." is dropped as well):

sub('.*\\.{3}\\s*(Continue reading.*)$', '\\1', x)

If you need to remove the characters before "... Continue reading" (keeping the "..."):

sub('.*(\\.{3}\\s*Continue reading.*)$', '\\1', x)
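Since the question mentions a data frame column rather than a bare vector, the two patterns above could be applied to a column along these lines. This is only a sketch: the data frame df, its column review, the new column names and the sample strings are all hypothetical.

```r
# Hypothetical data frame; the column name `review` is an assumption
df <- data.frame(
  review = c("DOI: 10.1/abc Some text ... Continue reading Some text in full.",
             "DOI: 10.3/ghi A short report with no marker"),
  stringsAsFactors = FALSE
)

# Keep from "Continue reading" onward (the "..." is dropped)
df$from_marker <- sub('.*\\.{3}\\s*(Continue reading.*)$', '\\1', df$review)

# Keep from "... Continue reading" onward (the "..." is kept)
df$from_dots <- sub('.*(\\.{3}\\s*Continue reading.*)$', '\\1', df$review)

# sub() returns rows without the marker unchanged; flag them if needed
has_marker <- grepl("Continue reading", df$review, fixed = TRUE)

print(df[, c("from_marker", "from_dots")])
print(has_marker)
```

Rows that do not contain the marker pass through untouched, so a flag such as has_marker may help if those rows need separate handling.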

In R: gsub & regex lookahead or lookbehind expression to remove everything before a string pattern?
