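In R, I have a data frame with a column in which every row contains repeated text. I want to remove that text by matching a specific pattern: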
x <- c("DOI: 10.5256/f1000research.6541.r7660 The revised article answers most of my remarks and questions in a ... Continue reading The revised article answers most of my remarks and questions in a satisfactory way.",
"DOI: 10.5256/f1000research.6601.r7701 The revision ... Continue reading The revision is approved I have read this",
"DOI: 10.5256/f1000research.6599.r7859 I have read the revised article by Horrell and D'Orazio. They have responded appropriately to ... Continue reading I have read the revised article by Horrell and D'Orazio. They have responded appropriately to the concerns/questions raised")
What function can I use to remove everything before "... Continue reading" or "Continue reading", including the "... Continue reading" or "Continue reading" marker itself?
Answer 0 (score: 1)
Use sub().
To remove everything up to and including "Continue reading":
sub(".*Continue reading", "", x)
To remove everything before "Continue reading" but keep the marker itself:
sub(".*(?=\\bContinue reading)", "", x, perl=TRUE)
or
sub(".*\\b(Continue reading)", "\\1", x)
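As a quick check of the two behaviours, here is a minimal sketch on a shortened, made-up string in the same shape as x (the names y, dropped and kept are only for illustration). The first call is a small variant of the pattern above: it adds \\s* so the space that follows the marker is removed as well, since the plain ".*Continue reading" pattern leaves a leading space in the result.

```r
# Shortened, hypothetical sample in the same shape as x from the question
y <- "DOI: 10.1/a The revision ... Continue reading The revision is approved"

# Drop everything up to and including the marker, plus the whitespace after it
dropped <- sub(".*Continue reading\\s*", "", y)
dropped
# "The revision is approved"

# Keep the marker: the lookahead (?=...) is only available with perl = TRUE
kept <- sub(".*(?=\\bContinue reading)", "", y, perl = TRUE)
kept
# "Continue reading The revision is approved"
```

Because .* is greedy, a string that contains the marker more than once would be cut at the last occurrence.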
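Answer 1 (score: 1)
This should remove everything before "Continue reading", keeping the marker itself. Note that the pattern requires "..." to appear before the marker: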
sub('.*\\.{3}\\s*(Continue reading.*)$', '\\1', x)
If you need to remove only what comes before "... Continue reading", keeping the dots as well:
sub('.*(\\.{3}\\s*Continue reading.*)$', '\\1', x)
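Since the question is about a data frame column, the following sketch shows one way the same idea might be applied there. The data frame df and the column names review and review_clean are assumptions made for illustration, and the pattern is a variant of the ones above that also drops the marker, as the question asks. It is not the answerer's code.

```r
# Hypothetical data frame; the column name "review" is an assumption
df <- data.frame(
  review = c(
    "DOI: 10.1/a The revision ... Continue reading The revision is approved",
    "DOI: 10.1/b No marker in this row"
  ),
  stringsAsFactors = FALSE
)

# sub() is vectorised over the column. Rows that do not contain
# "... Continue reading" do not match and are returned unchanged.
df$review_clean <- sub(".*\\.{3}\\s*Continue reading\\s*", "", df$review)

df$review_clean
# "The revision is approved"  "DOI: 10.1/b No marker in this row"
```

Writing the result to a new column keeps the original text available for comparison; assigning back to df$review would overwrite it in place.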