R: Remove the end of a text after matching a string

Asked: 2015-06-21 23:49:16

Tags: regex r gsub

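I want to remove any text that appears after a match on THE END or FINIS. I know this is very similar to other topics, but I am not proficient enough with regular expressions to make it work for this case.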

My texts are Shakespeare books taken from Project Gutenberg. They typically look like

txt <- "... thou hast tam'd a curst shrow.   LUCENTIO. 'Tis a wonder, 
  by your leave, she will be tam'd so. Exeunt  THE END   <<THIS ELECTRONIC  VERSION OF THE 
  COMPLETE WORKS OF WILLIAM ..."

txt <- "... thou hast tam'd a curst shrow.   LUCENTIO. 'Tis a wonder, 
  by your leave, she will be tam'd so. Exeunt  FINIS  <<THIS ELECTRONIC  VERSION OF THE 
  COMPLETE WORKS OF WILLIAM ..."

Ideally, something like gsub("^[THE END]*|^[FINIS]*", "", txt) would return "... thou hast tam'd a curst shrow. LUCENTIO. 'Tis a wonder, by your leave, she will be tam'd so. Exeunt".

1 Answer:

Answer 0: (score: 3)

You were very close; you need to use:

gsub("(THE END|FINIS).*", "", txt)
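The pattern in the question fails because square brackets define a character class: [THE END] matches any single one of the characters T, H, E, N, D or a space, and ^ anchors the match to the start of the string. A grouped alternation followed by .* matches the whole marker and everything after it. A minimal sketch, using a shortened one-line sample string (an assumption for illustration, not the full Gutenberg text):

```r
# Shortened, hypothetical sample kept on a single line
txt <- "she will be tam'd so. Exeunt  THE END   <<THIS ELECTRONIC VERSION"

# Attempt from the question: character classes anchored at the start,
# so nothing is removed from a string that begins with a lowercase letter
attempt <- gsub("^[THE END]*|^[FINIS]*", "", txt)

# Grouped alternation: match the marker, then .* consumes the rest
fixed <- gsub("(THE END|FINIS).*", "", txt)

print(attempt)
print(fixed)
```

Note that the whitespace in front of the marker is kept, so fixed still ends in two spaces.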

By the way, as thelatemail pointed out in the comments, sub is sufficient, since only a single replacement is needed.
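The sub variant could be wrapped up as follows. This is only a sketch: the helper name strip_after_marker and the two sample strings are hypothetical, and it assumes the marker does not occur earlier in the play itself. The leading \\s* is an addition that also drops the whitespace just before the marker. With R's default regex engine, . also matches a newline, so text on later lines is removed as well; with perl = TRUE this would need the (?s) flag.

```r
# Hypothetical helper; txt may be a character vector and may span several lines
strip_after_marker <- function(txt) {
  # \\s* removes the whitespace directly before the marker;
  # sub() replaces only the first match, which is all that is needed here
  sub("\\s*(THE END|FINIS).*", "", txt)
}

# Shortened, made-up samples with a line break after the marker
a <- "so. Exeunt  THE END   <<THIS ELECTRONIC\n  VERSION OF THE COMPLETE WORKS"
b <- "so. Exeunt  FINIS  <<THIS ELECTRONIC\n  VERSION OF THE COMPLETE WORKS"

out <- strip_after_marker(c(a, b))
print(out)
```

Because sub is vectorised, the same call works on a whole vector of plays at once.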