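I want to delete any text that appears after a match with THE END or FINIS. I know this is very similar to other topics, but I'm not proficient enough with regular expressions to make it work for my case.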
My texts are Shakespeare plays taken from Project Gutenberg. They usually look like this:
txt <- "... thou hast tam'd a curst shrow. LUCENTIO. 'Tis a wonder,
by your leave, she will be tam'd so. Exeunt THE END <<THIS ELECTRONIC VERSION OF THE
COMPLETE WORKS OF WILLIAM ..."
or
txt <- "... thou hast tam'd a curst shrow. LUCENTIO. 'Tis a wonder,
by your leave, she will be tam'd so. Exeunt FINIS <<THIS ELECTRONIC VERSION OF THE
COMPLETE WORKS OF WILLIAM ..."
Ideally, something like gsub("^[THE END]*|^[FINIS]*", "", txt)
would return "... thou hast tam'd a curst shrow. LUCENTIO. 'Tis a wonder, by your leave, she will be tam'd so. Exeunt"
Answer (score: 3)
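You're very close. In your attempt, `[THE END]` is a character class (it matches any single one of those letters or a space, not the phrase), and `^` anchors the match to the start of the string. Instead, group the two markers with alternation and let `.*` consume everything after them: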
gsub("(THE END|FINIS).*", "", txt)
By the way, as thelatemail pointed out in the comments, `sub` is enough here, since only one replacement is needed.
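Here is a minimal sketch of the `sub` version, run on the first sample string from the question. It relies on R's default (TRE) regex engine, where `.` also matches newlines, so `.*` runs to the end of the multi-line string. With `perl = TRUE` you need the `(?s)` flag to get the same behavior. The `trimws()` call, which drops the space left before the marker, is an addition that is not part of the original answer.

```r
txt <- "... thou hast tam'd a curst shrow. LUCENTIO. 'Tis a wonder,
by your leave, she will be tam'd so. Exeunt THE END <<THIS ELECTRONIC VERSION OF THE
COMPLETE WORKS OF WILLIAM ..."

# Remove the first "THE END" or "FINIS" and everything after it.
# With the default engine, "." also matches "\n", so ".*" reaches the end of txt.
clean <- trimws(sub("(THE END|FINIS).*", "", txt))

# PCRE variant: (?s) makes "." match newlines as well.
clean_perl <- trimws(sub("(?s)(THE END|FINIS).*", "", txt, perl = TRUE))

cat(clean, "\n")
```

Because `sub` replaces only the first match, the text is cut at the earliest THE END or FINIS. If either phrase can appear earlier in a play, you would need a more specific pattern.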