Question

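I need to remove certain words from various phrases, but because the words may be conjugated, plural, or possessive, I can only search on their first few letters. An example: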

example = "You are the elephant's friend."
gsub("\\beleph.*\\b", " _____ " , example)
[1] "You are the  _____ "

How can I match the whole word given only its first few letters?

Answer 1

gsub("\\beleph[[:alpha:][:punct:]]+\\b", "_____" , example)
[1] "You are the _____ friend."

This works for this instance.

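The change replaces the greedy (and sometimes dangerous) match-everything ".*" with the character class "[[:alpha:][:punct:]]+", which matches alphabetic and punctuation characters only, so the match stops at the first space. See help(regex) for other ready-made character classes that may be useful, such as [:alnum:] in case any of the strings contain digits.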
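
To see how the character class behaves on other word forms, here is a small sketch. The phrases vector is made up for illustration, and perl = TRUE is my own assumption (it selects the PCRE engine, whose \b handling is well defined); the call above uses R's default engine.

```r
# Hypothetical test phrases: plural, possessive, and a word followed by a period.
phrases <- c("Two elephants walked by.",
             "The elephant's trunk",
             "I saw an elephant.")

# The class consumes letters and punctuation; the trailing \\b then forces the
# match to end on a letter, so sentence punctuation is kept.
blanked <- gsub("\\beleph[[:alpha:][:punct:]]+\\b", "_____", phrases, perl = TRUE)
print(blanked)
# [1] "Two _____ walked by." "The _____ trunk"      "I saw an _____."
```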

To also catch a match on the first word of the string, the following should work. Here is an example.

exampleYoda = "elephant's friend you be."

gsub("(\\b|^)eleph[[:alpha:][:punct:]]+\\b", "_____" , exampleYoda)
[1] "_____ friend you be."

It also works on the original example:

gsub("(\\b|^)eleph[[:alpha:][:punct:]]+\\b", "_____" , example)
[1] "You are the _____ friend."
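
If several prefixes need blanking, the same pattern can be built programmatically. The following is only a sketch: blank_words is a hypothetical helper name, the prefixes are assumed to be plain letters with no regex metacharacters, and perl = TRUE and ignore.case = TRUE are my own choices rather than part of the answer above.

```r
# Hypothetical helper: blank out every word that starts with one of `prefixes`.
blank_words <- function(text, prefixes, blank = "_____") {
  # Join the prefixes into one alternation, e.g. "(?:eleph|frien)".
  alternation <- paste0("(?:", paste(prefixes, collapse = "|"), ")")
  # "*" rather than "+" so that a word equal to the bare prefix also matches.
  pattern <- paste0("\\b", alternation, "[[:alpha:][:punct:]]*\\b")
  gsub(pattern, blank, text, perl = TRUE, ignore.case = TRUE)
}

res_two   <- blank_words("You are the elephant's friend.", c("eleph", "frien"))
res_upper <- blank_words("Elephants are big", "eleph")
print(res_two)    # [1] "You are the _____ _____."
print(res_upper)  # [1] "_____ are big"
```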

Answer 2

To make your original code work, you only need to make the quantifier non-greedy (lazy).

example = "You are the elephant's friend."
gsub("\\beleph.*?\\b", " _____ " , example)
[1] "You are the  _____ 's friend."

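This solution causes a problem: the lazy match stops at the first word boundary, which falls just before the apostrophe, so the possessive 's is left behind. But you can anchor on whitespace (\\s) instead of \\b, so you can try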

example = "You are the elephant's friend."
gsub("\\seleph.*?\\s", " _____ " , example)
[1] "You are the _____ friend."
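
One caveat with the whitespace version, sketched below: it needs whitespace on both sides of the word, so a word at the very start (or end) of the string is not matched. An alternative is to keep \\b at the front and consume non-whitespace with \\S*. Here perl = TRUE is my own assumption, and exampleYoda is redefined so the snippet runs on its own.

```r
example     <- "You are the elephant's friend."
exampleYoda <- "elephant's friend you be."

# No whitespace precedes the first word, so nothing is replaced here.
unchanged <- gsub("\\seleph.*?\\s", " _____ ", exampleYoda, perl = TRUE)
print(unchanged)
# [1] "elephant's friend you be."

# \\S* runs to the next whitespace (or the end of the string).
both <- gsub("\\beleph\\S*", "_____", c(example, exampleYoda), perl = TRUE)
print(both)
# [1] "You are the _____ friend." "_____ friend you be."
```

Note that \\S* also swallows any punctuation attached to the word, so "an elephant." would become "an _____" without the period; the character-class pattern from Answer 1 avoids that.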