Question

I want to use gsub to remove every word that contains any of the substrings in a list.

words_to_eliminate = c("the", "of", "add", "is")
sentences = c("Other people are here", "This person is being offensive", "I'm addicted")
gsub(words_to_eliminate, "", sentences)

What I want:

#" people are here", " person being ", "I'm "

Thanks

Answer 1

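You can build an alternation pattern with paste0 and collapse = "|", but to also grab the letters on either side of each substring you need to add something that matches them, such as \\w* (and \\s* for any surrounding whitespace).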

words_to_eliminate = c("the", "of", "add", "is")
sentences = c("Other people are here", "This person is being offensive", "I'm addicted")

paste0('\\s*\\w*', words_to_eliminate, '\\w*\\s*', collapse = '|')
#> [1] "\\s*\\w*the\\w*\\s*|\\s*\\w*of\\w*\\s*|\\s*\\w*add\\w*\\s*|\\s*\\w*is\\w*\\s*"

gsub(paste0('\\s*\\w*', words_to_eliminate, '\\w*\\s*', collapse = '|'), ' ', sentences)
#> [1] " people are here" " person being "   "I'm "

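However, that pattern is needlessly repetitive and can be shortened considerably with a group (either a capturing (...) or a non-capturing (?:...) works here), although it actually takes more code to build the pattern: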

paste0('\\s*\\w*(', paste(words_to_eliminate, collapse = '|'), ')\\w*\\s*')
#> [1] "\\s*\\w*(the|of|add|is)\\w*\\s*"

gsub(paste0('\\s*\\w*(', paste(words_to_eliminate, collapse = '|'), ')\\w*\\s*'), ' ', sentences)
#> [1] " people are here" " person being "   "I'm "
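
The grouped pattern can also be wrapped in a small function. This is only a sketch: the name remove_words_containing and its arguments are hypothetical, perl = TRUE is an addition so that the non-capturing group (?:...) is handled by PCRE, and it assumes the fragments contain no regex metacharacters.

```r
# Hypothetical helper (not part of the original answer): drop every word
# that contains any of `fragments`, together with its surrounding whitespace.
remove_words_containing <- function(x, fragments, replacement = " ") {
  # Join the fragments into one alternation inside a non-capturing group.
  # Assumption: fragments are plain text with no regex metacharacters.
  pattern <- paste0("\\s*\\w*(?:", paste(fragments, collapse = "|"), ")\\w*\\s*")
  gsub(pattern, replacement, x, perl = TRUE)
}

words_to_eliminate <- c("the", "of", "add", "is")
sentences <- c("Other people are here", "This person is being offensive", "I'm addicted")

cleaned <- remove_words_containing(sentences, words_to_eliminate)
```

Passing replacement = "" instead would glue the neighbouring words together, which is why a single space is used as the default here.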

Answer 2

If I understand correctly, you can try this, which removes any word that has one of these words inside it:

gsub("(?>\\w*|\\s*)-(?>(\\w*|\\s*))","", gsub(paste0(words_to_eliminate,collapse="|"),"-",sentences) , perl=T)

Output:

   > gsub("(?>\\w*|\\s*)-(?>(\\w*|\\s*))","", gsub(paste0(words_to_eliminate,collapse="|"),"-",sentences) , perl=T)
[1] " people are here" " person being "    "I'm "
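
To see how this two-step approach works, it can help to run the inner gsub on its own. The sketch below does that; the variable names marked and cleaned are only illustrative. Note that it relies on "-" as a marker, so it assumes the input sentences contain no hyphens of their own.

```r
words_to_eliminate <- c("the", "of", "add", "is")
sentences <- c("Other people are here", "This person is being offensive", "I'm addicted")

# Step 1: replace every occurrence of a fragment with the marker "-".
marked <- gsub(paste0(words_to_eliminate, collapse = "|"), "-", sentences)
# marked is now:
#   "O-r people are here", "Th- person - being -fensive", "I'm -icted"

# Step 2: delete each marker along with the word characters attached to it.
cleaned <- gsub("(?>\\w*|\\s*)-(?>(\\w*|\\s*))", "", marked, perl = TRUE)
```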

Answer 3

For each word in words_to_eliminate, add \<[a-z]* to the beginning and [a-z]*\> to the end.
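
The word-boundary idea in this answer could be sketched as follows. The answer's own code is not shown above, so this is an illustration rather than the author's exact solution: the variable names are illustrative, and ignore.case = TRUE is an added assumption so that capitalised words such as "Other" also match the lowercase class [a-z].

```r
words_to_eliminate <- c("the", "of", "add", "is")
sentences <- c("Other people are here", "This person is being offensive", "I'm addicted")

# Wrap each fragment as \<[a-z]*fragment[a-z]*\> and join the pieces with "|".
# \< and \> match the start and end of a word in R's default regex engine.
pattern <- paste0("\\<[a-z]*", words_to_eliminate, "[a-z]*\\>", collapse = "|")

# ignore.case = TRUE is an addition, not something the answer states.
result <- gsub(pattern, "", sentences, ignore.case = TRUE)
```

Because only the words themselves are matched and replaced with "", the spaces around each removed word are left in place, so the spacing differs slightly from the output of the other answers.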

R regex: use gsub to replace whole words that contain a specific word/substring
