Question

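I'm currently trying to build a censor of sorts that removes specific words. Right now I'm dealing with the fact that a user can add spaces between the letters of a word and bypass the checker.

An example: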

Banned word: Apple
Solution: A p p l e

Is there a way to counter this with regex? What immediately came to mind was something like the following:

(a\s*p\s*p\s*l\s*e\s*)

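But I feel this isn't the best solution.

If there is a solution to this, please let me know. Thanks.

Edit:

Apple isn't actually the banned word; it's just a placeholder standing in for more vulgar words.

Removing the whitespace and then comparing isn't an option, because some harmless words would get flagged that way. For example: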

"We need a medic, he's hit --> weneedamediche'[shit]" FLAGGED.

Answer 1

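If your input is a single word with spaces inserted, your regex works fine, but removing the whitespace lets you keep a list of bad words and compare against it, without having to generate a regex for every bad word.

def is_badword(word):  # placeholder lookup against the banned-word list
    return word.lower() in {"apple"}

s = "A p p l e"
s = "".join(s.split())  # remove every whitespace character (str.strip() only trims the ends)
print(is_badword(s))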
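
For comparison, the regex route would mean generating one pattern per bad word rather than typing each out by hand. A minimal sketch, assuming a placeholder list `BANNED` and hypothetical helper names `spaced_pattern` and `censor` (none of these come from the question):

```python
import re

BANNED = ["apple"]  # placeholder list, stands in for the real banned words


def spaced_pattern(word):
    # Allow any amount of whitespace between consecutive letters.
    return r"\s*".join(re.escape(ch) for ch in word)


# One alternation of all generated patterns, wrapped in word boundaries
# so that a banned word inside a longer word is left alone.
CENSOR_RE = re.compile(
    r"\b(?:" + "|".join(spaced_pattern(w) for w in BANNED) + r")\b",
    re.IGNORECASE,
)


def censor(text):
    # Replace every match, spaced out or not, with a marker.
    return CENSOR_RE.sub("[censored]", text)
```

The word boundaries stop matches inside longer words such as "pineapple", but they do not help with false positives that span two separate words.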

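If your input is running text that you parse to find bad words, it gets harder, since I assume you already rely on spaces to get your tokens.

You would have to test every possible combination of consecutive tokens. That can't be done with a regex, but it should be possible to find the bad words in O(t^2) using a search tree, where t is the number of tokens. (Also, I imagine users could break up bad words in other ways, e.g. Ap p le.)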
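
A rough sketch of the consecutive-token idea, assuming whitespace tokenization and a hypothetical function name `find_spaced_words`. For brevity it uses a plain set lookup with a length cutoff instead of a search tree, so it is only an illustration of the O(t^2) scan, not a tuned implementation:

```python
def find_spaced_words(text, banned):
    """Return (start, end) token index pairs whose concatenation is a banned word."""
    tokens = text.lower().split()
    banned = {w.lower() for w in banned}
    max_len = max((len(w) for w in banned), default=0)
    hits = []
    for i in range(len(tokens)):
        joined = ""
        for j in range(i, len(tokens)):
            joined += tokens[j]  # glue tokens i..j together
            if len(joined) > max_len:
                break  # no banned word can be this long
            if joined in banned:
                hits.append((i, j))
    return hits
```

Because only whole tokens are glued together, a split such as "Ap p le" is still caught, while a banned word that would only appear by cutting a token in the middle (as in the "he's hit" example from the question) is not reported.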

Answer 2

I hope this helps.

sentence = 'learn to play with code'
sentence_to_word_list = sentence.split(' ')  # split the sentence into words
banned_words = ['to', 'with']  # list of banned words

for index, word in enumerate(sentence_to_word_list):  # enumerate tracks the index of each word
    if word in banned_words:
        sentence_to_word_list[index] = '-'.join(word)  # join the letters with any character, symbol, or number

sentence = ' '.join(sentence_to_word_list)  # join the words back into a sentence
print(sentence)  # output: learn t-o play w-i-t-h code

Spacing between banned words - Regex Python