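Creating a censor function from a list of bad words

Time: 2014-07-14 13:47:09

Tags: python python-2.7

I'm trying to create a function that censors words in a string. It sort of works, but it has a few quirks.

Here is my code: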

def censor(sentence):
    badwords = 'apple orange banana'.split()
    sentence = sentence.split()

    for i in badwords:
        for words in sentence:
            if i in words:
                pos = sentence.index(words)
                sentence.remove(words)
                sentence.insert(pos, '*' * len(i))

    print " ".join(sentence)

sentence = "you are an appletini and apple. new sentence: an orange is a banana. orange test."

censor(sentence)

Output:

you are an ***** and ***** new sentence: an ****** is a ****** ****** test.

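Some of the punctuation disappears, and the word "appletini" is replaced incorrectly.

How can I fix this?

Also, is there a simpler way to do this kind of thing?

2 answers:

Answer 0 (score: 2)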

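The specific problems are:

  1. You don't take punctuation into account at all; and
  2. When inserting the '*' characters, you use the length of the "bad word", not the length of the word being replaced.

I would also switch the loop order, so that you only process the sentence once, and use enumerate rather than remove and insert: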

    def censor(sentence):
        badwords = ("test", "word") # consider making this an argument too
        sentence = sentence.split()
    
        for index, word in enumerate(sentence):
            if any(badword in word for badword in badwords):
                sentence[index] = "".join(['*' if c.isalpha() else c for c in word])
    
        return " ".join(sentence) # return rather than print
    

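Checking each character with str.isalpha means that only uppercase and lowercase letters are replaced with asterisks, so punctuation is left in place. Demo: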

    >>> censor("Censor these testing words, will you? Here's a test-case!")
    "Censor these ******* *****, will you? Here's a ****-****!"
                # ^ note length                         ^ note punctuation
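
The comment in the function above suggests making the bad-word list an argument. A minimal sketch of that variant follows, run against the sentence from the question. The default tuple and the names question_sentence and censored are illustrative assumptions, not part of the original answer.

```python
def censor(sentence, badwords=("apple", "orange", "banana")):
    """Mask the letters of every word that contains one of the bad words."""
    words = sentence.split()
    for index, word in enumerate(words):
        if any(badword in word for badword in badwords):
            # Replace letters only, so attached punctuation survives.
            words[index] = "".join("*" if c.isalpha() else c for c in word)
    return " ".join(words)


# The sentence from the question (assumed here as sample input).
question_sentence = ("you are an appletini and apple. new sentence: "
                     "an orange is a banana. orange test.")
censored = censor(question_sentence)
# censored == "you are an ********* and *****. new sentence: an ****** is a ******. ****** test."
```

Note that "appletini" is still masked, because the check is a substring test, but it is now masked at its full length and the periods are kept.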
    

Answer 1 (score: 0)

Try:

for i in bad_word_list:
    sentence = sentence.replace(i, '*' * len(i))
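
If words such as "appletini" should be left untouched rather than partly masked, one possible variant of the same replace idea uses the standard re module with \b word boundaries. This is only a sketch: the function name censor_whole_words is hypothetical, and it assumes a bad word should count only when it stands alone as a whole word.

```python
import re


def censor_whole_words(sentence, bad_word_list):
    """Mask bad words only where they appear as whole words."""
    # \b anchors each alternative at word boundaries; re.escape guards
    # against bad words that contain regex metacharacters.
    alternatives = "|".join(re.escape(word) for word in bad_word_list)
    pattern = re.compile(r"\b(?:%s)\b" % alternatives)
    # Replace each match with asterisks of the same length.
    return pattern.sub(lambda match: "*" * len(match.group(0)), sentence)


sample = ("you are an appletini and apple. new sentence: "
          "an orange is a banana. orange test.")
result = censor_whole_words(sample, ["apple", "orange", "banana"])
# result == "you are an appletini and *****. new sentence: an ****** is a ******. ****** test."
```

Because the substitution works on the original string rather than on a split list, punctuation and spacing are preserved without any extra handling.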