Question

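I am trying to identify which specific words (from a list) appear in a sentence string.

I have managed to import a list of (inappropriate) words and compare it against an input sentence to see whether any of those words is in the sentence (for a basic if statement). That works well (code below), but now I need to identify which word was actually used, so that I can include it in the output.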

#!/usr/bin/env python
# -*- coding: utf-8 -*-

from inappropriate_words import inappropriate # a list of inappropriate words
import sys

message = ' '.join(sys.argv[1:]) # the input message already converted to lowercase
message = message.replace(".", "") # to remove the full stop as well
#print (message) #to test if needed

if any(word in message.split() for word in inappropriate):
    print "SAMPLE WORD is inappropriate."

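An example would be:
Input: "do you like cookies"
Process: "cookies" is in the inappropriate list, so it is identified and the if statement triggers.
Output: "Cookies is inappropriate." # I do like cookies, BTW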

Answer 1

I would store the inappropriate words in a set and then just do lookups, which are O(1) on average, as opposed to O(n) with a list:

st = set(inappropriate)
message = ' '.join(sys.argv[1:]) # the input message already converted to lowercase
message = message.replace(".", "") # to remove the full stop as well

for word in message.split():
    if word in st:
        print "{} is inappropriate.".format(word)

If you only want to know whether any word matches, add a break after the print; to see all matching words, use the loop as is.
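
A minimal sketch of both variants, written as functions so that it runs without command-line arguments. The names `first_match` and `all_matches`, and the sample `banned` set, are assumptions made for illustration; they are not part of the original code.

```python
def first_match(message, banned):
    """Return the first word of message found in banned, or None."""
    for word in message.split():
        if word in banned:
            return word  # returning here plays the role of `break`
    return None


def all_matches(message, banned):
    """Return every word of message found in banned, in order of appearance."""
    return [word for word in message.split() if word in banned]


# Hypothetical stand-in for set(inappropriate)
banned = {"cookies", "cake"}
message = "do you like cookies or cake"

print(first_match(message, banned))
print(all_matches(message, banned))
```

Returning the matched words rather than printing inside the loop leaves the caller free to format the output however it likes.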

You can also use set.intersection to find all the common words:

comm = st.intersection(message.split())
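
As a small self-contained illustration of the intersection approach: the sample set and message below are made up for the example. Note that the result is itself a set, so repeated words collapse to one entry and the original word order is not preserved.

```python
# Hypothetical stand-in for set(inappropriate)
st = {"cookies", "cake"}
message = "cookies and more cookies"

# intersection() accepts any iterable, so the split list can be passed directly
comm = st.intersection(message.split())

# sorted() gives a stable order for printing; the set itself is unordered
for word in sorted(comm):
    print("{} is inappropriate.".format(word))
```

If you need the matches in the order they appear, or need to count repeats, the explicit loop is probably the better fit.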

Finally, you can strip the punctuation from each word and iterate over argv[1:] directly instead of joining and replacing:

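from string import punctuation

from inappropriate_words import inappropriate # a list of inappropriate words
import sys

st = set(inappropriate) # build the set once for fast membership tests

for word in sys.argv[1:]:
    # strip() only removes leading and trailing punctuation
    if word.strip(punctuation) in st:
        print "{} is inappropriate.".format(word)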
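
A hedged sketch of the same idea as a reusable function. The name `flag_words` and the sample data are hypothetical, and the `.lower()` call is an added assumption for the case where the input has not already been lowercased.

```python
from string import punctuation


def flag_words(args, banned):
    """Return (original, cleaned) pairs for each argument whose cleaned form is banned."""
    hits = []
    for word in args:
        # Remove leading/trailing punctuation only; lower() is an added assumption
        cleaned = word.strip(punctuation).lower()
        if cleaned in banned:
            hits.append((word, cleaned))
    return hits


banned = {"cookies"}  # hypothetical stand-in for set(inappropriate)
for original, cleaned in flag_words(["Do", "you", "like", "Cookies?"], banned):
    print("{} is inappropriate.".format(cleaned))
```

Keeping both the original and the cleaned form lets you report the word without its trailing punctuation while still knowing what the user actually typed.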

Output the specific word found in a string (from a list)
