Detecting words inside quotation marks in a list

Date: 2018-06-13 13:56:05

Tags: python

I have a list that corresponds to a question, like this:

my_list = ["What", "language", "does", "the", "word", "«", "vibrato", "»", "come", "from", "?"]

My program detects whether the question contains a negation (by detecting "not", "do not", ...).

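The problem is that it also detects these words when they appear inside a quotation, which is not wanted, for example when the quotation is the title of a movie.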

How can I detect a negation word in my sentence only when it does not appear between quotation marks?

Example: suppose my list is:

my_list = ["who", "is", "not", "an", "animal", "?"]

This is a negative question, but if I have:

my_list = ["who", "is", "James Bond", "in", "the", "movie", "«", "kill", "is", "not", "a", "game", "»", "?"]

This is not a negative question, because the only negation is inside the quotation.

At the moment, my negation-detection code is:

for words in my_list:
    for nword in negative_words:
        if words == nword:
            nega = True
            my_list.remove(words)

2 Answers:

Answer 0 (score: 1)

Glad to see you improved your question and it got reopened, so I can post a real answer:

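What you are missing is a flag that tells you, while parsing, that a quote has been opened, and that is cleared once the quote is closed so that you can resume looking for negation words.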
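A minimal sketch of that single-flag idea, assuming « and » are the only quote marks and that quotes are never nested. The names `has_negation_outside_quotes`, `NEGATIVE_WORDS` and `in_quote` are illustrative and do not come from the question:

```python
# Hypothetical negation list; extend as needed.
NEGATIVE_WORDS = {"not", "don't", "doesn't"}


def has_negation_outside_quotes(words, opener="«", closer="»"):
    """Return True if a negation word occurs outside a quoted span."""
    in_quote = False  # the flag: True while a quote is open
    for word in words:
        if word == opener:
            in_quote = True
        elif word == closer:
            in_quote = False
        elif not in_quote and word.lower() in NEGATIVE_WORDS:
            return True
    return False


print(has_negation_outside_quotes(["who", "is", "not", "an", "animal", "?"]))
print(has_negation_outside_quotes(
    ["who", "is", "James Bond", "in", "the", "movie",
     "«", "kill", "is", "not", "a", "game", "»", "?"]))
```

The first call should report a negation and the second should not, since its only "not" sits between « and ».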

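What often happens after developing a script like this is that you run into nested patterns that were not considered beforehand. That is not a problem, though, because you can easily keep track of multiple nested quotes. Now, instead of using a single flag, remember the character that closes the previously opened quote by adding it to a list, and only look for negation words while that list is empty. Online demo of the following script: https://repl.it/repls/GranularThunderousResources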

# What are the negation matchers
notwords = ("not", "don't", "doesn't", )

# What are the quoting pairs (opener, closer)
# The following logic can handle nested quotes, 
# so specify as many as you need without worrying
quotes = (("«", "»"), ("‹", "›"), ("<", ">"), )


# Needed for breaking out of outer loop when a
# starting quote was found
class StartingQuoteFound(Exception):
    pass


def is_negated(sentence):
    # Keep track of the expected quote closers
    closing_quotes = []
    for word in sentence:
        # Check if the current word is a quote opener
        try:
            for quote in quotes:
                if word == quote[0]:
                    # If found, remember that we await the quote 
                    # closer before considering a word match
                    # to a notword
                    closing_quotes.append(quote[1])
                    raise StartingQuoteFound()
        # Quote start was found, skip to the next word
        except StartingQuoteFound:
            continue

        # If we are still waiting for one or more quotes to be closed
        if closing_quotes:
            # And it is the expected quote closer
            if closing_quotes[-1] == word:
                # Remove it from the quote closer expectations
                del closing_quotes[-1]
            # And go to the next word
            continue

        # Check if the word is within notwords
        # If found, we know that the sentence was negated
        if word in notwords:
            return True

    # No negation found
    return False

no_animal = ["who", "is", "not", "an", "animal", "?"]
print('expect negation:', is_negated(no_animal))

jon_is_kill = ["who", "is", "James Bond", "in", "the", "movie", "«", "kill", "is", "not", "a", "‹", "game", "›", "»", "?"]
print('not expect negation:', is_negated(jon_is_kill))

wat = ["James Bond", "in", "the", "movie", "«", "kill", "is", "not", "a", "‹", "game", "›", "»", "-", "doesn't", "drink", "alcohol"]
print('expect negation:', is_negated(wat))

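A note on using an Exception when a starting quote is found: Python has no labels for breaking out of or continuing an outer loop, so this code raises a specific exception inside the inner loop and catches it in the outer loop. When a starting quote is encountered, parsing then moves on to the next word without processing that opener any further.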
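As an alternative sketch (not the answerer's code), the inner loop over the quote pairs can be replaced by a dictionary lookup, in which case a plain `continue` is enough and no exception is needed. The names `QUOTE_PAIRS`, `NOTWORDS` and `is_negated_no_exception` are hypothetical; the quote pairs and negation words are the same as in the script above:

```python
NOTWORDS = ("not", "don't", "doesn't")
# Map each quote opener to the closer it expects.
QUOTE_PAIRS = {"«": "»", "‹": "›", "<": ">"}


def is_negated_no_exception(sentence):
    closing_quotes = []  # stack of closers we are still waiting for
    for word in sentence:
        # A quote opener: remember its closer and move on.
        if word in QUOTE_PAIRS:
            closing_quotes.append(QUOTE_PAIRS[word])
            continue
        # Inside at least one quote: only watch for the expected closer.
        if closing_quotes:
            if word == closing_quotes[-1]:
                closing_quotes.pop()
            continue
        # Outside all quotes: a negation word decides the result.
        if word in NOTWORDS:
            return True
    return False


nested = ["James Bond", "in", "the", "movie", "«", "kill", "is", "not",
          "a", "‹", "game", "›", "»", "-", "doesn't", "drink", "alcohol"]
print(is_negated_no_exception(nested))
```

The dictionary trades the tuple-of-pairs layout for constant-time lookup; the stack logic is otherwise unchanged.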

Answer 1 (score: 0)

You can set a flag when you encounter an opening quote and ignore all subsequent words until you encounter the closing quote:

flag_ignore = 0
negative_words = ["not", "don't"]
my_list = ["Do", "not", "say", "the", "word", "«", "don't", "»", "I", "don't", "like", "it"]
new_list = []

for word in my_list:
    if not flag_ignore and any(word.lower()==n for n in negative_words):
        pass
    else:
        new_list.append(word)

    if word == "«":
        flag_ignore = 1
    elif word == "»":
        flag_ignore = 0

print(" ".join(new_list))
# Output: Do say the word « don't » I like it
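The question's loop both sets a flag and removes the negation words. A sketch that does both with the same ignore-flag approach could look like the following; `split_negation` and its parameters are hypothetical names, and only the « » pair is assumed. It builds a new list instead of calling `remove` during iteration, because removing items from a list while looping over it can cause elements to be skipped:

```python
def split_negation(words, negative_words=("not", "don't"),
                   opener="«", closer="»"):
    """Return (negated, remaining_words); quoted spans are left untouched."""
    in_quote = False
    negated = False
    remaining = []
    for word in words:
        if word == opener:
            in_quote = True
        elif word == closer:
            in_quote = False
        elif not in_quote and word.lower() in negative_words:
            negated = True
            continue  # drop the negation word from the result
        remaining.append(word)
    return negated, remaining


nega, cleaned = split_negation(
    ["Do", "not", "say", "the", "word", "«", "don't", "»",
     "I", "don't", "like", "it"])
print(nega, " ".join(cleaned))
```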