I have a list of tokens that represents a question, like this:
my_list = ["What", "language", "does", "the", "word", "«", "vibrato", "»", "come", "from", "?"]
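My program detects whether the question contains a negation (by looking for words such as "not", "do not", ...).
The problem is that it also detects these words when they appear inside a quotation, which is not wanted, for example when the quotation is the title of a movie.
How can I detect a negation word in my sentence only when it does not appear between quotation marks?
Example: suppose my list is: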
my_list = ["who", "is", "not", "an", "animal", "?"]
This is a negated question, but if I have:
my_list = ["who", "is", "James Bond", "in", "the", "movie", "«", "kill", "is", "not", "a", "game", "»", "?"]
This is not a negated question, because the only negation is inside the quotation.
Currently, my negation-detection code is:
for words in my_list:
    for nword in negative_words:
        if words == nword:
            nega = True
            my_list.remove(words)
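Answer 0 (score: 1)
Nice to see that you improved your question and that it was reopened, so I can post a real answer:
What you are missing is a flag that tells you when a quote has been opened during parsing, and that is cleared once the quote is closed so that you can go back to looking for negation words.
What often happens after developing a script like this is that you run into nested patterns that were not considered up front. That is not a problem, because you can easily keep track of multiple nested quotes. Instead of a single flag, remember the character that closes each quote you have opened by appending it to a list, and look for negation words only while that list is empty. Online demo of the following script: https://repl.it/repls/GranularThunderousResources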
# What are the negation matchers
notwords = ("not", "don't", "doesn't", )
# What are the quoting pairs (opener, closer)
# The following logic can handle nested quotes,
# so specify as many as you need without worrying
quotes = (("«", "»"), ("‹", "›"), ("<", ">"), )

# Needed for breaking out of outer loop when a
# starting quote was found
class StartingQuoteFound(Exception):
    pass

def is_negated(sentence):
    # Keep track of the expected quote closers
    closing_quotes = []
    for word in sentence:
        # Check if the current word is a quote opener
        try:
            for quote in quotes:
                if word == quote[0]:
                    # If found, remember that we await the quote
                    # closer before considering a word match
                    # to a notword
                    closing_quotes.append(quote[1])
                    raise StartingQuoteFound()
        # Quote start was found, skip to the next word
        except StartingQuoteFound:
            continue
        # If we are waiting for quotes>0 to be closed
        if closing_quotes:
            # And it is the expected quote closer
            if closing_quotes[-1] == word:
                # Remove it from the quote closer expectations
                del closing_quotes[-1]
            # And go to the next word
            continue
        # Check if the word is within notwords
        # If found, we know that the sentence was negated
        if word in notwords:
            return True
    # No negation found
    return False

no_animal = ["who", "is", "not", "an", "animal", "?"]
print('expect negation:', is_negated(no_animal))
jon_is_kill = ["who", "is", "James Bond", "in", "the", "movie", "«", "kill", "is", "not", "a", "‹", "game", "›", "»", "?"]
print('not expect negation:', is_negated(jon_is_kill))
wat = ["James Bond", "in", "the", "movie", "«", "kill", "is", "not", "a", "‹", "game", "›", "»", "-", "doesn't", "drink", "alcohol"]
print('expect negation:', is_negated(wat))
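A note on using an Exception when an opening quote is found: Python has no labels for breaking out of or continuing an outer loop, so you raise a specific exception inside the inner loop and catch it in the outer loop. That way, when an opening quote is encountered, parsing continues with the next word without processing the opening quote any further.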
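If the custom exception feels heavy, the same stack-of-closers idea can probably be written without it by replacing the inner loop with a dictionary lookup. This is only a sketch under assumptions: the names `NOT_WORDS`, `QUOTE_PAIRS`, and `is_negated_no_exception` are made up for illustration and are not part of the script above.

```python
# Hypothetical names, chosen for this sketch only.
NOT_WORDS = {"not", "don't", "doesn't"}
QUOTE_PAIRS = {"«": "»", "‹": "›", "<": ">"}  # opener -> closer


def is_negated_no_exception(sentence):
    # Stack of closers we are still waiting for; empty means "outside quotes"
    expected_closers = []
    for word in sentence:
        if word in QUOTE_PAIRS:
            # Opening quote: remember which closer ends it
            expected_closers.append(QUOTE_PAIRS[word])
        elif expected_closers:
            # Inside a quote: only the expected closer is of interest
            if word == expected_closers[-1]:
                expected_closers.pop()
        elif word in NOT_WORDS:
            # Negation word found outside of any quote
            return True
    return False


print(is_negated_no_exception(["who", "is", "not", "an", "animal", "?"]))
```

Because the opener test is a single `in` check on a dict, there is no inner loop to break out of, so no exception is needed. The trade-off is that each opener maps to exactly one closer.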
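Answer 1 (score: 0)
You can set a flag when you encounter an opening quote and skip the negation check for all subsequent words until you encounter the closing quote: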
flag_ignore = 0
negative_words = ["not", "don't"]
my_list = ["Do", "not", "say", "the", "word", "«", "don't", "»", "I", "don't", "like", "it"]
new_list = []
for word in my_list:
    # Drop negative words, but only while outside of a quote
    if not flag_ignore and any(word.lower() == n for n in negative_words):
        pass
    else:
        new_list.append(word)
    # Track whether we are currently between « and »
    if word == "«":
        flag_ignore = 1
    elif word == "»":
        flag_ignore = 0
print(" ".join(new_list))
# Output: Do say the word « don't » I like it
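The snippet above filters the list, whereas the question asks for a yes/no result. A minimal sketch of the same single-flag idea wrapped in a function might look like this. The function name `has_unquoted_negation` and its `opener`/`closer` parameters are assumptions introduced for illustration, and it handles only one, non-nested quote pair.

```python
# Hypothetical helper; names and defaults are assumptions for this sketch.
NEGATIVE_WORDS = {"not", "don't"}


def has_unquoted_negation(words, opener="«", closer="»"):
    inside_quote = False  # the single flag: are we between opener and closer?
    for word in words:
        if word == opener:
            inside_quote = True
        elif word == closer:
            inside_quote = False
        elif not inside_quote and word.lower() in NEGATIVE_WORDS:
            # Negative word found outside of the quote
            return True
    return False


print(has_unquoted_negation(["who", "is", "not", "an", "animal", "?"]))  # True
print(has_unquoted_negation(
    ["who", "is", "James Bond", "in", "the", "movie",
     "«", "kill", "is", "not", "a", "game", "»", "?"]))  # False
```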