Question

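I have a dataset in the form of a list of lists (a_list_of_sentences), where each inner list contains a word and its syntactic dependency, and these lists are grouped into sentences, like this: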

[[['mary', 'nsubj'], ['loves', 'ROOT'], ['every', 'det'], ['man', 'dobj']],
 [['mary', 'nsubj'], ['loves', 'ROOT'], ['all', 'det'], ['men', 'dobj']],
 [['all', 'det'], ['students', 'nsubj'], ['love', 'ROOT'], ['mary', 'dobj']]]

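I want to find the sentences that contain a quantifier (e.g. "every", "all") followed by a word whose syntactic dependency is either subject (nsubj) or object (dobj), and to distinguish between these two cases. For my purposes, the subject or object can be either the first or the second word after the quantifier. I tried to do this using enumerate() as follows: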

for sentence in a_list_of_sentences:
    for i, j in enumerate(sentence):
            if "dobj" in sentence[i]:
                if "all" in sentence[i-1] or "all" in sentence[i-2] or "every" in sentence[i-1] or "every" in sentence[i-2]:
                    print(sentence, "dobj")
            elif "nsubj" in sentence[i]:
                if "all" in sentence[i-1] or "all" in sentence[i-2] or "every" in sentence[i-1] or "every" in sentence[i-2]:
                    print(sentence, "nsubj")

However, I get the sentence [['mary', 'nsubj'], ['loves', 'ROOT'], ['every', 'det'], ['man', 'dobj']] in both printouts:

[['mary', 'nsubj'], ['loves', 'ROOT'], ['every', 'det'], ['man', 'dobj']] nsubj
[['mary', 'nsubj'], ['loves', 'ROOT'], ['every', 'det'], ['man', 'dobj']] dobj

Do you know what I am doing wrong and how to fix it?

Thanks a lot!!

Answer 1

Lists accept negative indexes, which count from the end of the list. The following example prints "c".

mylist = ['a', 'b', 'c']
print(mylist[-1])

So, if we take your first sentence:

[['mary', 'nsubj'], ['loves', 'ROOT'], ['every', 'det'], ['man', 'dobj']]

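It will first print for the first word of the sentence, in the elif branch, because:

mary is an nsubj
and sentence[i-2] evaluates to sentence[-2], which is ['every', 'det'] and therefore contains "every"

It will then print again for the last word of the sentence, in the if branch, because:

man is a dobj
and sentence[i-1] evaluates to sentence[2], which is ['every', 'det'] and also contains "every"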
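The wraparound described above can be checked directly. This is only a small trace of the first sentence at i = 0; the variable name wrapped_match is introduced here for illustration and is not part of the original code.

```python
sentence = [['mary', 'nsubj'], ['loves', 'ROOT'], ['every', 'det'], ['man', 'dobj']]

i = 0  # position of ['mary', 'nsubj']

# i - 1 and i - 2 are negative here, so Python counts from the end of the list.
print(i - 1, sentence[i - 1])  # -1 ['man', 'dobj']
print(i - 2, sentence[i - 2])  # -2 ['every', 'det']

# This is why the quantifier check succeeds for the very first word.
wrapped_match = "every" in sentence[i - 2]
print(wrapped_match)  # True
```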

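I would suggest looking forward from the quantifier rather than backward from the subject or object, for example with the following code: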

quantifiers = ['every', 'all']
for sentence in a_list_of_sentences:
    max_index = len(sentence) - 1
    for word_index, word in enumerate(sentence):
        if word[0] in quantifiers:
            # Check the first word after the quantifier, if there is one.
            if max_index > word_index:
                if sentence[word_index+1][1] == 'nsubj':
                    print(sentence, "nsubj")
                elif sentence[word_index+1][1] == 'dobj':
                    print(sentence, "dobj")
            # Check the second word after the quantifier, if there is one.
            if max_index > word_index + 1:
                if sentence[word_index+2][1] == 'nsubj':
                    print(sentence, "nsubj")
                elif sentence[word_index+2][1] == 'dobj':
                    print(sentence, "dobj")

Finally, a remark on how you use the index.

In your code, instead of:

for i, j in enumerate(sentence):
        if "dobj" in sentence[i]:

you can write:

for i, j in enumerate(sentence):
        if "dobj" in j:
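If you would rather keep the backward-looking approach, one possible sketch is to unpack each item in the loop and look at a slice of the preceding words instead of indexing them one by one. Unlike a negative index, a slice whose start is clamped with max(0, ...) never wraps around to the end of the list. The names roles, window and matches are introduced here for illustration only.

```python
a_list_of_sentences = [
    [['mary', 'nsubj'], ['loves', 'ROOT'], ['every', 'det'], ['man', 'dobj']],
    [['mary', 'nsubj'], ['loves', 'ROOT'], ['all', 'det'], ['men', 'dobj']],
    [['all', 'det'], ['students', 'nsubj'], ['love', 'ROOT'], ['mary', 'dobj']],
]
quantifiers = {'every', 'all'}
roles = {'nsubj', 'dobj'}

matches = []  # collected (sentence, dependency) pairs
for sentence in a_list_of_sentences:
    # Unpack each [word, dependency] pair directly in the loop header.
    for i, (word, dep) in enumerate(sentence):
        if dep not in roles:
            continue
        # Up to two words before position i; empty when i == 0.
        window = sentence[max(0, i - 2):i]
        if any(prev_word in quantifiers for prev_word, _ in window):
            matches.append((sentence, dep))
            print(sentence, dep)
```

With the sample data this reports each sentence once: dobj for the first two and nsubj for the third.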

Answer 2

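The problem is that list indexes can be negative (if they could not, you would get an IndexError instead). A negative index wraps around to the end of the list.
Check [SO]: Understanding slice notation for more details.
Below is a cleaner variant.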

code00.py:

#!/usr/bin/env python3

import sys


def main(*argv):
    sentences = [
        [["mary", "nsubj"], ["loves", "ROOT"], ["every", "det"], ["man", "dobj"]],
        [["mary", "nsubj"], ["loves", "ROOT"], ["all", "det"], ["men", "dobj"]],
        [["all", "det"], ["students", "nsubj"], ["love", "ROOT"], ["mary", "dobj"]],
    ]
    quantifiers = ["all", "every"]
    syntactic_roles = ["nsubj", "dobj"]

    for sentence in sentences:
        #print(sentence)
        quantifier_idx = -1
        for idx, (word, syntactic_role) in enumerate(sentence):
            if quantifier_idx > -1 and idx - quantifier_idx in [1, 2] and syntactic_role in syntactic_roles:
                print(" ".join(item[0] for item in sentence) + " - " + syntactic_role)
                break
            if word in quantifiers:
                quantifier_idx = idx


if __name__ == "__main__":
    print("Python {0:s} {1:d}bit on {2:s}\n".format(" ".join(item.strip() for item in sys.version.split("\n")), 64 if sys.maxsize > 0x100000000 else 32, sys.platform))
    main(*sys.argv[1:])
    print("\nDone.")

Output:

e:\Work\Dev\StackOverflow\q059500488>"c:\Install\pc064\Python\Python\03.08.01\python.exe" code00.py
Python 3.8.1 (tags/v3.8.1:1b293b6, Dec 18 2019, 23:11:46) [MSC v.1916 64 bit (AMD64)] 64bit on win32

mary loves every man - dobj
mary loves all men - dobj
all students love mary - nsubj

Done.

Problem with indexes in enumerate() - Python
