Indexing problem in enumerate() - Python

Time: 2019-12-27 11:59:31

Tags: python nlp enumerate

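I have a dataset in the form of a list of lists (a_list_of_sentences). Each innermost list holds a word and its syntactic dependency, and these lists are grouped into sentences, like this: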

[[['mary', 'nsubj'], ['loves', 'ROOT'], ['every', 'det'], ['man', 'dobj']],
 [['mary', 'nsubj'], ['loves', 'ROOT'], ['all', 'det'], ['men', 'dobj']],
 [['all', 'det'], ['students', 'nsubj'], ['love', 'ROOT'], ['mary', 'dobj']]]

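I want to find sentences that contain a quantifier (e.g. "every", "all") followed by a word whose syntactic dependency is subject (nsubj) or object (dobj), and to distinguish between those two cases. For my purposes, the subject or object can be either the first or the second word after the quantifier. I tried to do this with enumerate(), like so: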

for sentence in a_list_of_sentences:
    for i, j in enumerate(sentence):
            if "dobj" in sentence[i]:
                if "all" in sentence[i-1] or "all" in sentence[i-2] or "every" in sentence[i-1] or "every" in sentence[i-2]:
                    print(sentence, "dobj")
            elif "nsubj" in sentence[i]:
                if "all" in sentence[i-1] or "all" in sentence[i-2] or "every" in sentence[i-1] or "every" in sentence[i-2]:
                    print(sentence, "nsubj")

However, this does not work as intended, because a sentence such as [['mary', 'nsubj'], ['loves', 'ROOT'], ['every', 'det'], ['man', 'dobj']] shows up in both printouts:

[['mary', 'nsubj'], ['loves', 'ROOT'], ['every', 'det'], ['man', 'dobj']] nsubj
[['mary', 'nsubj'], ['loves', 'ROOT'], ['every', 'det'], ['man', 'dobj']] dobj

Do you know what I am doing wrong and how to fix it?

Thanks a lot!!

2 answers:

Answer 0: (score: 0)

Lists accept negative indices, which count from the end of the list. The following example prints "c".

mylist = ['a', 'b', 'c']
print(mylist[-1])

So, if we take your first sentence:

[['mary', 'nsubj'], ['loves', 'ROOT'], ['every', 'det'], ['man', 'dobj']]

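It is first printed at the first word of the sentence, in the elif branch, because:

  • mary is an nsubj
  • and sentence[i-2] evaluates to sentence[-2], which is the "every" item

It is then printed again at the last word of the sentence, in the if branch, because:

  • man is a dobj
  • and sentence[i-1] evaluates to sentence[2], which is the "every" item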
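
The wraparound described above can be checked directly. This is a minimal sketch using the first sentence from the question; the variable names before_1 and before_2 are chosen here purely for illustration.

```python
sentence = [['mary', 'nsubj'], ['loves', 'ROOT'], ['every', 'det'], ['man', 'dobj']]

i = 0  # index of the first word, 'mary'

# Negative indices count from the end of the list instead of raising IndexError.
before_1 = sentence[i - 1]  # same as sentence[-1]
before_2 = sentence[i - 2]  # same as sentence[-2]

print(before_1)             # ['man', 'dobj']
print(before_2)             # ['every', 'det']
print("every" in before_2)  # True, so the nsubj branch fires for 'mary'
```

Nothing actually precedes 'mary' in the sentence; the match comes entirely from indices that wrapped around to the end of the list.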

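I suggest looking forward from the quantifier instead of backward from the subject or object, for example with the following code: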

quantifiers = ['every', 'all']
for sentence in a_list_of_sentences:
    max_index = len(sentence) - 1
    for word_index, word in enumerate(sentence):
        if word[0] in quantifiers:
            # First word after the quantifier, if there is one.
            if max_index > word_index:
                # Compare with ==; "in" on a string would be a substring test.
                if sentence[word_index+1][1] == 'nsubj':
                    print(sentence, "nsubj")
                elif sentence[word_index+1][1] == 'dobj':
                    print(sentence, "dobj")
            # Second word after the quantifier, if there is one.
            if max_index > word_index + 1:
                if sentence[word_index+2][1] == 'nsubj':
                    print(sentence, "nsubj")
                elif sentence[word_index+2][1] == 'dobj':
                    print(sentence, "dobj")

Finally, a word about how you use the index.

In your code, instead of:

for i, j in enumerate(sentence):
        if "dobj" in sentence[i]:

you can write:

for i, j in enumerate(sentence):
        if "dobj" in j:
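
If you would rather keep the backward-looking logic from the question, one option is to unpack each item in the loop header and skip any offset that falls before the start of the list. The following is only a sketch under that assumption; the function name quantified_roles and the constants QUANTIFIERS and ROLES are hypothetical names introduced for this example.

```python
QUANTIFIERS = {"every", "all"}  # quantifiers taken from the question
ROLES = {"nsubj", "dobj"}       # dependencies of interest


def quantified_roles(sentence):
    """Return the role of each nsubj/dobj word that has a quantifier
    one or two positions before it."""
    roles = []
    for i, (word, role) in enumerate(sentence):
        if role not in ROLES:
            continue
        for offset in (1, 2):
            j = i - offset
            # j >= 0 rules out the negative indices that wrap around.
            if j >= 0 and sentence[j][0] in QUANTIFIERS:
                roles.append(role)
                break
    return roles


a_list_of_sentences = [
    [['mary', 'nsubj'], ['loves', 'ROOT'], ['every', 'det'], ['man', 'dobj']],
    [['mary', 'nsubj'], ['loves', 'ROOT'], ['all', 'det'], ['men', 'dobj']],
    [['all', 'det'], ['students', 'nsubj'], ['love', 'ROOT'], ['mary', 'dobj']],
]

for sentence in a_list_of_sentences:
    for role in quantified_roles(sentence):
        print(sentence, role)
```

With the three sample sentences, each sentence is printed once: the first two with dobj and the third with nsubj.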

Answer 1: (score: 0)

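The problem is that list indices are allowed to be negative (if they were not, you would get an IndexError). A negative index wraps around to the end of the list.
See [SO]: Understanding slice notation for more details.
Below is a cleaner variant.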

code00.py

#!/usr/bin/env python3

import sys


def main(*argv):
    sentences = [
        [["mary", "nsubj"], ["loves", "ROOT"], ["every", "det"], ["man", "dobj"]],
        [["mary", "nsubj"], ["loves", "ROOT"], ["all", "det"], ["men", "dobj"]],
        [["all", "det"], ["students", "nsubj"], ["love", "ROOT"], ["mary", "dobj"]],
    ]
    quantifiers = ["all", "every"]
    syntactic_roles = ["nsubj", "dobj"]

    for sentence in sentences:
        #print(sentence)
        quantifier_idx = -1
        for idx, (word, syntactic_role) in enumerate(sentence):
            if quantifier_idx > -1 and idx - quantifier_idx in [1, 2] and syntactic_role in syntactic_roles:
                print(" ".join(item[0] for item in sentence) + " - " + syntactic_role)
                break
            if word in quantifiers:
                quantifier_idx = idx


if __name__ == "__main__":
    print("Python {0:s} {1:d}bit on {2:s}\n".format(" ".join(item.strip() for item in sys.version.split("\n")), 64 if sys.maxsize > 0x100000000 else 32, sys.platform))
    main(*sys.argv[1:])
    print("\nDone.")

Output

e:\Work\Dev\StackOverflow\q059500488>"c:\Install\pc064\Python\Python\03.08.01\python.exe" code00.py
Python 3.8.1 (tags/v3.8.1:1b293b6, Dec 18 2019, 23:11:46) [MSC v.1916 64 bit (AMD64)] 64bit on win32

mary loves every man - dobj
mary loves all men - dobj
all students love mary - nsubj

Done.
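
As a side note on slice notation: slices accept negative bounds too, so taking "the two items before position i" with a plain sentence[i-2:i] does not behave as one might hope near the start of the list. A small sketch, assuming the third sentence from the question and illustrative variable names (naive, clamped):

```python
sentence = [["all", "det"], ["students", "nsubj"], ["love", "ROOT"], ["mary", "dobj"]]

i = 1  # index of 'students'

# sentence[-1:1]: the start wraps around to index 3, which is past the stop,
# so the slice is empty and the quantifier at index 0 is missed.
naive = sentence[i - 2:i]

# Clamping the start at 0 keeps only the items that really precede position i.
clamped = sentence[max(0, i - 2):i]

print(naive)    # []
print(clamped)  # [['all', 'det']]
```

Unlike a single negative index, the slice does not pick up items from the end of the list here; it silently returns fewer items than expected, which is why the start is clamped.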