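I have a dataset in the form of a list of lists (a_list_of_sentences), where the innermost lists each contain a word and its syntactic dependency, and these lists are grouped into sentences, like this: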
[[['mary', 'nsubj'], ['loves', 'ROOT'], ['every', 'det'], ['man', 'dobj']],
[['mary', 'nsubj'], ['loves', 'ROOT'], ['all', 'det'], ['men', 'dobj']],
[['all', 'det'], ['students', 'nsubj'], ['love', 'ROOT'], ['mary', 'dobj']]]
I want to find sentences that contain a quantifier (e.g. "every", "all") followed by a word whose syntactic dependency is subject (nsubj) or object (dobj), and to distinguish between these two cases. For my purposes, the subject or object can be either the first or the second word after the quantifier. I tried to do this with enumerate(), like this:
for sentence in a_list_of_sentences:
    for i, j in enumerate(sentence):
        if "dobj" in sentence[i]:
            if "all" in sentence[i-1] or "all" in sentence[i-2] or "every" in sentence[i-1] or "every" in sentence[i-2]:
                print(sentence, "dobj")
        elif "nsubj" in sentence[i]:
            if "all" in sentence[i-1] or "all" in sentence[i-2] or "every" in sentence[i-1] or "every" in sentence[i-2]:
                print(sentence, "nsubj")
However, this doesn't work, because a sentence like [['mary', 'nsubj'], ['loves', 'ROOT'], ['every', 'det'], ['man', 'dobj']] shows up in both printouts:
[['mary', 'nsubj'], ['loves', 'ROOT'], ['every', 'det'], ['man', 'dobj']] nsubj
[['mary', 'nsubj'], ['loves', 'ROOT'], ['every', 'det'], ['man', 'dobj']] dobj
Do you know what I am doing wrong and how to fix it?
Thanks a lot!
Answer 0 (score: 0)
You can use negative indices with lists. The following example prints "c".
mylist = ['a', 'b', 'c']
print(mylist[-1])
So, if we take your first sentence:
[['mary', 'nsubj'], ['loves', 'ROOT'], ['every', 'det'], ['man', 'dobj']]
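It will print first at the first word of the sentence, in the elif branch: at i = 0, sentence[i-1] and sentence[i-2] are sentence[-1] and sentence[-2], so sentence[i-2] is ['every', 'det'], which contains "every".
It will then print again at the last word of the sentence, in the if branch, because at i = 3, sentence[i-1] is ['every', 'det'].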
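To make both cases concrete, here is a small check on that first sentence. It evaluates the same index expressions as the loop in the question at i = 0 and i = 3; the variable names are only for illustration.

```python
sentence = [['mary', 'nsubj'], ['loves', 'ROOT'], ['every', 'det'], ['man', 'dobj']]

# First word (i = 0): i-1 and i-2 are negative, so they wrap to the end of the list.
i = 0
first_prev = sentence[i - 1]    # same as sentence[-1]
first_prev2 = sentence[i - 2]   # same as sentence[-2]
print(first_prev, first_prev2)
print("every" in first_prev2)   # the quantifier is found "before" the first word

# Last word (i = 3): i-1 is an ordinary index pointing at the real preceding word.
i = 3
last_prev = sentence[i - 1]
print(last_prev, "every" in last_prev)
```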
I suggest looking forward instead of backward, for example with the following code:
quantifiers = ['every', 'all']
for sentence in a_list_of_sentences:
    max_index = len(sentence) - 1
    for word_index, word in enumerate(sentence):
        if word[0] in quantifiers:
            # Check the first word after the quantifier, if there is one.
            if max_index > word_index:
                if sentence[word_index+1][1] == 'nsubj':
                    print(sentence, "nsubj")
                elif sentence[word_index+1][1] == 'dobj':
                    print(sentence, "dobj")
            # Check the second word after the quantifier, if there is one.
            if max_index > word_index + 1:
                if sentence[word_index+2][1] == 'nsubj':
                    print(sentence, "nsubj")
                elif sentence[word_index+2][1] == 'dobj':
                    print(sentence, "dobj")
Finally, a remark on how you use the index.
In your code, instead of:
for i, j in enumerate(sentence):
    if "dobj" in sentence[i]:
you can do this:
for i, j in enumerate(sentence):
    if "dobj" in j:
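If you would rather keep looking backward, one possible fix (a sketch, not the only way) is to take the preceding words with a slice whose start is clamped at 0, so that it can never wrap around. The helper name find_quantified_roles is hypothetical, and the sketch unpacks each [word, role] pair directly instead of indexing into the sentence.

```python
QUANTIFIERS = {"every", "all"}
ROLES = {"nsubj", "dobj"}


def find_quantified_roles(sentence):
    """Return the roles (nsubj/dobj) of words that have a quantifier
    among the one or two words immediately before them."""
    found = []
    for i, (word, role) in enumerate(sentence):
        if role not in ROLES:
            continue
        # Clamp the start at 0 so that i-2 can never become a negative index.
        previous = sentence[max(0, i - 2):i]
        if any(prev_word in QUANTIFIERS for prev_word, _ in previous):
            found.append(role)
    return found


a_list_of_sentences = [
    [['mary', 'nsubj'], ['loves', 'ROOT'], ['every', 'det'], ['man', 'dobj']],
    [['mary', 'nsubj'], ['loves', 'ROOT'], ['all', 'det'], ['men', 'dobj']],
    [['all', 'det'], ['students', 'nsubj'], ['love', 'ROOT'], ['mary', 'dobj']],
]

for sentence in a_list_of_sentences:
    for role in find_quantified_roles(sentence):
        print(sentence, role)
```

With the three sentences from the question, each sentence is reported once: the first two as dobj and the third as nsubj.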
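Answer 1 (score: 0)
The problem is that list indexes can be negative (if they couldn't, you would get an IndexError). A negative index wraps around to the end of the list.
See [SO]: Understanding slice notation for more details.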
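As a quick illustration of that behaviour (plain Python lists, with a made-up example list): a negative index counts from the end, an index past the end raises IndexError, and an out-of-range slice quietly returns an empty list.

```python
letters = ['a', 'b', 'c']

last = letters[-1]       # negative index: counts from the end
tail = letters[-2:]      # negative slice start: the last two items
beyond = letters[5:7]    # out-of-range slice: empty list, no error

try:
    letters[5]           # out-of-range index: raises IndexError
    raised = False
except IndexError:
    raised = True

print(last, tail, beyond, raised)
```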
Below is a cleaner variant.
code00.py:
#!/usr/bin/env python3

import sys


def main(*argv):
    sentences = [
        [["mary", "nsubj"], ["loves", "ROOT"], ["every", "det"], ["man", "dobj"]],
        [["mary", "nsubj"], ["loves", "ROOT"], ["all", "det"], ["men", "dobj"]],
        [["all", "det"], ["students", "nsubj"], ["love", "ROOT"], ["mary", "dobj"]],
    ]
    quantifiers = ["all", "every"]
    syntactic_roles = ["nsubj", "dobj"]
    for sentence in sentences:
        #print(sentence)
        quantifier_idx = -1  # Index of the last quantifier seen (-1: none yet)
        for idx, (word, syntactic_role) in enumerate(sentence):
            if quantifier_idx > -1 and idx - quantifier_idx in [1, 2] and syntactic_role in syntactic_roles:
                print(" ".join(item[0] for item in sentence) + " - " + syntactic_role)
                break
            if word in quantifiers:
                quantifier_idx = idx


if __name__ == "__main__":
    print("Python {0:s} {1:d}bit on {2:s}\n".format(" ".join(item.strip() for item in sys.version.split("\n")), 64 if sys.maxsize > 0x100000000 else 32, sys.platform))
    main(*sys.argv[1:])
    print("\nDone.")
Output:
e:\Work\Dev\StackOverflow\q059500488>"c:\Install\pc064\Python\Python\03.08.01\python.exe" code00.py
Python 3.8.1 (tags/v3.8.1:1b293b6, Dec 18 2019, 23:11:46) [MSC v.1916 64 bit (AMD64)] 64bit on win32

mary loves every man - dobj
mary loves all men - dobj
all students love mary - nsubj

Done.