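I am trying to count, for a given noun, all of its hyponyms that have no hyponyms of their own (that is, the nouns that are terminal nodes in the noun hierarchy below it). For example, for "entity" (the highest noun in the hierarchy), the result should be the count of all nouns that have no hyponyms (all nouns that are terminals in the hierarchy). For a noun that is itself a terminal, the number must be 1. I have a list of nouns, and the output must give this count for every noun in the list.
After a lot of searching and trial and error, this is the code I came up with (only the relevant part):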
import nltk
from nltk.corpus import wordnet as wn

def get_hyponyms(synset):  # function source: https://stackoverflow.com/questions/15330725/how-to-get-all-the-hyponyms-of-a-word-synset-in-python-nltk-and-wordnet?rq=1
    # Recursively collect every synset below `synset` in the hierarchy.
    hyponyms = set()
    for hyponym in synset.hyponyms():
        hyponyms |= set(get_hyponyms(hyponym))
    return hyponyms | set(synset.hyponyms())

# Plain "r" is used here because the "U" mode flag was removed in Python 3.11.
with open("list-nouns.txt", "r") as wordList1:
    myList1 = [line.rstrip('\n') for line in wordList1]
    for word1 in myList1:
        list1 = wn.synsets(word1, pos='n')
        countTerminalWord1 = 0  # counter for synsets without hyponyms
        countHyponymsWord1 = 0  # counter for synsets with hyponyms
        for syn_set1 in list1:
            syn_set11a = get_hyponyms(syn_set1)
            n = len(syn_set11a)  # number of hyponyms
            if n > 0:
                countHyponymsWord1 += n
            else:
                countTerminalWord1 += 1
            for syn_set11 in syn_set11a:
                syn_set111a = get_hyponyms(syn_set11)
                n = len(syn_set111a)
                if n > 0:
                    countHyponymsWord1 += n
                else:
                    countTerminalWord1 += 1
                # ...further iterates in the same way for the following levels
        print(countHyponymsWord1)
        print(countTerminalWord1)
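(The code also tries to count all nouns that do have hyponyms, but that is not required.)
The main problem is that I cannot repeat this code for the full depth of the noun hierarchy, which is 19 levels. It very quickly fails with 'SystemError: too many statically nested blocks'.
Any help or suggestions on how to solve this would be greatly appreciated.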
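
To make the goal concrete, here is a minimal sketch of the kind of traversal I think is needed: one loop with an explicit stack instead of one nested `for` per level. It runs on a small made-up hierarchy (`TOY`) so it does not need the WordNet data; the names `terminal_descendants`, `toy_children` and `TOY` are hypothetical and not part of NLTK. A noun can be reachable along more than one path, so the sketch collects terminals in a set and counts each one once.

```python
# Hypothetical toy hierarchy standing in for WordNet: parent -> direct hyponyms.
# "dog" is listed under both "animal" and "pet" to mimic a node with two parents.
TOY = {
    "entity": ["animal", "plant", "pet"],
    "animal": ["dog", "cat"],
    "pet": ["dog"],
    "plant": ["tree"],
    "dog": ["puppy"],
    "cat": [],
    "tree": [],
    "puppy": [],
}


def toy_children(node):
    """Return the direct hyponyms of `node` in the toy hierarchy."""
    return TOY.get(node, [])


def terminal_descendants(root, children):
    """Return the set of nodes at or below `root` that have no children.

    `children` is any callable mapping a node to its direct hyponyms.
    A root without children is its own (single) terminal.
    """
    leaves = set()
    seen = {root}      # nodes already scheduled, so shared nodes are visited once
    stack = [root]     # explicit stack replaces the hand-written nested loops
    while stack:
        node = stack.pop()
        kids = list(children(node))
        if not kids:
            leaves.add(node)
            continue
        for kid in kids:
            if kid not in seen:
                seen.add(kid)
                stack.append(kid)
    return leaves


for noun in ["entity", "animal", "cat"]:
    print(noun, len(terminal_descendants(noun, toy_children)))
```

With NLTK, I assume the same function could be called per synset as `terminal_descendants(syn_set1, lambda s: s.hyponyms())`. That follows only `hyponyms()`, as my code above does, and how to combine the counts of several synsets of one word is a separate choice.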