How can I count the hyponyms of a noun that themselves have no hyponyms, using NLTK and WordNet?

Asked: 2017-01-09 07:22:27

Tags: python iteration nltk wordnet

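I am trying to count all hyponyms of a noun that have no hyponyms of their own (that is, the terminal nodes of the noun hierarchy below that noun). For example, for "entity" (the top noun in the hierarchy), the result should be the count of all nouns without hyponyms (all nouns that are terminal in the hierarchy). For a noun that is itself terminal, the count must be 1. I have a list of nouns, and the output must give this count for each noun in the list.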
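To make the intended count concrete, here is a minimal sketch on a toy hierarchy. The dictionary `TOY_HYPONYMS` and the function `terminal_nodes` are hypothetical names invented for illustration, and the data is not taken from WordNet. It only shows the definition: a terminal noun counts as 1, and any other noun counts the distinct terminal nouns below it.

```python
# Toy hierarchy (hypothetical data, not taken from WordNet):
# each key maps to the list of its direct hyponyms.
TOY_HYPONYMS = {
    "entity": ["animal", "plant"],
    "animal": ["dog", "cat"],
    "plant": ["tree"],
    "dog": [],
    "cat": [],
    "tree": [],
}


def terminal_nodes(node, hyponyms=TOY_HYPONYMS):
    """Return the set of terminal nodes at or below `node`."""
    children = hyponyms[node]
    if not children:
        # A node without hyponyms is terminal and counts itself.
        return {node}
    result = set()
    for child in children:
        result |= terminal_nodes(child, hyponyms)
    return result


# One count per noun, as the question asks for.
counts = {name: len(terminal_nodes(name)) for name in TOY_HYPONYMS}
print(counts)
```

Collecting the terminal nodes in a set rather than adding up numbers matters once a node can be reached along more than one path, which can happen in WordNet because a synset may have several hypernyms.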

After a lot of searching and trial and error, this is the code I came up with (relevant part only):

import nltk
from nltk.corpus import wordnet as wn

# function source: https://stackoverflow.com/questions/15330725/how-to-get-all-the-hyponyms-of-a-word-synset-in-python-nltk-and-wordnet?rq=1
def get_hyponyms(synset):
    hyponyms = set()
    for hyponym in synset.hyponyms():
        hyponyms |= set(get_hyponyms(hyponym))
    return hyponyms | set(synset.hyponyms())

with open("list-nouns.txt", "r") as wordList1:  # the "U" mode flag was removed in Python 3.11
    myList1 = [line.rstrip('\n') for line in wordList1]
    for word1 in myList1:
        list1 = wn.synsets(word1, pos='n')
        countTerminalWord1 = 0  #counter for synsets without hyponyms
        countHyponymsWord1 = 0  #counter for synsets with hyponyms
        for syn_set1 in list1:
            syn_set11a = get_hyponyms(syn_set1)
            n = len(syn_set11a)  #number of hyponyms (reuses the set computed above)
            if n > 0:
                countHyponymsWord1 += n
            else:
                countTerminalWord1 += 1
            for syn_set11 in syn_set11a:
                syn_set111a = get_hyponyms(syn_set11)
                n = len(syn_set111a)
                if n > 0:
                    countHyponymsWord1 += n
                else:
                    countTerminalWord1 += 1
                #...further iterates in the same way for the following levels
        print(countHyponymsWord1)
        print(countTerminalWord1)

(The code also tries to count all nouns that do have hyponyms, but that is not required.)

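The main problem is that I cannot repeat this code for the full depth of the noun hierarchy, which has 19 levels. It quickly fails with 'SystemError: too many statically nested blocks'.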

Any help or suggestions on how to solve this would be greatly appreciated.
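One possible way around the nesting limit, sketched here under stated assumptions rather than offered as a definitive answer, is to replace the hand-written nested loops with a single loop over an explicit stack. The depth of the hierarchy then no longer shows up as nesting in the source code. The function `count_terminal_hyponyms` and the class `FakeSynset` are hypothetical names. `FakeSynset` only imitates the one method the traversal needs, `hyponyms()`, so that the sketch runs without the WordNet corpus installed.

```python
def count_terminal_hyponyms(synsets):
    """Count distinct synsets without hyponyms at or below any of `synsets`.

    Works on any hashable objects that offer a `hyponyms()` method
    returning a list of objects of the same kind.
    """
    stack = list(synsets)   # synsets still to be visited
    seen = set()            # guards against visiting a synset twice
    terminals = set()       # distinct synsets that have no hyponyms
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        children = current.hyponyms()
        if children:
            stack.extend(children)
        else:
            terminals.add(current)
    return len(terminals)


class FakeSynset:
    """Hypothetical stand-in for an NLTK Synset, used only in this demo."""

    def __init__(self, name, children=()):
        self.name = name
        self._children = list(children)

    def hyponyms(self):
        return self._children


# A small hierarchy in which `shared` is reachable through two parents.
shared = FakeSynset("shared")
only_left = FakeSynset("only_left")
left = FakeSynset("left", [shared, only_left])
right = FakeSynset("right", [shared])
root = FakeSynset("root", [left, right])

print(count_terminal_hyponyms([root]))    # terminals below the root
print(count_terminal_hyponyms([shared]))  # a terminal synset counts itself
```

With NLTK, the same function could presumably be called once per word as `count_terminal_hyponyms(wn.synsets(word1, pos='n'))`, which counts each terminal synset once even if it lies below several senses of the word. Whether instance hyponyms (returned by `instance_hyponyms()`, not by `hyponyms()`) should also be followed is an open choice that this sketch does not make.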

0 answers:

No answers