How do I turn a list made of nltk.tree.Tree objects into another tree?

Time: 2018-08-28 04:00:04

Tags: arrays python-3.x tree nltk spacy

I have a list made of nltk.tree.Tree objects:

>>> question = 'When did Beyonce start becoming popular?'
>>> questionSpacy = spacy_nlp(question)
>>> print(questionSpacy)
[Tree('start_VB_ROOT', ['When_WRB_advmod', 'did_VBD_aux', 'Beyonce_NNP_nsubj', Tree('becoming_VBG_xcomp', ['popular_JJ_acomp']), '?_._punct'])]

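The goal is to rebuild this as another tree. I know that sounds silly, but otherwise I don't know how to tell whether the tree representing one sentence is contained in the tree representing another sentence.

I made one attempt, but it did not work. I think I am not handling every case: sometimes the parent node has to be array[0].label(), and sometimes it has to be array[0].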

from nltk import Tree

class WordTree:
    def __init__(self, array, parent = None):
        #print("son :",array[0][i])
        self.parent = []
        self.children = [] # if parenthesis then it has son after "," analyse : include all elements until the next parenthesis
        self.data = array
        #print(array[0])
        for son in array[0]:
            print(type(son),son)
            if type(son) is Tree:
                print("sub tree creation")
                self.children.append(son.label())
                print("son:",son)
                t = WordTree(son,son.label()) # should I verify if parent is empty ?
                print("end of sub tree creation")
            elif type(son) is str:
                print("son creation")
                self.children.append(son)
            else:
                print("issue?")
                break # problem ?

When I run t = WordTree(treeQuestion, treeQuestion[0].label()), I get the following output:

<class 'str'> When_WRB_advmod
son creation
<class 'str'> did_VBD_aux
son creation
<class 'str'> Beyonce_NNP_nsubj
son creation
<class 'nltk.tree.Tree'> (becoming_VBG_xcomp popular_JJ_acomp)
sub tree creation
son: (becoming_VBG_xcomp popular_JJ_acomp)
<class 'str'> p
son creation
<class 'str'> o
son creation
<class 'str'> p
son creation
<class 'str'> u
son creation
<class 'str'> l
son creation
<class 'str'> a
son creation
<class 'str'> r
son creation
<class 'str'> _
son creation
<class 'str'> J
son creation
<class 'str'> J
son creation
<class 'str'> _
son creation
<class 'str'> a
son creation
<class 'str'> c
son creation
<class 'str'> o
son creation
<class 'str'> m
son creation
<class 'str'> p
son creation
end of sub tree creation
<class 'str'> ?_._punct
son creation

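As you can see, for ('becoming_VBG_xcomp', ['popular_JJ_acomp']) it creates one child per letter of popular_JJ_acomp instead of a single child holding the whole name. That is obviously wrong. So how do I convert a list made of nltk.tree.Tree objects into another tree?

1 answer:

Answer 0 (score: 0)

I think I have found a way to convert the list produced by nltk.tree into a tree built in plain Python, but I have not been able to generalize it yet.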

from anytree import Node, RenderTree  # not used in this snippet
from nltk import Tree

class WordTree:
    '''Tree for a spaCy dependency parse stored as a list of nltk Trees'''
    def __init__(self, array, parent=None):
        """
        Construct a new 'WordTree' object.

        :param array: The list containing the dependency tree
        :param parent: The label of the parent, if it exists
        """
        self.parent = parent
        self.children = []
        self.data = array

        for element in array[0]:
            print(type(element), element)
            # a Tree element is a subtree: record its label, then recurse
            if type(element) is Tree:
                print("sub tree creation")
                self.children.append(element.label())
                print("son:", element)
                # wrap in a list so that array[0] is the subtree itself
                t = WordTree([element], element.label())
                print("end of sub tree creation")
            # a string is a leaf: store it as a child
            elif type(element) is str:
                print("son creation")
                self.children.append(element)
            # anything else is unexpected
            else:
                print("issue?")
                break

In practice, it works with the following example:

[Tree('start_VB_ROOT', ['When_WRB_advmod', 'did_VBD_aux', 'Beyonce_NNP_nsubj', Tree('becoming_VBG_xcomp', ['popular_JJ_acomp']), '?_._punct'])]

which gives:

<class 'str'> When_WRB_advmod
son creation
<class 'str'> did_VBD_aux
son creation
<class 'str'> Beyonce_NNP_nsubj
son creation
<class 'nltk.tree.Tree'> (becoming_VBG_xcomp popular_JJ_acomp)
sub tree creation
son: (becoming_VBG_xcomp popular_JJ_acomp)
<class 'str'> popular_JJ_acomp
son creation
end of sub tree creation
<class 'str'> ?_._punct
son creation
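
The change that matters compared with the question's version is the `[element]` wrapper in the recursive call. Here is a minimal sketch of why, assuming only that `nltk` is installed. Indexing a `Tree` with `[0]` returns its first child, so passing the bare subtree makes the loop iterate over the string `popular_JJ_acomp` one character at a time. Wrapping the subtree in a list makes `array[0]` the subtree itself.

```python
from nltk import Tree

subtree = Tree('becoming_VBG_xcomp', ['popular_JJ_acomp'])

# Question's version: array is the subtree, so array[0] is its first child (a str).
bare_first = subtree[0]
bare_iteration = list(bare_first)        # iterating a str yields single characters

# Answer's version: array is [subtree], so array[0] is the subtree itself.
wrapped_first = [subtree][0]
wrapped_iteration = list(wrapped_first)  # iterating a Tree yields its children

print(bare_iteration[:3])
print(wrapped_iteration)
```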

But I get no output when I try:

for i,sent in enumerate(sentences):
    i = WordTree(sentences, sentences[0].label())
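
Here is a sketch of one way to generalize, assuming `sentences` is a list of `nltk.tree.Tree` objects like the example above. Note that the loop above passes the whole `sentences` list and `sentences[0]` on every iteration rather than the current `sent`. The version below sidesteps `array[0]` by recursing on each node directly. `WordNode`, `to_word_node`, and `contains_subtree` are hypothetical names, not part of nltk or spaCy. For the containment check that motivated the question, nltk's own `Tree.subtrees()` and `==` may be enough without building a second tree.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Union

from nltk import Tree


@dataclass
class WordNode:
    """Hypothetical plain-Python node; not part of nltk or spaCy."""
    label: str
    children: List["WordNode"] = field(default_factory=list)
    # Excluded from repr and comparison to avoid parent <-> child recursion.
    parent: Optional["WordNode"] = field(default=None, repr=False, compare=False)


def to_word_node(node: Union[Tree, str], parent: Optional[WordNode] = None) -> WordNode:
    """Recursively copy an nltk Tree (or a leaf string) into WordNode objects."""
    if isinstance(node, Tree):
        current = WordNode(node.label(), parent=parent)
        current.children = [to_word_node(child, current) for child in node]
        return current
    if isinstance(node, str):
        return WordNode(node, parent=parent)
    raise TypeError(f"unexpected node type: {type(node).__name__}")


def contains_subtree(big: Tree, small: Tree) -> bool:
    """True if `small` equals `big` or any subtree of `big`."""
    return any(sub == small for sub in big.subtrees())


sentences = [
    Tree('start_VB_ROOT', [
        'When_WRB_advmod', 'did_VBD_aux', 'Beyonce_NNP_nsubj',
        Tree('becoming_VBG_xcomp', ['popular_JJ_acomp']),
        '?_._punct',
    ])
]

# Convert each sentence on its own, passing the current Tree rather than the list.
roots = [to_word_node(sent) for sent in sentences]
for root in roots:
    print(root.label, [child.label for child in root.children])
```

`contains_subtree` only matches when a subtree is identical, label and all children included; partial matches would need a looser comparison.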