如何从依赖解析器的输出中制作一棵树?

时间:2018-09-03 11:17:07

标签: python dictionary nlp nltk stanford-nlp

我正在尝试从依赖解析器的输出中创建一棵树(嵌套字典)。这句话是“我在睡眠中射杀了大象”。我能够获得链接上描述的输出: How do I do dependency parsing in NLTK?

nsubj(shot-2, I-1)
det(elephant-4, an-3)
dobj(shot-2, elephant-4)
prep(shot-2, in-5)
poss(sleep-7, my-6)
pobj(in-5, sleep-7)

要将此元组列表转换为嵌套字典,我使用以下链接: How to convert python list of tuples into tree?

def build_tree(list_of_tuples):
    all_nodes = {n[2]:((n[0], n[1]),{}) for n in list_of_tuples}
    root = {}    
    print all_nodes
    for item in list_of_tuples:
        rel, gov,dep = item
        if gov is not 'ROOT':
            all_nodes[gov][1][dep] = all_nodes[dep]
        else:
            root[dep] = all_nodes[dep]
    return root

输出如下:

{'shot': (('ROOT', 'ROOT'),
  {'I': (('nsubj', 'shot'), {}),
   'elephant': (('dobj', 'shot'), {'an': (('det', 'elephant'), {})}),
   'sleep': (('nmod', 'shot'),
    {'in': (('case', 'sleep'), {}), 'my': (('nmod:poss', 'sleep'), {})})})}

要找到从根到叶的路径,我使用了以下链接:Return root to specific leaf from a nested dictionary tree

[制作树和查找路径是两件分开的事情]第二个目标是找到从根到叶节点的路径,就像完成Return root to specific leaf from a nested dictionary tree一样。 但是我想从根到叶(依赖关系路径) 因此,例如,当我调用recurse_category(categories,'an')时,category是嵌套的树结构,而'an'是树中的单词,我应该得到ROOT-nsubj-dobj(直到根的依赖关系)为输出。

2 个答案:

答案 0 :(得分:1)

首先,如果仅对Stanford CoreNLP依赖关系解析器使用预训练的模型,则应使用CoreNLPDependencyParser中的nltk.parse.corenlp,并避免使用旧的nltk.parse.stanford接口。

请参见Stanford Parser and NLTK

在终端中使用Python下载并运行Java服务器后,

>>> from nltk.parse.corenlp import CoreNLPDependencyParser
>>> dep_parser = CoreNLPDependencyParser(url='http://localhost:9000')
>>> sent = "I shot an elephant with a banana .".split()
>>> parses = list(dep_parser.parse(sent))
>>> type(parses[0])
<class 'nltk.parse.dependencygraph.DependencyGraph'>

现在我们看到解析是DependencyGraph https://github.com/nltk/nltk/blob/develop/nltk/parse/dependencygraph.py#L36

类型的nltk.parse.dependencygraph

只需执行DependencyGraph,即可将nltk.tree.Tree转换为DependencyGraph.tree()对象:

>>> parses[0].tree()
Tree('shot', ['I', Tree('elephant', ['an']), Tree('banana', ['with', 'a']), '.'])

>>> parses[0].tree().pretty_print()
          shot                  
  _________|____________         
 |   |  elephant      banana    
 |   |     |       _____|_____   
 I   .     an    with         a 

要将其转换为带括号的解析格式:

>>> print(parses[0].tree())
(shot I (elephant an) (banana with a) .)

如果要查找依赖项三元组:

>>> [(governor, dep, dependent) for governor, dep, dependent in parses[0].triples()]
[(('shot', 'VBD'), 'nsubj', ('I', 'PRP')), (('shot', 'VBD'), 'dobj', ('elephant', 'NN')), (('elephant', 'NN'), 'det', ('an', 'DT')), (('shot', 'VBD'), 'nmod', ('banana', 'NN')), (('banana', 'NN'), 'case', ('with', 'IN')), (('banana', 'NN'), 'det', ('a', 'DT')), (('shot', 'VBD'), 'punct', ('.', '.'))]

>>> for governor, dep, dependent in parses[0].triples():
...     print(governor, dep, dependent)
... 
('shot', 'VBD') nsubj ('I', 'PRP')
('shot', 'VBD') dobj ('elephant', 'NN')
('elephant', 'NN') det ('an', 'DT')
('shot', 'VBD') nmod ('banana', 'NN')
('banana', 'NN') case ('with', 'IN')
('banana', 'NN') det ('a', 'DT')
('shot', 'VBD') punct ('.', '.')

CONLL格式:

>>> print(parses[0].to_conll(style=10))
1   I   I   PRP PRP _   2   nsubj   _   _
2   shot    shoot   VBD VBD _   0   ROOT    _   _
3   an  a   DT  DT  _   4   det _   _
4   elephant    elephant    NN  NN  _   2   dobj    _   _
5   with    with    IN  IN  _   7   case    _   _
6   a   a   DT  DT  _   7   det _   _
7   banana  banana  NN  NN  _   2   nmod    _   _
8   .   .   .   .   _   2   punct   _   _

答案 1 :(得分:0)

这会将输出转换为嵌套字典形式。如果我也可以找到该路径,我会及时通知您。也许这很有帮助。

list_of_tuples = [('ROOT','ROOT', 'shot'),('nsubj','shot', 'I'),('det','elephant', 'an'),('dobj','shot', 'elephant'),('case','sleep', 'in'),('nmod:poss','sleep', 'my'),('nmod','shot', 'sleep')]

nodes={}

for i in list_of_tuples:
    rel,parent,child=i
    nodes[child]={'Name':child,'Relationship':rel}

forest=[]

for i in list_of_tuples:
    rel,parent,child=i
    node=nodes[child]

    if parent=='ROOT':# this should be the Root Node
            forest.append(node)
    else:
        parent=nodes[parent]
        if not 'children' in parent:
            parent['children']=[]
        children=parent['children']
        children.append(node)

print forest

输出为嵌套字典,

[{'Name': 'shot', 'Relationship': 'ROOT', 'children': [{'Name': 'I', 'Relationship': 'nsubj'}, {'Name': 'elephant', 'Relationship': 'dobj', 'children': [{'Name': 'an', 'Relationship': 'det'}]}, {'Name': 'sleep', 'Relationship': 'nmod', 'children': [{'Name': 'in', 'Relationship': 'case'}, {'Name': 'my', 'Relationship': 'nmod:poss'}]}]}]

以下功能可以帮助您找到从根到叶的路径:

def recurse_category(categories,to_find):
    for category in categories: 
        if category['Name'] == to_find:
            return True, [category['Relationship']]
        if 'children' in category:
            found, path = recurse_category(category['children'], to_find)
            if found:
                return True, [category['Relationship']] + path
    return False, []