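I am trying to create a tree (a nested dictionary) from the output of a dependency parser. The sentence is "I shot an elephant in my sleep". I am able to get the output described in the link: How do I do dependency parsing in NLTK?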
nsubj(shot-2, I-1)
det(elephant-4, an-3)
dobj(shot-2, elephant-4)
prep(shot-2, in-5)
poss(sleep-7, my-6)
pobj(in-5, sleep-7)
To convert this list of tuples into a nested dictionary, I used the following link: How to convert python list of tuples into tree?
def build_tree(list_of_tuples):
    # Map each dependent word to ((relation, governor), children_dict).
    all_nodes = {n[2]: ((n[0], n[1]), {}) for n in list_of_tuples}
    root = {}
    print(all_nodes)
    for item in list_of_tuples:
        rel, gov, dep = item
        # Compare strings with !=; `is not` tests identity, not equality.
        if gov != 'ROOT':
            all_nodes[gov][1][dep] = all_nodes[dep]
        else:
            root[dep] = all_nodes[dep]
    return root
The output is as follows:
{'shot': (('ROOT', 'ROOT'),
          {'I': (('nsubj', 'shot'), {}),
           'elephant': (('dobj', 'shot'), {'an': (('det', 'elephant'), {})}),
           'sleep': (('nmod', 'shot'),
                     {'in': (('case', 'sleep'), {}), 'my': (('nmod:poss', 'sleep'), {})})})}
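To find the path from the root to a leaf, I used the following link: Return root to specific leaf from a nested dictionary tree
[Building the tree and finding the path are two separate things.] The second goal is to find the path from the root to a leaf node, as is done in Return root to specific leaf from a nested dictionary tree.
However, I want the path from the root to the leaf expressed as dependency relations (the dependency path).
So, for example, when I call recurse_category(categories, 'an'), where categories is the nested tree structure and 'an' is a word in the tree, I should get ROOT-nsubj-dobj (the dependency relations up to the root) as output.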
Answer 0 (score: 1)
First, if you are only using the pre-trained models for the Stanford CoreNLP dependency parser, you should use CoreNLPDependencyParser from nltk.parse.corenlp and avoid the old nltk.parse.stanford interface.
After downloading and running the Java server in a terminal, in Python:
>>> from nltk.parse.corenlp import CoreNLPDependencyParser
>>> dep_parser = CoreNLPDependencyParser(url='http://localhost:9000')
>>> sent = "I shot an elephant with a banana .".split()
>>> parses = list(dep_parser.parse(sent))
>>> type(parses[0])
<class 'nltk.parse.dependencygraph.DependencyGraph'>
Now we see that each parse is a DependencyGraph from nltk.parse.dependencygraph (https://github.com/nltk/nltk/blob/develop/nltk/parse/dependencygraph.py#L36).
To convert the DependencyGraph to an nltk.tree.Tree object, simply call DependencyGraph.tree():
>>> parses[0].tree()
Tree('shot', ['I', Tree('elephant', ['an']), Tree('banana', ['with', 'a']), '.'])
>>> parses[0].tree().pretty_print()
          shot
  _________|____________
 |   |  elephant      banana
 |   |     |       _____|_____
 I   .     an    with         a
To convert it into the bracketed parse format:
>>> print(parses[0].tree())
(shot I (elephant an) (banana with a) .)
If you are looking for dependency triples:
>>> [(governor, dep, dependent) for governor, dep, dependent in parses[0].triples()]
[(('shot', 'VBD'), 'nsubj', ('I', 'PRP')), (('shot', 'VBD'), 'dobj', ('elephant', 'NN')), (('elephant', 'NN'), 'det', ('an', 'DT')), (('shot', 'VBD'), 'nmod', ('banana', 'NN')), (('banana', 'NN'), 'case', ('with', 'IN')), (('banana', 'NN'), 'det', ('a', 'DT')), (('shot', 'VBD'), 'punct', ('.', '.'))]
>>> for governor, dep, dependent in parses[0].triples():
...     print(governor, dep, dependent)
...
('shot', 'VBD') nsubj ('I', 'PRP')
('shot', 'VBD') dobj ('elephant', 'NN')
('elephant', 'NN') det ('an', 'DT')
('shot', 'VBD') nmod ('banana', 'NN')
('banana', 'NN') case ('with', 'IN')
('banana', 'NN') det ('a', 'DT')
('shot', 'VBD') punct ('.', '.')
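These triples can be reshaped into the (relation, governor, dependent) tuples that the question's build_tree consumes. The following is a minimal sketch, assuming the triples have exactly the shape printed above. The helper name triples_to_tuples is hypothetical, and because the output above contains no ROOT entry, the root is inferred as the governor that never appears as a dependent.

```python
# Triples copied from the output above; with a live parser this would be
# list(parses[0].triples()).
triples = [
    (('shot', 'VBD'), 'nsubj', ('I', 'PRP')),
    (('shot', 'VBD'), 'dobj', ('elephant', 'NN')),
    (('elephant', 'NN'), 'det', ('an', 'DT')),
    (('shot', 'VBD'), 'nmod', ('banana', 'NN')),
    (('banana', 'NN'), 'case', ('with', 'IN')),
    (('banana', 'NN'), 'det', ('a', 'DT')),
    (('shot', 'VBD'), 'punct', ('.', '.')),
]


def triples_to_tuples(triples):
    """Hypothetical helper: return (rel, governor, dependent) word tuples,
    preceded by a ('ROOT', 'ROOT', root_word) entry when a root is found."""
    dependents = {dep[0] for _, _, dep in triples}
    # Assumption: the root is the governor that is nobody's dependent.
    roots = [gov[0] for gov, _, _ in triples if gov[0] not in dependents]
    tuples = [('ROOT', 'ROOT', roots[0])] if roots else []
    tuples += [(rel, gov[0], dep[0]) for gov, rel, dep in triples]
    return tuples


list_of_tuples = triples_to_tuples(triples)
print(list_of_tuples[:3])
```

Words serve as keys in this sketch, so a sentence that repeats a word would need token indices instead.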
In CoNLL format:
>>> print(parses[0].to_conll(style=10))
1 I I PRP PRP _ 2 nsubj _ _
2 shot shoot VBD VBD _ 0 ROOT _ _
3 an a DT DT _ 4 det _ _
4 elephant elephant NN NN _ 2 dobj _ _
5 with with IN IN _ 7 case _ _
6 a a DT DT _ 7 det _ _
7 banana banana NN NN _ 2 nmod _ _
8 . . . . _ 2 punct _ _
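The CoNLL rows already carry what the question asks for: each token has a head index and a relation, so the dependency path can be read off by following head indices up to 0. The following is a rough sketch, assuming the 10-column layout printed above (column 7 is the head index, column 8 the relation) and a well-formed tree without cycles. The helper name relation_path is hypothetical.

```python
# CoNLL text copied from the output above, whitespace-separated.
conll = """
1 I I PRP PRP _ 2 nsubj _ _
2 shot shoot VBD VBD _ 0 ROOT _ _
3 an a DT DT _ 4 det _ _
4 elephant elephant NN NN _ 2 dobj _ _
5 with with IN IN _ 7 case _ _
6 a a DT DT _ 7 det _ _
7 banana banana NN NN _ 2 nmod _ _
8 . . . . _ 2 punct _ _
"""


def relation_path(conll_text, token_index):
    """Hypothetical helper: relations from the root down to the token
    at token_index (1-based, as in the first CoNLL column)."""
    rows = {}
    for line in conll_text.strip().splitlines():
        cols = line.split()
        # cols[0] = token index, cols[6] = head index, cols[7] = relation
        rows[int(cols[0])] = (int(cols[6]), cols[7])
    path = []
    current = token_index
    while current != 0:  # head index 0 marks the artificial root
        head, rel = rows[current]
        path.append(rel)
        current = head
    return path[::-1]  # collected leaf-to-root, so reverse it


print('-'.join(relation_path(conll, 3)))  # token 3 is 'an'
```

Working with token indices rather than words keeps repeated words in a sentence apart.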
Answer 1 (score: 0)
This converts the output into nested dictionary form. I will let you know if I can find the path as well. Maybe this is helpful.
list_of_tuples = [('ROOT', 'ROOT', 'shot'), ('nsubj', 'shot', 'I'), ('det', 'elephant', 'an'), ('dobj', 'shot', 'elephant'), ('case', 'sleep', 'in'), ('nmod:poss', 'sleep', 'my'), ('nmod', 'shot', 'sleep')]

# First pass: create one node per dependent word.
nodes = {}
for rel, parent, child in list_of_tuples:
    nodes[child] = {'Name': child, 'Relationship': rel}

# Second pass: attach each node to its parent, or to the forest if it is the root.
forest = []
for rel, parent, child in list_of_tuples:
    node = nodes[child]
    if parent == 'ROOT':  # this should be the root node
        forest.append(node)
    else:
        parent = nodes[parent]
        if 'children' not in parent:
            parent['children'] = []
        children = parent['children']
        children.append(node)
print(forest)
The output is a nested dictionary (wrapped in a list):
[{'Name': 'shot', 'Relationship': 'ROOT',
'children':
[{'Name': 'I', 'Relationship': 'nsubj'},
{'Name': 'elephant', 'Relationship':
'dobj',
'children':
[{'Name': 'an',
'Relationship': 'det'}]},
{'Name': 'sleep', 'Relationship':
'nmod',
'children':
[{'Name': 'in',
'Relationship': 'case'},
{'Name': 'my', 'Relationship':
'nmod:poss'}]}]}]
The following function can help you find the path from the root to a leaf:
def recurse_category(categories, to_find):
    for category in categories:
        if category['Name'] == to_find:
            return True, [category['Relationship']]
        if 'children' in category:
            found, path = recurse_category(category['children'], to_find)
            if found:
                return True, [category['Relationship']] + path
    return False, []
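A usage sketch for this function, with the forest from this answer written out literally so that the block runs on its own. The joined string format is only an illustration of how the returned list might be displayed.

```python
def recurse_category(categories, to_find):
    # Depth-first search; returns (found, list of relations from root to the word).
    for category in categories:
        if category['Name'] == to_find:
            return True, [category['Relationship']]
        if 'children' in category:
            found, path = recurse_category(category['children'], to_find)
            if found:
                return True, [category['Relationship']] + path
    return False, []


# The forest printed earlier in this answer.
forest = [{'Name': 'shot', 'Relationship': 'ROOT', 'children': [
    {'Name': 'I', 'Relationship': 'nsubj'},
    {'Name': 'elephant', 'Relationship': 'dobj', 'children': [
        {'Name': 'an', 'Relationship': 'det'}]},
    {'Name': 'sleep', 'Relationship': 'nmod', 'children': [
        {'Name': 'in', 'Relationship': 'case'},
        {'Name': 'my', 'Relationship': 'nmod:poss'}]}]}]

found, path = recurse_category(forest, 'an')
print(found, '-'.join(path))  # True ROOT-dobj-det

missing_found, missing_path = recurse_category(forest, 'banana')
print(missing_found, missing_path)  # False []
```

A word that is not in the tree yields (False, []), so the caller can check the first element before using the path.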