Pretty-printing an nltk.Tree produces output in the following format:
print(spacy2tree(nlp(u'Williams is a defensive coach')))
(S
(SUBJ Williams/NNP)
(PRED is/VBZ test/VBN)
a/DT
defensive/JJ
coach/NN)
As a Tree object:
spacy2tree(nlp(u'Williams is a defensive coach') )
Tree('S', [Tree('SUBJ', [(u'Williams', u'NNP')]),
Tree('PRED', [(u'is', u'VBZ'), ('test', 'VBN')]), (u'a', u'DT'), (u'defensive', u'JJ'), (u'coach', u'NN')])
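But reading that string back in does not reproduce the original tree; the (word, tag) tuples come back as plain 'word/tag' strings: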
tfs = spacy2tree(nlp(u'Williams is a defensive coach') ).pformat()
Tree.fromstring(tfs)
Tree('S', [Tree('SUBJ', ['Williams/NNP']),
Tree('PRED', ['is/VBZ', 'test/VBN']), 'a/DT', 'defensive/JJ', 'coach/NN'])
Example:
correct incorrect
('SUBJ', [(u'Williams', u'NNP')]) =vs=> ('SUBJ', ['Williams/NNP'])
('PRED', [(u'is', u'VBZ'), ('test', 'VBN')]) =vs=> ('PRED', ['is/VBZ', 'test/VBN'])
Is there a utility that correctly reads the tree back from the string?
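Answer 0 (score: 0)
It looks like I figured it out. Tree.fromstring accepts a read_leaf callable, which can split each leaf back into a tuple: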
Tree.fromstring(tfs, read_leaf=lambda s: tuple(s.split('/')))
Tree('S', [Tree('SUBJ', [(u'Williams', u'NNP')]),
Tree('PRED', [(u'is', u'VBZ'), (u'test', u'VBN')]), (u'a', u'DT'), (u'defensive', u'JJ'), (u'coach', u'NN')])
So now this works correctly as well:
tree2conlltags(Tree.fromstring(tfs, read_leaf=lambda s: tuple(s.split('/'))))
[(u'Williams', u'NNP', u'B-SUBJ'),
(u'is', u'VBZ', u'B-PRED'),
(u'test', u'VBN', u'I-PRED'),
(u'a', u'DT', u'O'),
(u'defensive', u'JJ', u'O'),
(u'coach', u'NN', u'O')]
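
One caveat with s.split('/'): a token that itself contains a slash (for example 1/2/CD) is split into more than two parts, so the leaf is no longer a (word, tag) pair. Below is a minimal sketch of a more defensive read_leaf that splits on the last slash only. The names read_tagged_leaf and tree_from_tagged_string are made up for illustration and are not part of NLTK; returning (leaf, None) for an untagged leaf is also just one possible choice.

```python
def read_tagged_leaf(leaf):
    """Turn 'word/TAG' into ('word', 'TAG'), splitting on the last slash only."""
    word, sep, tag = leaf.rpartition('/')
    if not sep:
        # No slash at all: keep the token and mark the tag as missing (assumption).
        return (leaf, None)
    return (word, tag)


def tree_from_tagged_string(text):
    """Hypothetical wrapper: parse a bracketed string into a Tree with tuple leaves."""
    from nltk import Tree  # imported here so the helper above works without nltk
    return Tree.fromstring(text, read_leaf=read_tagged_leaf)
```

With the tfs string from the question, tree_from_tagged_string(tfs) should be usable in place of the lambda version, including as input to tree2conlltags.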