Creating a complete NLTK parse tree from a list of NLTK subtrees in Python 3.5

Time: 2016-06-10 00:44:15

Tags: python parsing tree nltk subtree

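I have a list of subtrees that I derived from a parse history.

The first item of each tuple in the parse history is a key into a grammar dictionary whose values are lists of rules. The second item is the index of the rule to use for that key.

The parse history is: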

parse = [('S', 0), ('NP', 1), ('Det', 0), ('N', 0), ('VP', 1), ('V', 4), ('NP', 2), ('NP', 0), ('PN', 1), ('NP', 1), ('Det', 0), ('N', 3)]

The grammar is:

grammar = {'S': [['NP', 'VP']],
           'NP': [['PN'], ['Det', 'N']],
           'VP': [['V'], ['V', 'NP', 'NP']],
           'PN': [['John'], ['Mary'], ['Bill']],
           'Det': [['the'], ['a']],
           'N': [['man'], ['woman'], ['drill sergeant'], ['dog']],
           'V': [['slept'], ['cried'], ['assaulted'],
                 ['devoured'], ['showed']]}
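As a quick illustration of how each (key, index) tuple selects a rule, the sketch below looks up every entry of the parse history in the grammar, both copied as posted. The helper name lookup is hypothetical and only used for this example; it returns None instead of raising when an index has no matching rule.

```python
grammar = {'S': [['NP', 'VP']],
           'NP': [['PN'], ['Det', 'N']],
           'VP': [['V'], ['V', 'NP', 'NP']],
           'PN': [['John'], ['Mary'], ['Bill']],
           'Det': [['the'], ['a']],
           'N': [['man'], ['woman'], ['drill sergeant'], ['dog']],
           'V': [['slept'], ['cried'], ['assaulted'],
                 ['devoured'], ['showed']]}

parse = [('S', 0), ('NP', 1), ('Det', 0), ('N', 0), ('VP', 1), ('V', 4),
         ('NP', 2), ('NP', 0), ('PN', 1), ('NP', 1), ('Det', 0), ('N', 3)]


def lookup(symbol, index):
    """Return the rule body for (symbol, index), or None if there is no such rule."""
    rules = grammar[symbol]
    return rules[index] if index < len(rules) else None


# Show which rule each entry of the parse history refers to.
for symbol, index in parse:
    print(symbol, index, lookup(symbol, index))

# Collect entries that do not match any rule in the grammar as posted.
missing = [(s, i) for s, i in parse if lookup(s, i) is None]
print(missing)  # → [('NP', 2)]
```

With the data as posted, every tuple resolves to a rule except ('NP', 2), because 'NP' only has rules 0 and 1.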

The list of subtrees is:

[Tree('S', ['NP', 'VP']), Tree('NP', ['Det', 'N']), Tree('Det', ['the']), Tree('N', ['man']), Tree('VP', ['V', 'NP', 'NP']), Tree('V', ['showed']), Tree('NP', ['PN']), Tree('PN', ['Mary']), Tree('NP', ['Det', 'N']), Tree('Det', ['the']), Tree('N', ['dog'])]

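I created the list of subtrees with the following code:

from nltk import Tree

trees = []
for item in parse:
    apple = Tree(item[0], grammar[item[0]][item[1]])
    trees.append(apple)

The output I get when I print the trees (I know this is not the right way to do it, but it at least shows the subtrees) is as follows:

(S NP VP)
(NP Det N)
(Det the)
(N man)
(VP V NP)
(V showed)
(NP NP NP)
(NP PN)
(PN Mary)
(NP Det N)
(Det the)
(N dog)

Thanks for your help!

:: EDIT ::

The correct output should look like this: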

1 answer:

Answer 0 (score: 0)

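You need to build the tree recursively, and you need to distinguish terminals from non-terminals while doing so. By the way, your parse sequence seems wrong (for instance, ('NP', 2) has no matching rule in the grammar as posted). Here is something I hacked together: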

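from nltk import Tree

def non_term(symbol):
    # Assumption: non_term was not defined in the original answer.
    # Here a symbol counts as a non-terminal if the grammar has rules for it.
    return symbol in grammar

def build_tree(parse):
    """Expand the first rule in parse; return (tree, unused rest of parse)."""
    assert parse, "parse history ended before the tree was complete"
    rule_head, rule_index = parse[0]
    rule_body = grammar[rule_head][rule_index]
    tree_body = []
    rest = parse[1:]
    for r in rule_body:
        if non_term(r):
            # A non-terminal is expanded by the next entries of the parse history.
            subtree, rest = build_tree(rest)
            tree_body.append(subtree)
        else:
            # A terminal becomes a leaf as-is.
            tree_body.append(r)

    return Tree(rule_head, tree_body), rest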
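To make the recursion easier to try out without NLTK installed, here is a dependency-free sketch of the same idea that produces a bracketed string instead of a Tree. It rests on two assumptions that are not part of the answer: the function name to_bracketed is made up for this example, and the parse history is the posted one with ('NP', 2) dropped, since that entry has no rule in the grammar as posted.

```python
grammar = {'S': [['NP', 'VP']],
           'NP': [['PN'], ['Det', 'N']],
           'VP': [['V'], ['V', 'NP', 'NP']],
           'PN': [['John'], ['Mary'], ['Bill']],
           'Det': [['the'], ['a']],
           'N': [['man'], ['woman'], ['drill sergeant'], ['dog']],
           'V': [['slept'], ['cried'], ['assaulted'],
                 ['devoured'], ['showed']]}

# Assumed correction: the posted parse history without ('NP', 2).
parse = [('S', 0), ('NP', 1), ('Det', 0), ('N', 0), ('VP', 1), ('V', 4),
         ('NP', 0), ('PN', 1), ('NP', 1), ('Det', 0), ('N', 3)]


def to_bracketed(parse):
    """Expand the first rule in parse; return (bracketed string, unused rest)."""
    head, index = parse[0]
    rest = parse[1:]
    parts = [head]
    for symbol in grammar[head][index]:
        if symbol in grammar:
            # Non-terminal: expand it with the next entries of the history.
            sub, rest = to_bracketed(rest)
            parts.append(sub)
        else:
            # Terminal: emit the word itself.
            parts.append(symbol)
    return "(" + " ".join(parts) + ")", rest


bracketed, leftover = to_bracketed(parse)
print(bracketed)
# → (S (NP (Det the) (N man)) (VP (V showed) (NP (PN Mary)) (NP (Det the) (N dog))))
print(leftover)  # → []
```

An empty leftover list means every entry of the parse history was consumed exactly once. If NLTK is available, a string like this can be read with Tree.fromstring, although multi-word terminals such as 'drill sergeant' would need extra handling there.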