How do I get the leaves of a tree produced by the Stanford parser in Python?

Date: 2014-04-05 06:44:38

Tags: python nltk stanford-nlp

I am using the Stanford parser from Python as follows:

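import os

sentence = "Did Matt win the men slalom?"
# Write the sentence to the parser's input file
tmp_path = os.path.expanduser("~/stanfordtemp.txt")
with open(tmp_path, "w") as f:
    f.write(sentence + "\n")

# Run the Stanford lexparser script and capture its output, one line per item
parser_out = os.popen("~/stanford-parser-2012-11-12/lexparser.sh " + tmp_path).readlines()

for tree in parser_out:
    print(tree)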

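However, I don't know how to access the leaves of the tree the parser returns. Can you help? I also have to write code that generates SQL queries from English sentences; any hints on that? Any help is much appreciated. I'm also using NLTK for everything.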
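The lines returned by readlines() are fragments of the printed tree, so the loop above prints the tree one line at a time instead of giving access to its leaves. A first step is to join those lines back into whole bracketed strings. The sketch below uses only the standard library; trees_from_lines is a hypothetical helper, and it assumes the tree is separated from any other output (lexparser.sh may also print typed dependencies) by blank lines. With NLTK 3, each resulting string can then be loaded with nltk.Tree.fromstring(tree_text), whose leaves() method returns the words.

```python
def trees_from_lines(lines):
    """Group parser output lines into one bracketed string per tree.

    Assumption: each tree spans consecutive non-blank lines, the first of
    which starts with "(", and blank lines separate it from other output
    such as typed dependencies.
    """
    blocks, current = [], []
    for line in lines:
        if line.strip():
            current.append(line.rstrip("\n"))
        elif current:            # a blank line ends the current block
            blocks.append(current)
            current = []
    if current:                  # the last block may not end with a blank line
        blocks.append(current)
    # Keep only the blocks that look like bracketed trees
    return ["\n".join(block) for block in blocks if block[0].lstrip().startswith("(")]


# Hypothetical sample shaped like readlines() output: one tree, then a dependency line
example_lines = [
    "(ROOT\n",
    "  (SQ (VBD Did)\n",
    "    (NP (NNP Matt))\n",
    "    (VP (VB win)\n",
    "      (NP (DT the) (NNS men) (NN slalom)))\n",
    "    (. ?)))\n",
    "\n",
    "aux(win-3, Did-1)\n",
]

for tree_text in trees_from_lines(example_lines):
    print(tree_text)
```

In the question's code this would be `for tree_text in trees_from_lines(parser_out): ...`.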

2 answers:

Answer 0 (score: 1)

Here is an example that builds a tree and then recursively builds a list of its leaves. The sample text comes from the online Stanford parser.

# class for tree nodes
class Node:
    def __init__(self,start):
        self.start = start
        self.children = []
        self.text = ''

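# make a tree: each "(" opens a child of the innermost open node,
# each ")" closes that node
def make_tree(s):
    stack = []   # nodes whose ")" has not been seen yet
    cur = None
    root = None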

    for i, c in enumerate(s):
        if c == '(':
            cur = Node(i)
            if stack:
                stack[-1].children.append(cur)
            stack.append(cur)

            if root is None:
                root = cur

        elif c == ')' and stack:
            topnode = stack.pop()

            # a node's text is everything between its brackets, e.g. "NNP Matt"
            text = s[topnode.start + 1: i]
            topnode.text = text

    return root

# list of leaves: nodes without children (here, the preterminals such as "(NNP Matt)")
def list_of_leaves(node):
    result = []
    for child in node.children:
        result.extend(list_of_leaves(child))
    if not result:
        return [node]

    return result

s = """(ROOT
  (SQ (VBD Did)
    (NP (NNP Matt))
    (VP (VB win)
      (NP (DT the) (NNS men) (NN slalom)))
    (. ?)))"""

root = make_tree(s)

for node in list_of_leaves(root):
    print(node.text)
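Because list_of_leaves returns the preterminal nodes, each node.text above is a tag and a word such as "VBD Did" or "NNP Matt". If only those (tag, word) pairs are needed, a shorter alternative is a regular expression over the bracketed output. This is a sketch, not part of the original answer: tagged_leaves and PRETERMINAL are hypothetical names, and it assumes each preterminal is printed as "(TAG word)" with one space and no parentheses inside the word (Stanford/Penn Treebank output normally writes brackets as -LRB- / -RRB-).

```python
import re

# Matches a preterminal such as "(NNP Matt)": a tag, one space, a word.
# Assumption: tags and words contain no whitespace or parentheses.
PRETERMINAL = re.compile(r"\(([^\s()]+) ([^\s()]+)\)")


def tagged_leaves(tree_text):
    """Return (tag, word) pairs for every leaf, left to right."""
    return PRETERMINAL.findall(tree_text)


s = """(ROOT
  (SQ (VBD Did)
    (NP (NNP Matt))
    (VP (VB win)
      (NP (DT the) (NNS men) (NN slalom)))
    (. ?)))"""

for tag, word in tagged_leaves(s):
    print(tag, word)
```

Non-leaf nodes such as "(NP (NNP Matt))" never match, because the pattern requires a word (not another "(") right after the tag.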

Answer 1 (score: 0)

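How would you extract the individual clauses of a sentence as subtrees? That is, whenever a clause starts (S, SBAR, SBARQ, etc.), extract it as a subtree until another clause is encountered; the last clause runs until the end of the sentence.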

Here is an example:

(ROOT
  (S
    (S
      (NP (NNP John))
      (VP (VBZ lives)
        (PP (IN in)
          (NP (NNP New) (NNP York) (NN City)))))
    (, ,)
    (CC but)
    (S
      (SBAR
        (WHADVP (WRB whenever))
        (S
          (NP (PRP he))
          (VP (VBZ travels)
            (S
              (VP (TO to)
                (VP (VB work)))))))
      (, ,)
      (NP (PRP he))
      (VP (VBZ travels)
        (ADVP (RB very) (RB far))
        (PP (TO to)
          (NP (PRP$ his) (NN work) (NN place)))))
    (. .)))
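The rule described above can be sketched as follows, under one literal interpretation: every node labelled S, SBAR or SBARQ starts a new segment, and a clause keeps only the words that are not inside a nested clause, so its text "runs until another clause is encountered" and resumes after that clause ends. This is not an established algorithm; parse_tree, split_clauses and CLAUSE_LABELS are hypothetical names, and the label set is an assumption (SINV or SQ could be added). With NLTK, tree.subtrees(filter=...) offers a related way to find clause nodes.

```python
import re

# Assumption: the labels the question lists as clause starts.
CLAUSE_LABELS = {"S", "SBAR", "SBARQ"}


def parse_tree(text):
    """Parse a Penn Treebank-style bracketed string into (label, children) tuples.

    Leaves (words) are plain strings.
    """
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def parse():
        nonlocal pos
        pos += 1                          # skip "("
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(parse())
            else:
                children.append(tokens[pos])
                pos += 1
        pos += 1                          # skip ")"
        return (label, children)

    return parse()


def split_clauses(tree, labels=CLAUSE_LABELS):
    """Return (label, words) for each clause node, in the order the clauses start.

    Words inside a nested clause belong only to that clause; words outside
    every clause (e.g. directly under ROOT) are dropped.
    """
    segments = []

    def walk(node, current):
        label, children = node
        if label in labels:
            current = (label, [])
            segments.append(current)
        for child in children:
            if isinstance(child, str):
                if current is not None:
                    current[1].append(child)
            else:
                walk(child, current)

    walk(tree, None)
    return segments


sample = """(ROOT (S (S (NP (NNP John)) (VP (VBZ lives) (PP (IN in)
  (NP (NNP New) (NNP York) (NN City))))) (, ,) (CC but)
  (S (SBAR (WHADVP (WRB whenever)) (S (NP (PRP he)) (VP (VBZ travels)
  (S (VP (TO to) (VP (VB work))))))) (, ,) (NP (PRP he)) (VP (VBZ travels)
  (ADVP (RB very) (RB far)) (PP (TO to) (NP (PRP$ his) (NN work) (NN place)))))
  (. .)))"""

for label, words in split_clauses(parse_tree(sample)):
    print(label, " ".join(words))
```

Taken literally, this rule separates SBAR from the S it introduces, so "whenever" ends up alone; merging an SBAR with its child S, or returning full overlapping subtrees instead, are equally plausible readings of the question.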