Dependency parsing tree in spaCy

Time: 2017-03-16 02:13:07

Tags: nlp spacy dependency-parsing

I have the sentence "John saw a flashy hat at the store".
How can I represent it as a dependency tree like the one shown below?

(S
      (NP (NNP John))
      (VP
        (VBD saw)
        (NP (DT a) (JJ flashy) (NN hat))
        (PP (IN at) (NP (DT the) (NN store)))))

I got this script from here:

import spacy
from nltk import Tree
en_nlp = spacy.load('en')

doc = en_nlp("John saw a flashy hat at the store")

def to_nltk_tree(node):
    if node.n_lefts + node.n_rights > 0:
        return Tree(node.orth_, [to_nltk_tree(child) for child in node.children])
    else:
        return node.orth_


[to_nltk_tree(sent.root).pretty_print() for sent in doc.sents]

I get the following output, but I am looking for a tree in the bracketed (NLTK) format.

     saw                 
  ____|_______________    
 |        |           at 
 |        |           |   
 |       hat        store
 |     ___|____       |   
John  a      flashy  the

2 answers:

Answer 0 (score: 5)

To recreate an NLTK-style tree for a spaCy dependency parse, try using the draw method from nltk.tree instead of pretty_print:

import spacy
from nltk.tree import Tree

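# Note: the "en" shortcut was removed in spaCy 3; there, load a named
# model such as "en_core_web_sm" instead.
spacy_nlp = spacy.load("en")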

def nltk_spacy_tree(sent):
    """
    Visualize the SpaCy dependency tree with nltk.tree
    """
    doc = spacy_nlp(sent)
    def token_format(token):
        return "_".join([token.orth_, token.tag_, token.dep_])

    def to_nltk_tree(node):
        if node.n_lefts + node.n_rights > 0:
            return Tree(token_format(node),
                       [to_nltk_tree(child) 
                        for child in node.children]
                   )
        else:
            return token_format(node)

    tree = [to_nltk_tree(sent.root) for sent in doc.sents]
    # The first item in the list is the full tree
    tree[0].draw()
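
If you want the tree as bracketed text rather than a drawing, the same recursion can build a string. Below is a minimal sketch that runs without spaCy or NLTK: the TOKENS list is a hand-written stand-in for a parsed sentence, with head indices that mirror the tree in the question, while the tags and dependency labels are my own assumptions rather than real spaCy output. The helper names children_of and to_bracketed are made up for this illustration.

```python
# Hand-written stand-in for a parsed sentence: (text, tag, dep, head index).
# Head indices mirror the tree printed in the question; the tags and
# dependency labels are assumptions, not real spaCy output.
# As in spaCy, the root token is its own head.
TOKENS = [
    ("John", "NNP", "nsubj", 1),
    ("saw", "VBD", "ROOT", 1),
    ("a", "DT", "det", 4),
    ("flashy", "JJ", "amod", 4),
    ("hat", "NN", "dobj", 1),
    ("at", "IN", "prep", 1),
    ("the", "DT", "det", 7),
    ("store", "NN", "pobj", 5),
]


def children_of(index):
    """Return indices of tokens whose head is `index`, in sentence order."""
    return [i for i, tok in enumerate(TOKENS) if tok[3] == index and i != index]


def to_bracketed(index):
    """Render the subtree rooted at `index` as a bracketed string."""
    text, tag, dep, _ = TOKENS[index]
    label = "_".join([text, tag, dep])
    kids = children_of(index)
    if not kids:
        return label
    return "(" + " ".join([label] + [to_bracketed(k) for k in kids]) + ")"


# The root is the token that is its own head.
root = next(i for i, tok in enumerate(TOKENS) if tok[3] == i)
bracketed = to_bracketed(root)
print(bracketed)
```

With a real parse, token.i and token.head.i carry the same information as the head indices above. If NLTK is installed, printing a Tree object with print(tree) also gives a bracketed rendering, so the hand-rolled string builder is only needed when you want to avoid that dependency.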

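Note that because spaCy currently only supports dependency parsing and tagging at the word and noun-phrase level, spaCy trees won't be as deeply structured as the ones you would get from, for instance, the Stanford parser, which you can also visualize as a tree: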

from nltk.tree import Tree
from nltk.parse.stanford import StanfordParser

# Note: Download Stanford jar dependencies first
# See https://stackoverflow.com/questions/13883277/stanford-parser-and-nltk
stanford_parser = StanfordParser(
    model_path="edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz"
)

def nltk_stanford_tree(sent):
    """
    Visualize the Stanford constituency (phrase-structure) parse with nltk.tree
    """
    parse = stanford_parser.raw_parse(sent)
    tree = list(parse)
    # The first item in the list is the full tree
    tree[0].draw()

Now, if we run both, nltk_spacy_tree("John saw a flashy hat at the store.") will produce this image, and nltk_stanford_tree("John saw a flashy hat at the store.") will produce this one.

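Answer 1 (score: 3)

Textual representation aside, what you are trying to achieve is to get a constituency tree out of a dependency graph. Your example of the desired output is a classic constituency tree (as in phrase structure grammar, as opposed to dependency grammar).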
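
To make the difference concrete, here is a small sketch that stores both structures from the question as plain Python data. The nested-tuple encoding and the helper names (is_leaf, words, phrase_labels) are my own choices for illustration, not part of spaCy or NLTK. It shows that the constituency tree contains phrase nodes (S, NP, VP, PP) in addition to the words, whereas every node of the dependency tree is a word of the sentence.

```python
# Constituency tree from the question as nested tuples: (label, child, ...).
# A (tag, word) pair whose second element is a string is a leaf.
CONSTITUENCY = (
    "S",
    ("NP", ("NNP", "John")),
    ("VP",
     ("VBD", "saw"),
     ("NP", ("DT", "a"), ("JJ", "flashy"), ("NN", "hat")),
     ("PP", ("IN", "at"), ("NP", ("DT", "the"), ("NN", "store")))),
)

# Dependency tree from the question's output as a dependent -> head mapping.
# Using words as keys only works because no word repeats in this sentence.
DEPENDENCY = {
    "John": "saw", "hat": "saw", "at": "saw",
    "a": "hat", "flashy": "hat", "store": "at", "the": "store",
}


def is_leaf(node):
    """A leaf is a (tag, word) pair."""
    return len(node) == 2 and isinstance(node[1], str)


def words(node):
    """Collect the words of the tree in sentence order."""
    if is_leaf(node):
        return [node[1]]
    return [w for child in node[1:] for w in words(child)]


def phrase_labels(node):
    """Collect the labels of all internal (phrase) nodes."""
    if is_leaf(node):
        return []
    return [node[0]] + [lab for child in node[1:] for lab in phrase_labels(child)]


sentence = words(CONSTITUENCY)
phrases = phrase_labels(CONSTITUENCY)
dependency_nodes = set(DEPENDENCY) | set(DEPENDENCY.values())

print(sentence)   # the words, in order
print(phrases)    # phrase nodes that exist only in the constituency tree
print(sorted(dependency_nodes) == sorted(sentence))  # dependency nodes are exactly the words
```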

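While converting constituency trees to dependency graphs is more or less an automated task (see, for example, http://www.mathcs.emory.edu/~choi/doc/clear-dependency-2012.pdf), the other direction is not. There has been work on this: check out the PAD project (https://github.com/ikekonglp/PAD) and the paper describing the underlying algorithm: http://homes.cs.washington.edu/~nasmith/papers/kong+rush+smith.naacl15.pdf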
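
As a rough illustration of why the constituency-to-dependency direction can be automated, here is a sketch of head-rule conversion applied to the tree from the question. The HEAD_RULES table is a toy I made up for this one sentence, not the rule set from the linked paper, and find_head is a hypothetical helper name. Each phrase picks one child as its head, and the heads of the other children become its dependents.

```python
# Toy head rules: for each phrase label, the child labels to try in priority
# order. Illustrative assumptions only, not the rules from the linked paper.
HEAD_RULES = {
    "S": ["VP"],
    "VP": ["VBD"],
    "NP": ["NN", "NNP"],
    "PP": ["IN"],
}

# Constituency tree from the question as nested tuples: (label, child, ...).
TREE = (
    "S",
    ("NP", ("NNP", "John")),
    ("VP",
     ("VBD", "saw"),
     ("NP", ("DT", "a"), ("JJ", "flashy"), ("NN", "hat")),
     ("PP", ("IN", "at"), ("NP", ("DT", "the"), ("NN", "store")))),
)


def is_leaf(node):
    """A leaf is a (tag, word) pair."""
    return len(node) == 2 and isinstance(node[1], str)


def find_head(node, arcs):
    """Return the head word of `node`, appending (dependent, head) arcs.

    Words double as identifiers, which only works because no word
    repeats in this sentence.
    """
    if is_leaf(node):
        return node[1]
    label, children = node[0], node[1:]
    child_heads = [find_head(child, arcs) for child in children]
    # Pick the head child via the rule table; fall back to the first child.
    head_pos = 0
    for wanted in HEAD_RULES.get(label, []):
        matches = [i for i, child in enumerate(children) if child[0] == wanted]
        if matches:
            head_pos = matches[0]
            break
    head_word = child_heads[head_pos]
    # Every non-head child attaches its own head word to the phrase's head.
    for i, word in enumerate(child_heads):
        if i != head_pos:
            arcs.append((word, head_word))
    return head_word


arcs = []
root = find_head(TREE, arcs)
print(root)
print(sorted(arcs))
```

For this sentence the resulting unlabeled arcs match the spaCy tree shown in the question. Note that the phrase labels are discarded along the way, which gives some intuition for why going back from a dependency graph to a constituency tree takes more than a lookup table.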

You may also want to reconsider whether you really need a constituency parse. Here is a good argument on that point: https://linguistics.stackexchange.com/questions/7280/why-is-constituency-needed-since-dependency-gets-the-job-done-more-easily-and-e