Question

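I am trying to extract information from natural language content using the Stanford CoreNLP library.

My goal is to extract "subject - action - object" pairs (simplified) from sentences.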

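As an example, consider the following sentences:

John Smith only eats an apple and a banana for lunch. He is on a diet, and his mother told him that eating less for lunch would be very healthy. John doesn't like it at all, but since he is very serious about his diet, he doesn't want to stop.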

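From these sentences I would like to get a result like the following:

John Smith - eats - only an apple and a banana for lunch
He - is - on a diet
His mother - told him - that eating less for lunch would be very healthy
John - doesn't like - it (at all)
He - is - very serious about his diet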

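How would one do this?

Or, to be more specific: how can I parse a dependency tree (or a better-suited tree?) to obtain results like those specified above?

Any hints, resources, or code snippets for this task would be highly appreciated.

Side note: I managed to replace coreferences with their representative mention, which changes "he" and "his" to the corresponding entity (John Smith in this case).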

Answer 1

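The Stanford CoreNLP toolkit comes with a dependency parser.

First off, here is a link that describes the types of edges in the tree:

http://universaldependencies.github.io/docs/

There are numerous ways to generate a dependency tree with the toolkit.

Here is some sample code to get you started: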

import java.io.*;
import java.util.*;
import edu.stanford.nlp.io.*;
import edu.stanford.nlp.ling.*;
import edu.stanford.nlp.pipeline.*;
import edu.stanford.nlp.util.*;
import edu.stanford.nlp.semgraph.*;
import edu.stanford.nlp.trees.TreeCoreAnnotations.*;

public class DependencyTreeExample {

    public static void main (String[] args) throws IOException {

        // set up properties
        Properties props = new Properties();
        props.setProperty("ssplit.eolonly","true");
        props.setProperty("annotators",
                "tokenize, ssplit, pos, depparse");
        // set up pipeline
        StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
        // get contents from file (the whole file is read as one string)
        String content = new Scanner(new File(args[0])).useDelimiter("\\Z").next();
        System.out.println(content);
        // annotate the text; because ssplit.eolonly is true,
        // each line of the file is treated as one sentence
        Annotation annotation = new Annotation(content);
        pipeline.annotate(annotation);

        List<CoreMap> sentences = annotation.get(CoreAnnotations.SentencesAnnotation.class);
        for (CoreMap sentence : sentences) {
            System.out.println("---");
            System.out.println("sentence: "+sentence);
            SemanticGraph tree = sentence.get(SemanticGraphCoreAnnotations.CollapsedCCProcessedDependenciesAnnotation.class);
            System.out.println(tree.toString(SemanticGraph.OutputFormat.READABLE));
        }


    }

}

Instructions:

Cut and paste this into DependencyTreeExample.java
Put that file in the directory stanford-corenlp-full-2015-04-20
javac -cp "*:." DependencyTreeExample.java
Add your sentences, one per line, to a file called dependency_sentences.txt
java -cp "*:." DependencyTreeExample dependency_sentences.txt

Example output:

sentence: John doesn't like it at all.
dep                 reln                gov                 
---                 ----                ---                 
like-4              root                root                
John-1              nsubj               like-4              
does-2              aux                 like-4              
n't-3               neg                 like-4              
it-5                dobj                like-4              
at-6                case                all-7               
all-7               nmod:at             like-4              
.-8                 punct               like-4

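This prints out the dependency parse. By working with the SemanticGraph object, you can write code to find the kinds of patterns you are looking for.

You will note in this example that "like" points to "John" with "nsubj", and "like" points to "it" with "dobj".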
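As a rough sketch of that kind of pattern matching, the snippet below pairs an "nsubj" edge with a "dobj" edge on the same governor and marks the verb as negated when a "neg" edge is present. It is plain Java and does not call CoreNLP at all: the TripleSketch class, its Edge type, and the sampleEdges() rows (copied by hand from the example output above) are made up for illustration. Against a real SemanticGraph you would presumably read the same three fields from each SemanticGraphEdge (getGovernor(), getDependent(), getRelation()) instead of hard-coding them.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of CoreNLP: builds "subject - verb - object"
// strings from dependency edges that share a governor.
class TripleSketch {

    // One dependency edge, named the way the READABLE output prints it:
    // dependent token, relation name, governor token.
    static final class Edge {
        final String dep, reln, gov;
        Edge(String dep, String reln, String gov) {
            this.dep = dep;
            this.reln = reln;
            this.gov = gov;
        }
    }

    // Strips the "-<index>" suffix, e.g. "like-4" becomes "like".
    static String word(String token) {
        int dash = token.lastIndexOf('-');
        return dash > 0 ? token.substring(0, dash) : token;
    }

    // Returns the first dependent attached to gov by relation reln, or null.
    static String child(List<Edge> edges, String gov, String reln) {
        for (Edge e : edges) {
            if (e.gov.equals(gov) && e.reln.equals(reln)) {
                return e.dep;
            }
        }
        return null;
    }

    // For every nsubj edge, looks for a dobj (and an optional neg) on the same verb.
    static List<String> extract(List<Edge> edges) {
        List<String> triples = new ArrayList<>();
        for (Edge e : edges) {
            if (!e.reln.equals("nsubj")) {
                continue;
            }
            String obj = child(edges, e.gov, "dobj");
            if (obj == null) {
                continue; // verbs without a direct object are skipped in this sketch
            }
            boolean negated = child(edges, e.gov, "neg") != null;
            String verb = (negated ? "not " : "") + word(e.gov);
            triples.add(word(e.dep) + " - " + verb + " - " + word(obj));
        }
        return triples;
    }

    // The rows of the example output for "John doesn't like it at all."
    static List<Edge> sampleEdges() {
        List<Edge> edges = new ArrayList<>();
        edges.add(new Edge("like-4", "root", "root"));
        edges.add(new Edge("John-1", "nsubj", "like-4"));
        edges.add(new Edge("does-2", "aux", "like-4"));
        edges.add(new Edge("n't-3", "neg", "like-4"));
        edges.add(new Edge("it-5", "dobj", "like-4"));
        edges.add(new Edge("at-6", "case", "all-7"));
        edges.add(new Edge("all-7", "nmod:at", "like-4"));
        edges.add(new Edge(".-8", "punct", "like-4"));
        return edges;
    }

    public static void main(String[] args) {
        System.out.println(extract(sampleEdges())); // [John - not like - it]
    }
}
```

This only covers the simplest subject/verb/object shape. Sentences such as "He is on a diet" have no "dobj" edge, so they would need their own patterns (for example, looking at "cop" and "nmod" relations), and multi-word phrases would need the dependents of the subject and object collected as well.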

For reference, you should take a look at edu.stanford.nlp.semgraph.SemanticGraph:

http://nlp.stanford.edu/nlp/javadoc/javanlp/edu/stanford/nlp/semgraph/SemanticGraph.html

Answer 2

You might also want to try out the new Stanford OpenIE system: http://nlp.stanford.edu/software/openie.shtml. In addition to the standalone download, it is now bundled with CoreNLP 3.6.0+.

Relation Extraction with Stanford CoreNLP
