Restricting Stanford CoreNLP's phrase-level label set

Posted: 2019-01-15 21:19:03

Tags: java nlp stanford-nlp pos-tagger

Piggybacking on the question I posted here, I would like to ask whether it is possible to exclude certain phrase-level labels when parsing. Specifically, I am working with the Stanford CoreNLP version 3.9.2 Shift-Reduce parser (for its constituency-style output), and I now have some experience adding ParserConstraint constraints to a ParserQuery. However, it is not immediately obvious whether a ParserConstraint can (effectively) be used to do what I want.

I happen to know that my input text consists of grammatical, complete sentences with finite matrix clauses. Therefore, any time the parser output contains a FRAG or UCP label, the parse is certainly inaccurate. I would like to be able to tell the parser in advance "don't use FRAG or UCP", in an attempt to improve output quality by restricting the solution set.

Is that possible? If so, how?
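
For concreteness, what I would ideally like to write is something like the sketch below: a variant of the constrainedTree() method from the full listing further down, where the constraint pattern is meant to say "any label except FRAG or UCP" over the whole sentence span. I am only guessing that the String argument of ParserConstraint is interpreted as a regular expression over constituent labels and that a negative pattern like this would be honored; whether that guess is right is essentially my question.

// Hypothetical sketch only -- intended as a drop-in alongside constrainedTree() below.
// The pattern is meant to say "any constituent label except FRAG or UCP" over the
// full sentence span; I have NOT verified that the shift-reduce parser treats the
// String as a regex or respects a negative lookahead like this.
public static Tree labelRestrictedTree(List<TaggedWord> taggedSentence) {
    int sentenceLength = taggedSentence.size();

    ParserConstraint constraint = new ParserConstraint(0, sentenceLength, "(?!FRAG|UCP).*");
    List<ParserConstraint> constraints = Collections.singletonList(constraint);
    ParserQuery pq = srParser.parserQuery();
    pq.setConstraints(constraints);
    pq.parse(taggedSentence);

    return pq.getBestParse();
}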

Summary:

import java.io.*;
import java.util.*;
import java.text.SimpleDateFormat;

import edu.stanford.nlp.io.*;
import edu.stanford.nlp.trees.*;
import edu.stanford.nlp.ling.HasWord;
import edu.stanford.nlp.ling.TaggedWord;
import edu.stanford.nlp.process.DocumentPreprocessor;
import edu.stanford.nlp.tagger.maxent.MaxentTagger;
import edu.stanford.nlp.parser.shiftreduce.ShiftReduceParser;
import edu.stanford.nlp.parser.common.ParserQuery;
import edu.stanford.nlp.parser.common.ParserConstraint;

public class constraintTest {

    // Initialize POS tagger and parser.
    private static MaxentTagger meTagger = new MaxentTagger("edu/stanford/nlp/models/pos-tagger/english-left3words/english-left3words-distsim.tagger");
    private static ShiftReduceParser srParser = ShiftReduceParser.loadModel("edu/stanford/nlp/models/srparser/englishSR.ser.gz");

    public static void main(String[] args) throws IOException {

        String text = "";
        // If user passes in the name of an input file, use that.
        if (args.length > 0) {
            text = IOUtils.slurpFileNoExceptions(args[0]);
            System.out.println(text);

        // If user does not pass in a file, ask for some sentences.
        } else {
            System.out.println("Please enter a sentence for parsing:");
            Scanner input = new Scanner(System.in);
            text = input.nextLine();  
        }

        // Create output filename and file.
        String fileName = new SimpleDateFormat("'output/trees'yyyyMMdd'_'HHmmss'.txt'").format(new Date());
        PrintWriter writer = new PrintWriter(fileName, "UTF-8");

        // Prepare document for reading.
        DocumentPreprocessor tokenizedText = new DocumentPreprocessor(new StringReader(text));
        int i = 1;
        for (List<HasWord> sentence : tokenizedText) {
            List<TaggedWord> taggedSentence = meTagger.tagSentence(sentence);

            // To parse sentences WITHOUT parser constraints:
            // Tree tree = srParser.apply(taggedSentence);

            // To parse sentences WITH parser constraints:
            Tree tree = constrainedTree(taggedSentence);

            // Print to file.
            writer.println(tree);

            // Print to standard out while you're at it.
            System.out.println("/-/-/-/ Sentence #" + i + " /-/-/-/");
            tree.pennPrint();
            System.out.println();
            i += 1;

        }
        writer.close();
    }

    // Takes a list of TaggedWord objects and outputs a parse tree with
    // a constraint that the topmost label (below ROOT) be S.
    public static Tree constrainedTree(List<TaggedWord> taggedSentence) {
        int sentenceLength = taggedSentence.size();

        ParserConstraint constraint = new ParserConstraint(0, sentenceLength, "S");
        List<ParserConstraint> constraints = Collections.singletonList(constraint);
        ParserQuery pq = srParser.parserQuery();
        pq.setConstraints(constraints);
        pq.parse(taggedSentence);
        Tree tree = pq.getBestParse();

        return tree;

    }

}

As always, thanks!

0 Answers:

No answers yet.