How to save a CoreDocument from Stanford NLP to disk

Date: 2018-12-21 20:34:26

Tags: java save stanford-nlp

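After creating an annotated CoreDocument, I want to save it to disk so that I can retrieve it later.

Computing the annotated CoreDocument is slow, so once it has been created I would like to reuse it later by loading it from disk.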


2 answers:

Answer 0 (score: 0)

You should look at the AnnotationSerializer class:

https://nlp.stanford.edu/nlp/javadoc/javanlp/edu/stanford/nlp/pipeline/AnnotationSerializer.html

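Specifically, although this class has several implementations, we mainly use ProtobufAnnotationSerializer.

You can find usage examples in some of the integration tests. ProtobufSerializationSanityITest is a simple example of how to use it; ProtobufAnnotationSerializerSlowITest is a more thorough but more complex one. Both are in the GitHub repository.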
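The workflow in the question (pay for annotation once, then reuse the result) fits a small load-or-compute wrapper around whichever serializer is chosen. The sketch below is plain JDK Java and makes no CoreNLP calls: `DiskCache`, `loadOrCompute`, `Saver` and `Loader` are hypothetical names, and a `String` stands in for the annotated document. With CoreNLP, the saver would presumably wrap the serializer's write method and the loader its `read` method.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.function.Supplier;

public class DiskCache {

    // Hypothetical callbacks; with CoreNLP these would wrap the
    // AnnotationSerializer's write and read methods.
    @FunctionalInterface
    public interface Saver<T> { void save(T value, Path file) throws IOException; }

    @FunctionalInterface
    public interface Loader<T> { T load(Path file) throws IOException; }

    // Returns the saved value if `file` already exists; otherwise runs the
    // slow computation once, saves its result to `file`, and returns it.
    public static <T> T loadOrCompute(Path file, Supplier<T> slow,
                                      Saver<T> saver, Loader<T> loader) throws IOException {
        if (Files.exists(file)) {
            return loader.load(file);
        }
        T value = slow.get();
        saver.save(value, file);
        return value;
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("annotated", ".txt");
        Files.delete(file); // start without a saved copy
        int[] calls = {0};  // counts how often the slow step actually runs
        Supplier<String> slow = () -> { calls[0]++; return "expensive result"; };
        Saver<String> saver = (v, f) -> Files.write(f, v.getBytes(StandardCharsets.UTF_8));
        Loader<String> loader = f -> new String(Files.readAllBytes(f), StandardCharsets.UTF_8);

        String first = loadOrCompute(file, slow, saver, loader);  // computes and saves
        String second = loadOrCompute(file, slow, saver, loader); // loads from disk
        System.out.println(first.equals(second) + " " + calls[0]); // prints "true 1"
        Files.deleteIfExists(file);
    }
}
```

This design only checks that the file exists, so a file left half-written by a crash would be loaded as if it were valid; writing to a temporary file and renaming it once complete is a common way around that.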

Answer 1 (score: 0)

Thanks for the help; I'm new to Stanford NLP. The AnnotationSerializer class got me as far as saving the document to disk. My remaining misunderstanding was about interpreting the result: I didn't realize that the result (pair.first) contained the full annotated document. The pertinent code is:

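import java.io.*;
import java.util.List;
import edu.stanford.nlp.ling.CoreAnnotations;
import edu.stanford.nlp.ling.CoreLabel;
import edu.stanford.nlp.pipeline.*;
import edu.stanford.nlp.util.CoreMap;
import edu.stanford.nlp.util.Pair;

// Serializes an annotated CoreDocument to `filename` in protobuf format.
// `logger` is assumed to be a logger field of the enclosing class.
public void writeDoc(CoreDocument document, String filename) {
    AnnotationSerializer serializer = new ProtobufAnnotationSerializer();
    // try-with-resources closes the file even if serialization throws
    try (OutputStream os = new BufferedOutputStream(new FileOutputStream(filename))) {
        serializer.writeCoreDocument(document, os);
        os.flush();
    } catch (IOException ioex) {
        logger.error("IOException " + ioex);
    }
}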

// Reads a document written by writeDoc and prints some of its annotations.
public void readSavedDoc(String filename) {
    AnnotationSerializer serializer = new ProtobufAnnotationSerializer();
    // Read straight from the file; the stream is closed automatically
    try (InputStream is = new BufferedInputStream(new FileInputStream(filename))) {
        // read() returns the Annotation plus the stream it was read from
        Pair<Annotation, InputStream> pair = serializer.read(is);
        Annotation readAnnotation = pair.first;
        // Wrap it in a CoreDocument again to get the CoreDocument API back
        CoreDocument doc = new CoreDocument(readAnnotation);

        for (CoreLabel token : doc.tokens()) {
            // POS and NER tags are stored on each token, not on the document,
            // so readAnnotation.get(NamedEntityTagAnnotation.class) returns null
            System.out.println(token.word() + "\t" + token.tag() + "\t" + token.ner());
        }
        List<CoreMap> sentences = readAnnotation.get(CoreAnnotations.SentencesAnnotation.class);
        logger.info("Sentences " + sentences);
        for (CoreMap sentence : sentences) {
            System.out.println(sentence);
        }
    } catch (IOException | ClassNotFoundException | ClassCastException e) {
        logger.error("Could not read " + filename + ": " + e);
        e.printStackTrace();
    }
}
Hope this helps someone else.  Don
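In the reader above, `serializer.read(...)` returns a `Pair<Annotation, InputStream>` rather than just the annotation. As far as I understand the `AnnotationSerializer` contract, the stream is handed back so that several documents written one after another into the same file can be read back in sequence; `ProtobufAnnotationSerializer` appears to do this by writing each document in protobuf's length-delimited form (a varint length prefix followed by the message bytes). The sketch below reproduces only that framing idea with the JDK, using raw byte arrays in place of protobuf messages; `DelimitedRecords`, `writeDelimited` and `readDelimited` are hypothetical helpers, not CoreNLP methods.

```java
import java.io.*;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class DelimitedRecords {

    // Writes `payload` prefixed by its length as a base-128 varint
    // (low 7 bits first, high bit set on every byte except the last).
    public static void writeDelimited(OutputStream os, byte[] payload) throws IOException {
        int n = payload.length;
        while ((n & ~0x7F) != 0) {
            os.write((n & 0x7F) | 0x80);
            n >>>= 7;
        }
        os.write(n);
        os.write(payload);
    }

    // Reads one length-prefixed record, or returns null at a clean end of stream.
    // The stream is left positioned at the start of the next record.
    public static byte[] readDelimited(InputStream is) throws IOException {
        int b = is.read();
        if (b == -1) {
            return null;
        }
        int length = 0;
        int shift = 0;
        while (true) {
            length |= (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                break;
            }
            shift += 7;
            b = is.read();
            if (b == -1) {
                throw new EOFException("truncated length prefix");
            }
        }
        byte[] payload = new byte[length];
        new DataInputStream(is).readFully(payload); // blocks until all bytes arrive
        return payload;
    }

    public static void main(String[] args) throws IOException {
        // Append three "documents" to one in-memory "file"
        ByteArrayOutputStream file = new ByteArrayOutputStream();
        for (String doc : new String[] {"first document", "second", "third one"}) {
            writeDelimited(file, doc.getBytes(StandardCharsets.UTF_8));
        }
        // Read them back one at a time from the same stream
        InputStream in = new ByteArrayInputStream(file.toByteArray());
        List<String> docs = new ArrayList<>();
        byte[] record;
        while ((record = readDelimited(in)) != null) {
            docs.add(new String(record, StandardCharsets.UTF_8));
        }
        System.out.println(docs); // prints [first document, second, third one]
    }
}
```

Because each record carries its own length, a reader never consumes bytes belonging to the next document, which is what makes returning and reusing the stream safe.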