How to save a CoreDocument from Stanford NLP to disk 2

Time: 2018-12-28 21:46:51

Tags: stanford-nlp

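Following Professor Manning's suggestion, I used ProtobufAnnotationSerializer, but I am doing something wrong.

I called serializer.writeCoreDocument on a document that works correctly. Later I read the written file back with pair = serializer.read, then took the second element with InputStream p2 = pair.second;. When Pair pair3 = serializer.read(p2); runs, p2 is empty, which results in a null pointer exception.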

public void writeDoc(CoreDocument document, String filename) {
    AnnotationSerializer serializer = new ProtobufAnnotationSerializer();
    try {
        OutputStream ks = new FileOutputStream(filename);
        ks = serializer.writeCoreDocument(document, ks);
        ks.flush();
        ks.close();
    } catch (IOException ioex) {
        logger.error("IOException " + ioex);
    }
}

public void ReadSavedDoc(String filename) {
    // Read
    byte[] kb = null;
    try {
        File initialFile = new File(filename);
        InputStream ks = new FileInputStream(initialFile);
        ProtobufAnnotationSerializer serializer = new ProtobufAnnotationSerializer();
        InputStream kis = new ByteArrayInputStream(ks.readAllBytes());
        ks.close();
        Pair<Annotation, InputStream> pair = serializer.read(kis);
        InputStream p2 = pair.second;
        int nump2 = p2.available();
        logger.info(nump2);
        byte[] ba = p2.readAllBytes();
        Annotation readAnnotation = pair.first;
        // This is the call that fails with the null pointer exception
        Pair<Annotation, InputStream> pair3 = serializer.read(p2);
        kis.close();
    } catch (IOException e) {
        e.printStackTrace();
    } catch (ClassNotFoundException e) {
        e.printStackTrace();
    } catch (ClassCastException e) {
        e.printStackTrace();
    } catch (Exception ex) {
        logger.error("Exception: " + ex);
        ex.printStackTrace();
    }
}

1 Answer:

Answer 0 (score: 0)

This line is unnecessary and should be removed:

Pair<Annotation, InputStream> pair3 = serializer.read(p2);

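Once readAnnotation has been set correctly, the read/write round trip is complete. p2 is empty because you have already read everything from it.

Here is a clear example of how to use the serialization: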
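To illustrate why the second read finds nothing, here is a minimal sketch using only the Java standard library. It does not call CoreNLP at all: a plain ByteArrayInputStream stands in for the stream returned as pair.second, and the class name StreamExhaustionDemo and the helper readTwice are hypothetical names made up for this illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

public class StreamExhaustionDemo {

    // Reads the same stream twice and returns how many bytes each pass got.
    // The ByteArrayInputStream is only a stand-in for pair.second.
    static int[] readTwice(byte[] data) {
        try (InputStream in = new ByteArrayInputStream(data)) {
            int firstPass = in.readAllBytes().length;   // consumes every byte
            int secondPass = in.readAllBytes().length;  // stream is already at its end
            return new int[] { firstPass, secondPass };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int[] counts = readTwice(new byte[] { 10, 20, 30 });
        System.out.println("first read:  " + counts[0] + " bytes");
        System.out.println("second read: " + counts[1] + " bytes");
    }
}
```

The first pass returns all three bytes and the second returns none. In the question's code the stream had already been consumed by the first serializer.read call, and again by p2.readAllBytes(), so nothing was left for a further read.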

https://github.com/stanfordnlp/CoreNLP/blob/master/itest/src/edu/stanford/nlp/pipeline/ProtobufSerializationSanityITest.java

You also need to build a CoreDocument from the Annotation:

CoreDocument readDocument = new CoreDocument(readAnnotation);