Google Dataflow: java.lang.IllegalArgumentException: Cannot setCoder(null)

Asked: 2016-11-02 15:57:07

Tags: google-cloud-dataflow

I am trying to build a custom sink for unzipping files.

I have this simple code:

public static class ZipIO{    
  public static class Sink extends com.google.cloud.dataflow.sdk.io.Sink<String> {

    private static final long serialVersionUID = -7414200726778377175L;
    private final String unzipTarget;

      public Sink withDestinationPath(String s){
         if(s != null && !s.isEmpty()){
             return new Sink(s);
         }
         else {
             throw new IllegalArgumentException("must assign destination path");
         }

      }

      protected Sink(String path){
          this.unzipTarget = path;
      }

      @Override
      public void validate(PipelineOptions po){
          if(unzipTarget==null){
              throw new RuntimeException();
          }
      } 

      @Override
      public ZipFileWriteOperation createWriteOperation(PipelineOptions po){
          return new ZipFileWriteOperation(this);
      }

  }

  private static class ZipFileWriteOperation extends WriteOperation<String, UnzipResult>{

    private static final long serialVersionUID = 7976541367499831605L;
    private final ZipIO.Sink sink;

      public ZipFileWriteOperation(ZipIO.Sink sink){
          this.sink = sink;
      }



      @Override
      public void initialize(PipelineOptions po) throws Exception{

      }

      @Override
      public void finalize(Iterable<UnzipResult> writerResults, PipelineOptions po) throws Exception {
         long totalFiles = 0;
         for(UnzipResult r:writerResults){
             totalFiles +=r.filesUnziped;
         }
         LOG.info("Unzipped {} Files",totalFiles);
      }  

      @Override
      public ZipIO.Sink getSink(){
          return sink;
      }

      @Override
      public ZipWriter createWriter(PipelineOptions po) throws Exception{
          return new ZipWriter(this);
      }

  }

  private static class ZipWriter extends Writer<String, UnzipResult>{
      private final ZipFileWriteOperation writeOp;
      public long totalUnzipped = 0;

      ZipWriter(ZipFileWriteOperation writeOp){
          this.writeOp = writeOp;
      }

      @Override
      public void open(String uID) throws Exception{
      }

      @Override
      public void write(String p){
            System.out.println(p);
      }

      @Override
      public UnzipResult close() throws Exception{
          return new UnzipResult(this.totalUnzipped);
      }

      @Override
      public ZipFileWriteOperation getWriteOperation(){
          return writeOp;
      }


  }

  private static class UnzipResult implements Serializable{  
    private static final long serialVersionUID = -8504626439217544799L;
    public long filesUnziped=0;      
      public UnzipResult(long filesUnziped){
          this.filesUnziped=filesUnziped;
      }
  }
}


Processing fails with this error:

  

Exception in thread "main" java.lang.IllegalArgumentException: Cannot setCoder(null)
    at com.google.cloud.dataflow.sdk.values.TypedPValue.setCoder(TypedPValue.java:67)
    at com.google.cloud.dataflow.sdk.values.PCollection.setCoder(PCollection.java:150)
    at com.google.cloud.dataflow.sdk.io.Write$Bound.createWrite(Write.java:380)
    at com.google.cloud.dataflow.sdk.io.Write$Bound.apply(Write.java:112)
    at com.google.cloud.dataflow.sdk.runners.DataflowPipelineRunner$BatchWrite.apply(DataflowPipelineRunner.java:2118)
    at com.google.cloud.dataflow.sdk.runners.DataflowPipelineRunner$BatchWrite.apply(DataflowPipelineRunner.java:2099)
    at com.google.cloud.dataflow.sdk.runners.PipelineRunner.apply(PipelineRunner.java:75)
    at com.google.cloud.dataflow.sdk.runners.DataflowPipelineRunner.apply(DataflowPipelineRunner.java:465)
    at com.google.cloud.dataflow.sdk.runners.BlockingDataflowPipelineRunner.apply(BlockingDataflowPipelineRunner.java:169)
    at com.google.cloud.dataflow.sdk.Pipeline.applyInternal(Pipeline.java:368)
    at com.google.cloud.dataflow.sdk.Pipeline.applyTransform(Pipeline.java:275)
    at com.google.cloud.dataflow.sdk.runners.DataflowPipelineRunner.apply(DataflowPipelineRunner.java:463)
    at com.google.cloud.dataflow.sdk.runners.BlockingDataflowPipelineRunner.apply(BlockingDataflowPipelineRunner.java:169)
    at com.google.cloud.dataflow.sdk.Pipeline.applyInternal(Pipeline.java:368)
    at com.google.cloud.dataflow.sdk.Pipeline.applyTransform(Pipeline.java:291)
    at com.google.cloud.dataflow.sdk.values.PCollection.apply(PCollection.java:174)
    at com.mcd.de.tlogdataflow.StarterPipeline.main(StarterPipeline.java:93)

Any help is appreciated.

Thanks & BR, Philip

1 answer:

Answer 0: (score: 0)

This crash is caused by a bug in the Dataflow Java SDK (specifically, this line) that is also present in the Apache Beam (incubating) Java SDK.

The method Sink.WriteOperation#getWriterResultCoder() must always be overridden, but we failed to mark it abstract. It is fixed in Beam, but unchanged in the Dataflow SDK. You should override this method and return an appropriate coder.
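As a minimal sketch of the shape of that override: the stand-in `Coder` and `SerializableCoder` types below exist only so the snippet compiles without the retired Dataflow 1.x SDK; in real code they come from `com.google.cloud.dataflow.sdk.coders`, and the override goes on `ZipFileWriteOperation` from the question.

```java
import java.io.Serializable;

// Stand-in types so this sketch is self-contained; in real code, Coder and
// SerializableCoder come from com.google.cloud.dataflow.sdk.coders.
interface Coder<T> extends Serializable {}

class SerializableCoder<T extends Serializable> implements Coder<T> {
    static <T extends Serializable> SerializableCoder<T> of(Class<T> clazz) {
        return new SerializableCoder<>();
    }
}

// Mirrors the UnzipResult struct from the question.
class UnzipResult implements Serializable {
    long filesUnziped;
}

// The shape of the missing override on ZipFileWriteOperation: returning a
// non-null coder here is what prevents the setCoder(null) crash.
class ZipFileWriteOperationSketch {
    public Coder<UnzipResult> getWriterResultCoder() {
        return SerializableCoder.of(UnzipResult.class);
    }
}
```

`SerializableCoder` is the least code but also the bulkiest encoding; a coder that writes only the long count is leaner, as discussed next.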

You have a few options for coming up with a coder:

  1. Write your own small coder class, wrapping one of VarLongCoder or BigEndianLongCoder.
  2. Just use a long instead of the UnzipResult struct, which works as-is.
  3. Less advisable because of its larger encoded size, you could use SerializableCoder.of(UnzipResult.class).
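To illustrate option 1 at the byte level: since the result only carries a file count, a tiny coder wrapping BigEndianLongCoder needs just 8 big-endian bytes per value. The sketch below uses only the plain JDK (no Dataflow dependency); the class name `UnzipResultCodec` is hypothetical, and the real coder would additionally extend the SDK's coder base class.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Byte-level sketch of the encode/decode work a small custom coder would
// delegate: one big-endian long (8 bytes) per UnzipResult.
class UnzipResultCodec {
    static byte[] encode(long filesUnzipped) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            new DataOutputStream(bos).writeLong(filesUnzipped); // big-endian per DataOutput
            return bos.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static long decode(byte[] bytes) {
        try {
            return new DataInputStream(new ByteArrayInputStream(bytes)).readLong();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Compared with Java serialization (option 3), which also writes the class descriptor, this fixed 8-byte encoding is far smaller, which matters when many writer results are shuffled during finalize.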