Error: cannot encode a null Long

Date: 2017-05-27 13:53:52

Tags: google-cloud-dataflow

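I am trying to modify the bigquery-dataflow sample to process CSV files: https://github.com/GoogleCloudPlatform/bigquery-etl-dataflow-sample

I changed readObject to parse the CSV and add the fields to the datum object. When I run the pipeline, I get the following error: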

  

(2b01c6a9d56ae128): java.lang.RuntimeException: com.google.cloud.dataflow.sdk.coders.CoderException: cannot encode a null Long
    at com.google.cloud.dataflow.sdk.runners.worker.SimpleParDoFn$1.output(SimpleParDoFn.java:160)
    at com.google.cloud.dataflow.sdk.util.DoFnRunnerBase$DoFnContext.outputWindowedValue(DoFnRunnerBase.java:288)
    at com.google.cloud.dataflow.sdk.util.DoFnRunnerBase$DoFnProcessContext.output(DoFnRunnerBase.java:450)
    at com.google.cloud.dataflow.sdk.transforms.MapElements$1.processElement(MapElements.java:109)
Caused by: com.google.cloud.dataflow.sdk.coders.CoderException: cannot encode a null Long
    at com.google.cloud.dataflow.sdk.coders.VarLongCoder.getEncodedElementByteSize(VarLongCoder.java:92)
    at com.google.cloud.dataflow.sdk.coders.VarLongCoder.getEncodedElementByteSize(VarLongCoder.java:34)
    at com.google.cloud.dataflow.sdk.coders.StandardCoder.registerByteSizeObserver(StandardCoder.java:185)
    at com.google.cloud.dataflow.sdk.coders.KvCoder.registerByteSizeObserver(KvCoder.java:156)
    at com.google.cloud.dataflow.sdk.coders.KvCoder.registerByteSizeObserver(KvCoder.java:42)
    at com.google.cloud.dataflow.sdk.util.WindowedValue$FullWindowedValueCoder.registerByteSizeObserver(WindowedValue.java:641)
    at com.google.cloud.dataflow.sdk.util.WindowedValue$FullWindowedValueCoder.registerByteSizeObserver(WindowedValue.java:552)
    at com.google.cloud.dataflow.sdk.runners.worker.MapTaskExecutorFactory$ElementByteSizeObservableCoder.registerByteSizeObserver(MapTaskExecutorFactory.java:351)
    at com.google.cloud.dataflow.sdk.util.common.worker.OutputObjectAndByteCounter.update(OutputObjectAndByteCounter.java:125)
    at com.google.cloud.dataflow.sdk.runners.worker.DataflowOutputCounter.update(DataflowOutputCounter.java:61)
    at com.google.cloud.dataflow.sdk.util.common.worker.OutputReceiver.process(OutputReceiver.java:46)
    at com.google.cloud.dataflow.sdk.runners.worker.SimpleParDoFn$1.output(SimpleParDoFn.java:158)
    at com.google.cloud.dataflow.sdk.util.DoFnRunnerBase$DoFnContext.outputWindowedValue(DoFnRunnerBase.java:288)
    at com.google.cloud.dataflow.sdk.util.DoFnRunnerBase$DoFnProcessContext.output(DoFnRunnerBase.java:450)
    at com.google.cloud.dataflow.sdk.transforms.MapElements$1.processElement(MapElements.java:109)
    at com.google.cloud.dataflow.sdk.util.SimpleDoFnRunner.invokeProcessElement(SimpleDoFnRunner.java:49)
    at com.google.cloud.dataflow.sdk.util.DoFnRunnerBase.processElement(DoFnRunnerBase.java:139)
    at com.google.cloud.dataflow.sdk.runners.worker.SimpleParDoFn.processElement(SimpleParDoFn.java:188)
    at com.google.cloud.dataflow.sdk.runners.worker.ForwardingParDoFn.processElement(ForwardingParDoFn.java:42)
    at com.google.cloud.dataflow.sdk.runners.worker.DataflowWorkerLoggingParDoFn.processElement(DataflowWorkerLoggingParDoFn.java:47)
    at com.google.cloud.dataflow.sdk.util.common.worker.ParDoOperation.process(ParDoOperation.java:55)
    at com.google.cloud.dataflow.sdk.util.common.worker.OutputReceiver.process(OutputReceiver.java:52)
    at com.google.cloud.dataflow.sdk.util.common.worker.ReadOperation.runReadLoop(ReadOperation.java:221)
    at com.google.cloud.dataflow.sdk.util.common.worker.ReadOperation.start(ReadOperation.java:182)
    at com.google.cloud.dataflow.sdk.util.common.worker.MapTaskExecutor.execute(MapTaskExecutor.java:69)
    at com.google.cloud.dataflow.sdk.runners.worker.DataflowWorker.executeWork(DataflowWorker.java:285)
    at com.google.cloud.dataflow.sdk.runners.worker.DataflowWorker.doWork(DataflowWorker.java:221)
    at com.google.cloud.dataflow.sdk.runners.worker.DataflowWorker.getAndPerformWork(DataflowWorker.java:171)
    at com.google.cloud.dataflow.sdk.runners.worker.DataflowWorkerHarness$WorkerThread.doWork(DataflowWorkerHarness.java:192)
    at com.google.cloud.dataflow.sdk.runners.worker.DataflowWorkerHarness$WorkerThread.call(DataflowWorkerHarness.java:172)
    at com.google.cloud.dataflow.sdk.runners.worker.DataflowWorkerHarness$WorkerThread.call(DataflowWorkerHarness.java:159)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

1 Answer:

Answer 0 (score: 0)

This is the same error I ran into in this question (mine).

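You may want to double-check whether you are using a KV and accidentally assigning a null value to K. (This can easily happen if a CSV file has an empty value in the field you want to use as K.)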
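
The stack trace in the question goes through KvCoder into VarLongCoder, so the null is a Long component of the KV. Below is a minimal sketch of the kind of guard that avoids it, in plain Java with no Dataflow dependency. It assumes the key comes from one CSV column and that rows without a usable key can be skipped; `CsvKeyGuard`, `parseKey`, and `keepLinesWithKey` are hypothetical names for illustration, not part of the sample project or the SDK, and the comma split is deliberately naive (no quoted fields).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

// Hypothetical helper: validates the key column before a KV would be built.
public class CsvKeyGuard {

    /** Returns the key as a Long, or empty if the field is missing, blank, or not numeric. */
    public static Optional<Long> parseKey(String csvLine, int keyColumn) {
        // The -1 limit keeps trailing empty fields, so "a,b," yields three columns.
        // Naive split: does not handle quoted fields containing commas.
        String[] fields = csvLine.split(",", -1);
        if (keyColumn < 0 || keyColumn >= fields.length) {
            return Optional.empty();
        }
        String raw = fields[keyColumn].trim();
        if (raw.isEmpty()) {
            return Optional.empty();
        }
        try {
            return Optional.of(Long.parseLong(raw));
        } catch (NumberFormatException e) {
            return Optional.empty();
        }
    }

    /** Keeps only the lines whose key column parses to a Long. */
    public static List<String> keepLinesWithKey(List<String> lines, int keyColumn) {
        List<String> kept = new ArrayList<>();
        for (String line : lines) {
            if (parseKey(line, keyColumn).isPresent()) {
                kept.add(line);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // The second row has an empty key field, which is the case that would
        // otherwise end up as a null Long in the KV.
        List<String> lines = Arrays.asList("42,alice", ",bob", "7,carol");
        for (String line : keepLinesWithKey(lines, 0)) {
            System.out.println(line);
        }
    }
}
```

In the pipeline itself, the same check would go in the step that builds the KV: only emit the element when the key is present, or substitute a default key if dropping rows is not acceptable.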