java.io.IOException: INVALID_ARGUMENT: Unable to parse key at com.google.cloud.dataflow.sdk.runners.worker.ApplianceShuffleWriter.write

Date: 2016-06-16 09:11:49

Tags: google-cloud-dataflow

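I get the following exception when running a job that reads from GS (Google Cloud Storage) and then groups the data by key. The exception occurs during the read.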

  

java.io.IOException: INVALID_ARGUMENT: Unable to parse key
    at com.google.cloud.dataflow.sdk.runners.worker.ApplianceShuffleWriter.write(Native Method)
    at com.google.cloud.dataflow.sdk.runners.worker.ShuffleSink$ShuffleSinkWriter.outputChunk(ShuffleSink.java:293)
    at com.google.cloud.dataflow.sdk.runners.worker.ShuffleSink$ShuffleSinkWriter.close(ShuffleSink.java:288)
    at com.google.cloud.dataflow.sdk.util.common.worker.WriteOperation.finish(WriteOperation.java:100)
    at com.google.cloud.dataflow.sdk.util.common.worker.MapTaskExecutor.execute(MapTaskExecutor.java:79)
    at com.google.cloud.dataflow.sdk.runners.worker.DataflowWorker.executeWork(DataflowWorker.java:288)
    at com.google.cloud.dataflow.sdk.runners.worker.DataflowWorker.doWork(DataflowWorker.java:221)
    at com.google.cloud.dataflow.sdk.runners.worker.DataflowWorker.getAndPerformWork(DataflowWorker.java:173)
    at com.google.cloud.dataflow.sdk.runners.worker.DataflowWorkerHarness$WorkerThread.doWork(DataflowWorkerHarness.java:193)
    at com.google.cloud.dataflow.sdk.runners.worker.DataflowWorkerHarness$WorkerThread.call(DataflowWorkerHarness.java:173)
    at com.google.cloud.dataflow.sdk.runners.worker.DataflowWorkerHarness$WorkerThread.call(DataflowWorkerHarness.java:160)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

Any ideas?

1 Answer:

Answer 0 (score: 0)

This exception is thrown when you apply a GroupByKey and some of the mapped elements have a null key.
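One way to find the offending records is to check each key just before it is emitted, so the job fails at the element that produced the null rather than deep inside the shuffle writer. The sketch below is plain Java with no Dataflow dependency: `Map.Entry` stands in for Dataflow's `KV`, and the `keyed` helper is hypothetical. Inside a real `DoFn` you would apply the same `Objects.requireNonNull` check before calling `c.output(KV.of(key, ...))`.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Map;
import java.util.Objects;

public class NullKeyCheck {
    // Hypothetical helper: builds a key/value pair but rejects a null key
    // immediately, with a message naming the value that had no key.
    static <K, V> Map.Entry<K, V> keyed(K key, V value) {
        Objects.requireNonNull(key, () -> "null key for value: " + value);
        return new SimpleEntry<>(key, value);
    }

    public static void main(String[] args) {
        System.out.println(keyed("a", 1)); // prints a=1
        try {
            keyed(null, 2);
        } catch (NullPointerException e) {
            System.out.println(e.getMessage()); // prints null key for value: 2
        }
    }
}
```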

This code throws the exception:

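pCollection  // a PCollection<MyObject>
            .apply(ParDo.of(new DoFn<MyObject, KV<String, MyObject>>() {
                @Override
                public void processElement(ProcessContext c) throws Exception {
                    // Every element is emitted with a null key, which the
                    // GroupByKey below cannot handle.
                    c.output(KV.of((String) null, c.element()));
                }
            }))
            .apply(GroupByKey.<String, MyObject>create());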

You cannot write a null key. So when your key may be null, you have to do something like this:

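import com.google.cloud.dataflow.sdk.transforms.DoFn;
import com.google.cloud.dataflow.sdk.transforms.GroupByKey;
import com.google.cloud.dataflow.sdk.transforms.ParDo;
import com.google.cloud.dataflow.sdk.values.KV;

pCollection  // a PCollection<MyObject>
            .apply(ParDo.of(new DoFn<MyObject, KV<String, MyObject>>() {
                @Override
                public void processElement(ProcessContext c) throws Exception {
                    String key = c.element().getKeyField();
                    if (key == null) {
                        // Handle the missing key somehow, e.g. assign a non-null
                        // default (or return here to drop the element instead):
                        key = ...; // any non-null value
                    }
                    c.output(KV.of(key, c.element()));
                }
            }))
            .apply(GroupByKey.<String, MyObject>create());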
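To check the null-key handling without running a pipeline, the substitution strategy can be sketched in plain Java. This is an illustration under assumptions, not Dataflow code: `groupByKey` here is a hypothetical local helper built on `TreeMap` (not the SDK's `GroupByKey` transform), and the sentinel `<no-key>` is an arbitrary choice.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class NullKeyGrouping {
    // Hypothetical sentinel used in place of a missing key.
    static final String MISSING_KEY = "<no-key>";

    // Groups values by key after replacing null keys with MISSING_KEY,
    // so no group is ever stored under a null key.
    static <V> Map<String, List<V>> groupByKey(List<? extends Map.Entry<String, V>> pairs) {
        Map<String, List<V>> groups = new TreeMap<>();
        for (Map.Entry<String, V> pair : pairs) {
            String key = pair.getKey() != null ? pair.getKey() : MISSING_KEY;
            groups.computeIfAbsent(key, k -> new ArrayList<>()).add(pair.getValue());
        }
        return groups;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        pairs.add(new SimpleEntry<>("a", 1));
        pairs.add(new SimpleEntry<>(null, 2));
        pairs.add(new SimpleEntry<>("a", 3));
        System.out.println(groupByKey(pairs)); // prints {<no-key>=[2], a=[1, 3]}
    }
}
```

Mapping every missing key to one sentinel puts all keyless records into a single group, which can become a hot key if there are many of them; dropping those records instead avoids that, at the cost of losing them.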