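I'm getting the following exception when running a job that reads from g3 and then groups the data by key. The exception occurs during the read: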
java.io.IOException: INVALID_ARGUMENT: Unable to parse key
    at com.google.cloud.dataflow.sdk.runners.worker.ApplianceShuffleWriter.write(Native Method)
    at com.google.cloud.dataflow.sdk.runners.worker.ShuffleSink$ShuffleSinkWriter.outputChunk(ShuffleSink.java:293)
    at com.google.cloud.dataflow.sdk.runners.worker.ShuffleSink$ShuffleSinkWriter.close(ShuffleSink.java:288)
    at com.google.cloud.dataflow.sdk.util.common.worker.WriteOperation.finish(WriteOperation.java:100)
    at com.google.cloud.dataflow.sdk.util.common.worker.MapTaskExecutor.execute(MapTaskExecutor.java:79)
    at com.google.cloud.dataflow.sdk.runners.worker.DataflowWorker.executeWork(DataflowWorker.java:288)
    at com.google.cloud.dataflow.sdk.runners.worker.DataflowWorker.doWork(DataflowWorker.java:221)
    at com.google.cloud.dataflow.sdk.runners.worker.DataflowWorker.getAndPerformWork(DataflowWorker.java:173)
    at com.google.cloud.dataflow.sdk.runners.worker.DataflowWorkerHarness$WorkerThread.doWork(DataflowWorkerHarness.java:193)
    at com.google.cloud.dataflow.sdk.runners.worker.DataflowWorkerHarness$WorkerThread.call(DataflowWorkerHarness.java:173)
    at com.google.cloud.dataflow.sdk.runners.worker.DataflowWorkerHarness$WorkerThread.call(DataflowWorkerHarness.java:160)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Any ideas?
Answer 0 (score: 0)
This exception is thrown when you apply a GroupByKey but some of the emitted keys are null.
For example, this code throws the exception:
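import com.google.cloud.dataflow.sdk.transforms.DoFn;
import com.google.cloud.dataflow.sdk.transforms.GroupByKey;
import com.google.cloud.dataflow.sdk.transforms.ParDo;
import com.google.cloud.dataflow.sdk.values.KV;

pCollection  // a PCollection<MyObject>
    .apply(ParDo.of(new DoFn<MyObject, KV<String, MyObject>>() {
      @Override
      public void processElement(ProcessContext c) throws Exception {
        // Every element gets a null key, which the shuffle behind GroupByKey cannot handle.
        c.output(KV.<String, MyObject>of(null, c.element()));
      }
    }))
    .apply(GroupByKey.<String, MyObject>create());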
You cannot write a null key. So when your key can be null, you have to replace it before the GroupByKey, along these lines:
pCollection  // a PCollection<MyObject>
    .apply(ParDo.of(new DoFn<MyObject, KV<String, MyObject>>() {
      @Override
      public void processElement(ProcessContext c) throws Exception {
        String key = c.element().getKeyField();
        if (key == null) {
          // Handle it somehow, e.g. substitute a placeholder key.
          key = ...; // some non-null value
        }
        c.output(KV.of(key, c.element()));
      }
    }))
    .apply(GroupByKey.<String, MyObject>create());
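Because the Dataflow pipeline above needs the SDK and a runner, here is a plain-Java sketch (no Dataflow dependency) of the same idea. It uses `Collectors.groupingBy`, which, like GroupByKey, rejects null keys by throwing a `NullPointerException`, and shows two ways around it: substitute a placeholder key, or drop elements that have no key. The `Item` class, the `MISSING_KEY` value and the method names are hypothetical; in a real pipeline this logic would go in the `DoFn` before the GroupByKey, and the placeholder should be something that cannot collide with a real key.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class NullKeyGrouping {

    // Hypothetical placeholder key; choose one that cannot clash with real keys.
    static final String MISSING_KEY = "__missing__";

    // Hypothetical element type whose key field may be null.
    static final class Item {
        private final String keyField;
        private final int value;

        Item(String keyField, int value) {
            this.keyField = keyField;
            this.value = value;
        }

        String getKeyField() { return keyField; }
        int getValue() { return value; }
    }

    // Maps a null key to the placeholder, otherwise keeps the key unchanged.
    static String keyOrPlaceholder(Item item) {
        return item.getKeyField() == null ? MISSING_KEY : item.getKeyField();
    }

    // Option 1: substitute a non-null key, then group (TreeMap gives sorted output).
    static Map<String, List<Integer>> groupWithPlaceholder(List<Item> items) {
        return items.stream().collect(Collectors.groupingBy(
                NullKeyGrouping::keyOrPlaceholder,
                TreeMap::new,
                Collectors.mapping(Item::getValue, Collectors.toList())));
    }

    // Option 2: drop elements that have no key, then group.
    static Map<String, List<Integer>> groupDroppingNullKeys(List<Item> items) {
        return items.stream()
                .filter(item -> item.getKeyField() != null)
                .collect(Collectors.groupingBy(
                        Item::getKeyField,
                        TreeMap::new,
                        Collectors.mapping(Item::getValue, Collectors.toList())));
    }

    public static void main(String[] args) {
        List<Item> items = Arrays.asList(
                new Item("a", 1), new Item(null, 2), new Item("a", 3));

        // Grouping on the raw key fails, much as GroupByKey does with a null key.
        try {
            items.stream().collect(Collectors.groupingBy(Item::getKeyField));
        } catch (NullPointerException e) {
            System.out.println("raw grouping failed: " + e.getMessage());
        }

        System.out.println(groupWithPlaceholder(items));   // {__missing__=[2], a=[1, 3]}
        System.out.println(groupDroppingNullKeys(items));  // {a=[1, 3]}
    }
}
```

Which option fits depends on the data: a placeholder keeps keyless elements together in one group, while filtering discards them entirely.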