Error when running the WordCount example pipeline on Dataflow from Eclipse

Time: 2018-04-06 11:49:58

Tags: java eclipse google-cloud-dataflow

When I try to run the WordCount example pipeline on Dataflow from the Eclipse IDE, I get the following error:

Exception in thread "main" java.lang.RuntimeException: Failed to construct instance from factory method DataflowRunner#fromOptions(interface org.apache.beam.sdk.options.PipelineOptions)
    at org.apache.beam.sdk.util.InstanceBuilder.buildFromMethod(InstanceBuilder.java:233)
    at org.apache.beam.sdk.util.InstanceBuilder.build(InstanceBuilder.java:162)
    at org.apache.beam.sdk.PipelineRunner.fromOptions(PipelineRunner.java:55)
    at org.apache.beam.sdk.Pipeline.create(Pipeline.java:150)
    at com.google.cloud.dataflow.examples.WordCount.main(WordCount.java:178)

Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.beam.sdk.util.InstanceBuilder.buildFromMethod(InstanceBuilder.java:222)
    ... 4 more

Caused by: java.lang.IllegalArgumentException: Missing object or bucket in path: 'gs://mysite-ga-datastreaming-196008-my-bucket/', did you mean: 'gs://some-bucket/mysite-ga-datastreaming-196008-my-bucket'?
    at org.apache.beam.sdks.java.extensions.google.cloud.platform.core.repackaged.com.google.common.base.Preconditions.checkArgument(Preconditions.java:383)
    at org.apache.beam.sdk.extensions.gcp.storage.GcsPathValidator.verifyPath(GcsPathValidator.java:77)
    at org.apache.beam.sdk.extensions.gcp.storage.GcsPathValidator.validateOutputFilePrefixSupported(GcsPathValidator.java:60)
    at org.apache.beam.runners.dataflow.DataflowRunner.fromOptions(DataflowRunner.java:246)
    ... 9 more

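Some people suggest the error is caused by the Java version, since Beam apparently does not run properly on Java 9. In any case, I am still using Java 8. Others say the cause is that you must provide a subfolder under the bucket as the storage location. I tried that, but it still does not work.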

If anyone has run into the same problem before, or can offer any advice about this error, it would be much appreciated.

2 Answers:

Answer 0 (score: 0)

You should create the bucket gs://mysite-ga-datastreaming-196008-my-bucket/ in Google Cloud Storage before using the pipeline.

Answer 1 (score: 0)

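Hi, Mangu's suggestion is correct. For the Cloud Storage staging location you need to specify a folder inside the bucket, not just the bucket name.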

For full details, see my post: link
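
To illustrate the difference between a bare bucket path and a folder under the bucket, here is a minimal sketch in plain Java. It only mimics the kind of check that the "Missing object or bucket in path" message points to; it is not Beam's actual GcsPathValidator code. The class name GcsPathCheck, the method hasBucketAndObject, and the folder name "staging" are all made up for this example.

```java
public class GcsPathCheck {
    private static final String SCHEME = "gs://";

    /**
     * Hypothetical helper (not part of Beam): returns true only when the path
     * has the form gs://<bucket>/<object> with a non-empty bucket and a
     * non-empty object part after the bucket.
     */
    public static boolean hasBucketAndObject(String path) {
        if (path == null || !path.startsWith(SCHEME)) {
            return false;
        }
        String rest = path.substring(SCHEME.length());
        int slash = rest.indexOf('/');
        if (slash <= 0) {
            // Either the bucket name is empty or there is no '/' after it.
            return false;
        }
        // Everything after the first '/' is treated as the object (folder) part.
        String object = rest.substring(slash + 1);
        return !object.isEmpty();
    }

    public static void main(String[] args) {
        // Bucket path taken from the error message in the question.
        String bucketOnly = "gs://mysite-ga-datastreaming-196008-my-bucket/";
        // "staging" is an assumed folder name, used for illustration only.
        String withFolder = bucketOnly + "staging";

        System.out.println(bucketOnly + " -> " + hasBucketAndObject(bucketOnly)); // false
        System.out.println(withFolder + " -> " + hasBucketAndObject(withFolder)); // true
    }
}
```

If the same rule applies to your setup, the locations passed to the pipeline (for example through --stagingLocation and --tempLocation in the Eclipse run configuration's program arguments) would each need a folder part after the bucket name, such as gs://mysite-ga-datastreaming-196008-my-bucket/staging, rather than the bucket root alone.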