Error creating a Dataflow job template

Asked: 2017-02-01 09:44:54

Tags: java google-cloud-platform google-cloud-dataflow

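I want to create a Dataflow job template that takes the name of a file in GCS and publishes it to a Pub/Sub topic. I followed the linked tutorial, but it doesn't seem to work for me.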

My class is defined as follows:

import com.google.cloud.dataflow.sdk.Pipeline;
import com.google.cloud.dataflow.sdk.io.PubsubIO;
import com.google.cloud.dataflow.sdk.io.TextIO;
import com.google.cloud.dataflow.sdk.options.PipelineOptionsFactory;
import com.google.cloud.dataflow.sdk.runners.TemplatingDataflowPipelineRunner;

public class PubSubOutputTest {
  public static void main(String[] args) {
    // Create pipeline options.
    pubSubOutputOptions options = PipelineOptionsFactory.fromArgs(args).as(pubSubOutputOptions.class);
    options.setRunner(TemplatingDataflowPipelineRunner.class);
    options.setTempLocation("gs://staging-bucket");
    Pipeline p = Pipeline.create(options);
    // Read the file from the GCS Bucket.
    p.apply(TextIO.Read.named("Read file from GCS.").from(options.getInputFile()).withoutValidation())
        .apply(PubsubIO.Write.named("Write to Pub Sub topic.")
        .topic("projects/my-project/topics/my-topic"));
    // Run the pipeline.
    p.run();
  }
}

The options interface that uses ValueProvider to accept the runtime input is as follows:

import com.google.cloud.dataflow.sdk.options.Default;
import com.google.cloud.dataflow.sdk.options.PipelineOptions;
import com.google.cloud.dataflow.sdk.options.ValueProvider;

public interface pubSubOutputOptions extends PipelineOptions {
  @Default.String("gs://default-file.txt")
  ValueProvider getInputFile();
  void setInputFile(ValueProvider value);
}

Creating the template fails with the following error:

Exception in thread "main" java.lang.IllegalArgumentException: PipelineOptions specified failed to serialize to JSON.
    at com.google.cloud.dataflow.sdk.runners.DataflowPipelineTranslator$Translator.translate(DataflowPipelineTranslator.java:408)
    at com.google.cloud.dataflow.sdk.runners.DataflowPipelineTranslator.translate(DataflowPipelineTranslator.java:146)
    at com.google.cloud.dataflow.sdk.runners.DataflowPipelineRunner.run(DataflowPipelineRunner.java:570)
    at com.google.cloud.dataflow.sdk.runners.TemplatingDataflowPipelineRunner.run(TemplatingDataflowPipelineRunner.java:137)
    at com.google.cloud.dataflow.sdk.runners.TemplatingDataflowPipelineRunner.run(TemplatingDataflowPipelineRunner.java:44)
    at com.google.cloud.dataflow.sdk.Pipeline.run(Pipeline.java:181)
    at com.my.project.dataflow.PubSubOutputTest.main(PubSubOutputTest.java:32)
Caused by: com.fasterxml.jackson.databind.JsonMappingException: Unexpected IOException (of type java.io.IOException): Failed to serialize and deserialize property 'inputFile' with value 'RuntimeValueProvider{propertyName=inputFile, default=gs://default-file.txt, value=null}'
    at com.fasterxml.jackson.databind.JsonMappingException.fromUnexpectedIOE(JsonMappingException.java:284)
    at com.fasterxml.jackson.databind.ObjectMapper.writeValueAsBytes(ObjectMapper.java:3008)
    at com.google.cloud.dataflow.sdk.runners.DataflowPipelineTranslator$Translator.translate(DataflowPipelineTranslator.java:406)

I'm new to both Google Cloud Dataflow and Java. I implemented everything described in the documentation, but I may have missed something obvious.

1 answer:

Answer 0 (score: 2)

It looks like the option is declared incorrectly. I think you want to give ValueProvider a type parameter, like this:

  @Default.String("gs://default-file.txt")
  ValueProvider<String> getInputFile();
  void setInputFile(ValueProvider<String> value);
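
As a rough illustration of why the type parameter probably matters: a serializer that inspects the getter through reflection can only learn the wrapped value type when the return type is parameterized. The sketch below uses plain Java reflection and a hypothetical stand-in interface named Provider. It is not the SDK's ValueProvider and not the SDK's actual serialization code, only a minimal demonstration of the information a raw type loses.

```java
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;

public class ValueProviderTypeSketch {
  // Hypothetical stand-in for ValueProvider; NOT the real SDK interface.
  public interface Provider<T> {
    T get();
  }

  // Raw declaration, mirroring the interface in the question.
  @SuppressWarnings("rawtypes")
  public interface RawOptions {
    Provider getInputFile();
  }

  // Parameterized declaration, mirroring the fix above.
  public interface TypedOptions {
    Provider<String> getInputFile();
  }

  // Returns the name of the wrapped value type as seen through reflection,
  // or "unknown" when the getter's return type carries no type argument.
  public static String describe(Class<?> options) {
    try {
      Type returnType = options.getMethod("getInputFile").getGenericReturnType();
      if (returnType instanceof ParameterizedType) {
        Type argument = ((ParameterizedType) returnType).getActualTypeArguments()[0];
        return argument.getTypeName();
      }
      return "unknown";
    } catch (NoSuchMethodException e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(describe(RawOptions.class));   // prints "unknown"
    System.out.println(describe(TypedOptions.class)); // prints "java.lang.String"
  }
}
```

With the raw declaration there is no type argument to recover, which is consistent with the "Failed to serialize and deserialize property 'inputFile'" message in the stack trace, although the trace alone does not prove that this is the exact code path that fails.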