public static void main(String[] args) {
    // Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).withValidation().create());
    DataflowPipelineOptions options = PipelineOptionsFactory.as(DataflowPipelineOptions.class);
    options.setRunner(DataflowRunner.class);
    options.setStagingLocation("gs://bucketname/stageapache");
    options.setTempLocation("gs://bucketname/stageapachetemp");
    options.setProject("projectid");
    Pipeline p = Pipeline.create(options);
    p.apply(TextIO.read().from("gs://bucketname/filename.csv"));
    // p.apply(FileIO.match().filepattern("gs://bucketname/f.csv"));
    p.run();
}
pom.xml
<dependency>
<groupId>org.apache.beam</groupId>
<artifactId>beam-sdks-java-core</artifactId>
<version>2.2.0</version>
</dependency>
<dependency>
<groupId>org.apache.beam</groupId>
<artifactId>beam-sdks-java-io-google-cloud-platform</artifactId>
<version>2.0.0</version>
</dependency>
<!-- https://mvnrepository.com/artifact/org.apache.beam/beam-runners-google-cloud-dataflow-java -->
<dependency>
<groupId>org.apache.beam</groupId>
<artifactId>beam-runners-google-cloud-dataflow-java</artifactId>
<version>2.0.0</version>
</dependency>
Error
Dec 08, 2017 5:09:35 PM org.apache.beam.runners.dataflow.DataflowRunner fromOptions
INFO: PipelineOptions.filesToStage was not specified. Defaulting to files from the classpath: will stage 85 files. Enable logging at DEBUG level to see which files will be staged.
Exception in thread "main" java.lang.NoSuchMethodError: org.apache.beam.sdk.values.PCollection.createPrimitiveOutputInternal(Lorg/apache/beam/sdk/Pipeline;Lorg/apache/beam/sdk/values/WindowingStrategy;Lorg/apache/beam/sdk/values/PCollection$IsBounded;)Lorg/apache/beam/sdk/values/PCollection;
at org.apache.beam.runners.dataflow.PrimitiveParDoSingleFactory$ParDoSingle.expand(PrimitiveParDoSingleFactory.java:68)
at org.apache.beam.runners.dataflow.PrimitiveParDoSingleFactory$ParDoSingle.expand(PrimitiveParDoSingleFactory.java:58)
at org.apache.beam.sdk.Pipeline.applyReplacement(Pipeline.java:550)
at org.apache.beam.sdk.Pipeline.replace(Pipeline.java:280)
at org.apache.beam.sdk.Pipeline.replaceAll(Pipeline.java:201)
at org.apache.beam.runners.dataflow.DataflowRunner.replaceTransforms(DataflowRunner.java:688)
at org.apache.beam.runners.dataflow.DataflowRunner.run(DataflowRunner.java:498)
at org.apache.beam.runners.dataflow.DataflowRunner.run(DataflowRunner.java:153)
at org.apache.beam.sdk.Pipeline.run(Pipeline.java:303)
at org.apache.beam.sdk.Pipeline.run(Pipeline.java:289)
at com.pearson.apachebeam.StarterPipeline.main(StarterPipeline.java:60)
In the code above, I get the error shown whenever I add the FileIO/TextIO line. If I run the pipeline without that line, a job is created but fails because it has no operations. I'm stuck on this; I migrated to Apache Beam 2.2 to get more control over the files we read from storage.
Any help would be appreciated.
Thanks
Answer 0 (score: 2)
The problem is that your pom.xml depends on components of the Beam SDK at different versions: beam-sdks-java-core is at 2.2.0, while beam-sdks-java-io-google-cloud-platform and beam-runners-google-cloud-dataflow-java are at 2.0.0. They all need to be at the same version. The NoSuchMethodError arises because the 2.0.0 Dataflow runner calls an internal SDK method whose signature changed by 2.2.0.
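For illustration, here is one way the dependencies could be aligned, pinning all three to 2.2.0 (the version of beam-sdks-java-core already in use) via a shared Maven property so they cannot drift apart again. The property name `beam.version` is just a convention, not required by Maven:

```xml
<properties>
  <!-- Single source of truth for the Beam release used by every Beam artifact -->
  <beam.version>2.2.0</beam.version>
</properties>

<dependencies>
  <dependency>
    <groupId>org.apache.beam</groupId>
    <artifactId>beam-sdks-java-core</artifactId>
    <version>${beam.version}</version>
  </dependency>
  <dependency>
    <groupId>org.apache.beam</groupId>
    <artifactId>beam-sdks-java-io-google-cloud-platform</artifactId>
    <version>${beam.version}</version>
  </dependency>
  <dependency>
    <groupId>org.apache.beam</groupId>
    <artifactId>beam-runners-google-cloud-dataflow-java</artifactId>
    <version>${beam.version}</version>
  </dependency>
</dependencies>
```

With this layout, upgrading to a newer Beam release means changing the one property value, and the SDK, IO connectors, and runner move together.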