我试图将Java中的Apache Beam用作各种数据管道。我写了一个简单的类,来自谷歌Pubsub,并下沉到谷歌Bigquery,但我不能让它为我的生活建立。我正在使用Maven构建并添加了我能找到的每个Beam包,但我仍然得到"没有找到类文件"错误。
具体做法是:
[ERROR] /X:/Work/pipeline/backup-pipeline/src/main/java/PassthroughPipeline.java:[28,16] cannot access org.apache.beam.sdk.options.GcpOptions
class file for org.apache.beam.sdk.options.GcpOptions not found
[ERROR] /X:/Work/pipeline/backup-pipeline/src/main/java/PassthroughPipeline.java:[29,16] cannot access org.apache.beam.sdk.options.BigQueryOptions
class file for org.apache.beam.sdk.options.BigQueryOptions not found
[ERROR] /X:/Work/pipeline/backup-pipeline/src/main/java/PassthroughPipeline.java:[31,16] cannot access org.apache.beam.sdk.options.GcsOptions
class file for org.apache.beam.sdk.options.GcsOptions not found
有谁知道我需要添加哪些软件包才能解决这些问题?遗憾的是谷歌没有帮助。
我所拥有的POM文件基于Apache为Wordcount提供的示例POM,但添加了额外的依赖项。以下是我添加的依赖项。如果需要,我可以提供完整的文件,但它非常单一。
<dependencies>
<dependency>
<groupId>org.apache.beam</groupId>
<artifactId>beam-runners-apex</artifactId>
<version>${beam.version}</version>
<scope>runtime</scope>
</dependency>
<!--
Apex depends on httpclient version 4.3.5, project has a transitive dependency to httpclient 4.0.1 from
google-http-client. Apex dependency version being specified explicitly so that it gets picked up. This
can be removed when the project no longer has a dependency on a different httpclient version.
-->
<dependency>
<groupId>org.apache.httpcomponents</groupId>
<artifactId>httpclient</artifactId>
<version>4.3.5</version>
<scope>runtime</scope>
<exclusions>
<exclusion>
<groupId>commons-codec</groupId>
<artifactId>commons-codec</artifactId>
</exclusion>
</exclusions>
</dependency>
</dependencies>
</profile>
<profile>
<id>dataflow-runner</id>
<!-- Makes the DataflowRunner available when running a pipeline. -->
<dependencies>
<dependency>
<groupId>org.apache.beam</groupId>
<artifactId>beam-runners-google-cloud-dataflow-java</artifactId>
<version>${beam.version}</version>
<scope>runtime</scope>
</dependency>
</dependencies>
</profile>
<profile>
<id>flink-runner</id>
<!-- Makes the FlinkRunner available when running a pipeline. -->
<dependencies>
<dependency>
<groupId>org.apache.beam</groupId>
<artifactId>beam-runners-flink_2.10</artifactId>
<version>${beam.version}</version>
<scope>runtime</scope>
</dependency>
</dependencies>
</profile>
<profile>
<id>spark-runner</id>
<!-- Makes the SparkRunner available when running a pipeline. Additionally,
overrides some Spark dependencies to Beam-compatible versions. -->
<dependencies>
<dependency>
<groupId>org.apache.beam</groupId>
<artifactId>beam-runners-spark</artifactId>
<version>${beam.version}</version>
<scope>runtime</scope>
</dependency>
<dependency>
<groupId>org.apache.beam</groupId>
<artifactId>beam-sdks-java-io-hadoop-file-system</artifactId>
<version>${beam.version}</version>
<scope>runtime</scope>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-streaming_2.10</artifactId>
<version>${spark.version}</version>
<scope>runtime</scope>
<exclusions>
<exclusion>
<groupId>org.slf4j</groupId>
<artifactId>jul-to-slf4j</artifactId>
</exclusion>
</exclusions>
</dependency>
<dependency>
<groupId>com.fasterxml.jackson.module</groupId>
<artifactId>jackson-module-scala_2.10</artifactId>
<version>${jackson.version}</version>
<scope>runtime</scope>
</dependency>
</dependencies>
</profile>
</profiles>
<dependencies>
<!-- Adds a dependency on the Beam SDK. -->
<dependency>
<groupId>org.apache.beam</groupId>
<artifactId>beam-sdks-java-core</artifactId>
<version>2.2.0</version>
</dependency>
<!-- https://mvnrepository.com/artifact/org.apache.beam/beam-sdks-java-io-google-cloud-platform -->
<dependency>
<groupId>org.apache.beam</groupId>
<artifactId>beam-sdks-java-io-google-cloud-platform</artifactId>
<version>2.2.0</version>
</dependency>
<!-- https://mvnrepository.com/artifact/org.apache.beam/beam-sdks-common-fn-api -->
<dependency>
<groupId>org.apache.beam</groupId>
<artifactId>beam-sdks-common-fn-api</artifactId>
<version>2.2.0</version>
<scope>test</scope>
</dependency>
<!-- https://mvnrepository.com/artifact/org.apache.beam/beam-sdks-java-extensions-google-cloud-platform-core -->
<dependency>
<groupId>org.apache.beam</groupId>
<artifactId>beam-sdks-java-extensions-google-cloud-platform-core</artifactId>
<version>2.2.0</version>
</dependency>
<!-- https://mvnrepository.com/artifact/org.apache.beam/beam-sdks-java-io-common -->
<dependency>
<groupId>org.apache.beam</groupId>
<artifactId>beam-sdks-java-io-common</artifactId>
<version>2.2.0</version>
<scope>test</scope>
</dependency>
<!-- https://mvnrepository.com/artifact/org.apache.beam/beam-runners-parent -->
<dependency>
<groupId>org.apache.beam</groupId>
<artifactId>beam-runners-parent</artifactId>
<version>2.2.0</version>
<type>pom</type>
</dependency>
<!-- https://mvnrepository.com/artifact/org.apache.beam/beam-runners-gcp-parent -->
<dependency>
<groupId>org.apache.beam</groupId>
<artifactId>beam-runners-gcp-parent</artifactId>
<version>2.2.0</version>
<type>pom</type>
</dependency>
<!-- https://mvnrepository.com/artifact/org.apache.beam/beam-sdks-java-extensions-parent -->
<dependency>
<groupId>org.apache.beam</groupId>
<artifactId>beam-sdks-java-extensions-parent</artifactId>
<version>2.2.0</version>
<type>pom</type>
</dependency>
<!-- https://mvnrepository.com/artifact/org.apache.beam/beam-parent -->
<dependency>
<groupId>org.apache.beam</groupId>
<artifactId>beam-parent</artifactId>
<version>2.2.0</version>
<type>pom</type>
</dependency>
<!-- https://mvnrepository.com/artifact/org.apache.beam/beam-sdks-common-parent -->
<dependency>
<groupId>org.apache.beam</groupId>
<artifactId>beam-sdks-common-parent</artifactId>
<version>2.2.0</version>
<type>pom</type>
</dependency>
<!-- https://mvnrepository.com/artifact/com.google.cloud.dataflow/google-cloud-dataflow-java-sdk-parent -->
<dependency>
<groupId>com.google.cloud.dataflow</groupId>
<artifactId>google-cloud-dataflow-java-sdk-parent</artifactId>
<version>2.2.0</version>
<type>pom</type>
</dependency>
<!-- https://mvnrepository.com/artifact/org.apache.beam/beam-runners-reference -->
<dependency>
<groupId>org.apache.beam</groupId>
<artifactId>beam-runners-reference</artifactId>
<version>2.2.0</version>
</dependency>
<!-- https://mvnrepository.com/artifact/org.apache.beam/beam-sdks-java-parent -->
<dependency>
<groupId>org.apache.beam</groupId>
<artifactId>beam-sdks-java-parent</artifactId>
<version>2.2.0</version>
<type>pom</type>
</dependency>
<!-- https://mvnrepository.com/artifact/org.apache.beam/beam-sdks-java-build-tools -->
<dependency>
<groupId>org.apache.beam</groupId>
<artifactId>beam-sdks-java-build-tools</artifactId>
<version>2.2.0</version>
</dependency>
<!-- https://mvnrepository.com/artifact/org.apache.beam/beam-runners-direct-java -->
<dependency>
<groupId>org.apache.beam</groupId>
<artifactId>beam-runners-direct-java</artifactId>
<version>2.2.0</version>
<scope>test</scope>
</dependency>
<!-- https://mvnrepository.com/artifact/org.apache.beam/beam-runners-core-construction-java -->
<dependency>
<groupId>org.apache.beam</groupId>
<artifactId>beam-runners-core-construction-java</artifactId>
<version>2.2.0</version>
</dependency>
<dependency>
<groupId>com.google.cloud.dataflow</groupId>
<artifactId>google-cloud-dataflow-java-sdk-all</artifactId>
<version>[2.1.0, 2.99)</version>
</dependency>
<!-- https://mvnrepository.com/artifact/org.apache.beam/beam-sdks-common-runner-api -->
<dependency>
<groupId>org.apache.beam</groupId>
<artifactId>beam-sdks-common-runner-api</artifactId>
<version>2.2.0</version>
</dependency>
<dependency>
<groupId>org.apache.beam</groupId>
<artifactId>beam-runners-google-cloud-dataflow-java</artifactId>
<version>0.4.0</version>
</dependency>
<dependency>
<groupId>com.google.api-client</groupId>
<artifactId>google-api-client</artifactId>
<version>${google-clients.version}</version>
<exclusions>
<!-- Exclude an old version of guava that is being pulled
in by a transitive dependency of google-api-client -->
<exclusion>
<groupId>com.google.guava</groupId>
<artifactId>guava-jdk5</artifactId>
</exclusion>
</exclusions>
</dependency>
<dependency>
<groupId>com.google.apis</groupId>
<artifactId>google-api-services-bigquery</artifactId>
<version>${bigquery.version}</version>
<exclusions>
<!-- Exclude an old version of guava that is being pulled
in by a transitive dependency of google-api-client -->
<exclusion>
<groupId>com.google.guava</groupId>
<artifactId>guava-jdk5</artifactId>
</exclusion>
</exclusions>
</dependency>
<dependency>
<groupId>com.google.http-client</groupId>
<artifactId>google-http-client</artifactId>
<version>${google-clients.version}</version>
<exclusions>
<!-- Exclude an old version of guava that is being pulled
in by a transitive dependency of google-api-client -->
<exclusion>
<groupId>com.google.guava</groupId>
<artifactId>guava-jdk5</artifactId>
</exclusion>
</exclusions>
</dependency>
<dependency>
<groupId>com.google.apis</groupId>
<artifactId>google-api-services-pubsub</artifactId>
<version>${pubsub.version}</version>
<exclusions>
<!-- Exclude an old version of guava that is being pulled
in by a transitive dependency of google-api-client -->
<exclusion>
<groupId>com.google.guava</groupId>
<artifactId>guava-jdk5</artifactId>
</exclusion>
</exclusions>
</dependency>
<dependency>
<groupId>joda-time</groupId>
<artifactId>joda-time</artifactId>
<version>${joda.version}</version>
</dependency>
<dependency>
<groupId>com.google.guava</groupId>
<artifactId>guava</artifactId>
<version>${guava.version}</version>
</dependency>
<!-- Add slf4j API frontend binding with JUL backend -->
<dependency>
<groupId>org.slf4j</groupId>
<artifactId>slf4j-api</artifactId>
<version>${slf4j.version}</version>
</dependency>
<dependency>
<groupId>org.slf4j</groupId>
<artifactId>slf4j-jdk14</artifactId>
<version>${slf4j.version}</version>
<!-- When loaded at runtime this will wire up slf4j to the JUL backend -->
<scope>runtime</scope>
</dependency>
<!-- Hamcrest and JUnit are required dependencies of PAssert,
which is used in the main code of DebuggingWordCount example. -->
<dependency>
<groupId>org.hamcrest</groupId>
<artifactId>hamcrest-all</artifactId>
<version>${hamcrest.version}</version>
</dependency>
<dependency>
<groupId>junit</groupId>
<artifactId>junit</artifactId>
<version>${junit.version}</version>
</dependency>
</dependencies>
答案 0 :(得分:1)
这些课程:
org.apache.beam.sdk.options.GcpOptions
org.apache.beam.sdk.options.GcsOptions
org.apache.beam.sdk.options.BigQueryOptions
...全都在Apache Beam的an earlier version。
鉴于pom.xml
中的依赖关系(特别是Apache Beam的v2.2.0的依赖关系),正确的导入是:
org.apache.beam.sdk.extensions.gcp.options.GcpOptions
org.apache.beam.sdk.extensions.gcp.options.GcsOptions
org.apache.beam.sdk.io.gcp.bigquery.BigQueryOptions