Error when using the BigQuery connector

Time: 2018-05-04 16:06:47

Tags: scala apache-spark google-bigquery google-cloud-dataproc

I get this error when running the Spotify Spark BigQuery connector on the Qubole data platform. I can see the BigQueryUtils class in my jar, but the error is still thrown:

  

Exception in thread "main" org.spark-project.guava.util.concurrent.ExecutionError: java.lang.NoSuchMethodError: com.google.cloud.hadoop.io.bigquery.BigQueryUtils.waitForJobCompletion

The pom is attached below:

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>com.xyz.abc.google.TestProject</groupId>
  <artifactId>edesem-google-TestProject</artifactId>
  <version>0.0.1-SNAPSHOT</version>

  <properties>
    <maven.compiler.source>1.8</maven.compiler.source>
    <maven.compiler.target>1.8</maven.compiler.target>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>

    <gpg.skip>true</gpg.skip>

    <!-- Keep in sync with google-api-client dependency -->
    <apache.httpcomponents.version>4.0.1</apache.httpcomponents.version>
  </properties>

  <build>
    <sourceDirectory>src/main/scala</sourceDirectory>
    <plugins>
      <plugin>
        <groupId>net.alchim31.maven</groupId>
        <artifactId>scala-maven-plugin</artifactId>
        <version>3.3.1</version>
        <executions>
          <execution>
            <goals>
              <goal>compile</goal>
            </goals>
          </execution>
        </executions>
      </plugin>

      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-compiler-plugin</artifactId>
        <version>3.5.1</version>
        <configuration>
          <source>1.8</source>
          <target>1.8</target>
        </configuration>
      </plugin>

      <plugin>
        <artifactId>maven-assembly-plugin</artifactId>
        <executions>
          <execution>
            <phase>package</phase>
            <goals>
              <goal>single</goal>
            </goals>
          </execution>
        </executions>
        <configuration>
          <descriptorRefs>
            <descriptorRef>jar-with-dependencies</descriptorRef>
          </descriptorRefs>
        </configuration>
      </plugin>

      <!-- Maven Shade Plugin -->
      <plugin>
        <artifactId>maven-shade-plugin</artifactId>
        <version>2.4</version>
        <executions>
          <execution>
            <phase>package</phase>
            <goals>
              <goal>shade</goal>
            </goals>
            <configuration>
              <finalName>edesem-google-TestProject</finalName>
              <shadedArtifactAttached>false</shadedArtifactAttached>
              <artifactSet>
                <includes>
                  <include>*:*</include>
                </includes>
              </artifactSet>
              <filters>
                <filter>
                  <artifact>*:*</artifact>
                  <excludes>
                    <exclude>META-INF/*.SF</exclude>
                    <exclude>META-INF/*.DSA</exclude>
                    <exclude>META-INF/*.RSA</exclude>
                  </excludes>
                </filter>
              </filters>
              <transformers>
                <transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer" />
                <transformer implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
                  <resource>reference.conf</resource>
                </transformer>
                <transformer implementation="org.apache.maven.plugins.shade.resource.DontIncludeResourceTransformer">
                  <resource>log4j.properties</resource>
                </transformer>
                <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
                  <mainClass>com.xyz.abc.bigquery.TestProjectBQClient</mainClass>
                </transformer>
              </transformers>
              <relocations>
                <relocation>
                  <pattern>org.eclipse.jetty</pattern>
                  <shadedPattern>org.spark-project.jetty</shadedPattern>
                  <includes>
                    <include>org.eclipse.jetty.**</include>
                  </includes>
                </relocation>
                <relocation>
                  <pattern>com.google.common</pattern>
                  <shadedPattern>org.spark-project.guava</shadedPattern>
                  <excludes>
                    <exclude>com/google/common/base/Absent*</exclude>
                    <exclude>com/google/common/base/Function</exclude>
                    <exclude>com/google/common/base/Optional*</exclude>
                    <exclude>com/google/common/base/Present*</exclude>
                    <exclude>com/google/common/base/Supplier</exclude>
                  </excludes>
                </relocation>
              </relocations>
            </configuration>
          </execution>
        </executions>
      </plugin>
    </plugins>
  </build>

  <dependencies>
    <dependency>
      <groupId>org.scala-lang</groupId>
      <artifactId>scala-library</artifactId>
      <version>2.10.6</version>
    </dependency>
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-core_2.10</artifactId>
      <version>2.2.0</version>
      <scope>provided</scope>
    </dependency>
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-sql_2.10</artifactId>
      <version>2.2.0</version>
      <scope>provided</scope>
    </dependency>
    <dependency>
      <groupId>com.databricks</groupId>
      <artifactId>spark-avro_2.10</artifactId>
      <version>4.0.0</version>
    </dependency>
    <dependency>
      <groupId>com.google.cloud.bigdataoss</groupId>
      <artifactId>bigquery-connector</artifactId>
      <version>0.10.2-hadoop2</version>
      <exclusions>
        <exclusion>
          <groupId>com.google.guava</groupId>
          <artifactId>guava-jdk5</artifactId>
        </exclusion>
      </exclusions>
    </dependency>
    <dependency>
      <groupId>org.slf4j</groupId>
      <artifactId>slf4j-simple</artifactId>
      <version>1.7.21</version>
    </dependency>
    <dependency>
      <groupId>joda-time</groupId>
      <artifactId>joda-time</artifactId>
      <version>2.9.3</version>
    </dependency>
    <dependency>
      <groupId>org.scalatest</groupId>
      <artifactId>scalatest_2.10</artifactId>
      <version>2.2.1</version>
      <scope>test</scope>
    </dependency>
    <dependency>
      <groupId>com.google.cloud.bigdataoss</groupId>
      <artifactId>gcs-connector</artifactId>
      <version>1.8.0-hadoop2</version>
    </dependency>
    <!-- https://mvnrepository.com/artifact/com.google.cloud.bigdataoss/util-hadoop -->
    <dependency>
      <groupId>com.google.cloud.bigdataoss</groupId>
      <artifactId>util-hadoop</artifactId>
      <version>1.8.0-hadoop2</version>
    </dependency>
    <!-- https://mvnrepository.com/artifact/com.google.cloud.bigdataoss/gcsio -->
    <dependency>
      <groupId>com.google.cloud.bigdataoss</groupId>
      <artifactId>gcsio</artifactId>
      <version>1.8.0</version>
    </dependency>
    <dependency>
      <groupId>com.google.cloud.bigdataoss</groupId>
      <artifactId>util</artifactId>
      <version>1.8.0</version>
      <exclusions>
        <exclusion>
          <groupId>com.google.api-client</groupId>
          <artifactId>google-api-client-java6</artifactId>
        </exclusion>
        <exclusion>
          <groupId>com.google.guava</groupId>
          <artifactId>guava-jdk5</artifactId>
        </exclusion>
      </exclusions>
    </dependency>
    <!-- https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-mapreduce-client-core -->
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-mapreduce-client-core</artifactId>
      <version>2.8.3</version>
    </dependency>
    <!-- https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-common -->
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-common</artifactId>
      <version>2.8.3</version>
      <scope>provided</scope>
    </dependency>
    <dependency>
      <groupId>com.google.guava</groupId>
      <artifactId>guava</artifactId>
      <version>23.6-jre</version>
    </dependency>
    <!-- https://mvnrepository.com/artifact/com.google.cloud/google-cloud-bigquery -->
    <dependency>
      <groupId>com.google.cloud</groupId>
      <artifactId>google-cloud-bigquery</artifactId>
      <version>1.23.0</version>
    </dependency>
  </dependencies>
</project>

2 answers:

Answer 0: (score: 1)

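I think my main problem was the BigQuery connector configuration on the cluster. I added the jar to the classpath and that fixed the issue. The instructions below are taken from Google's documentation.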

https://github.com/GoogleCloudPlatform/bigdata-interop/blob/master/gcs/INSTALL.md#add-the-connector-jar-to-hadoops-classpath

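Add the connector jar to Hadoop's classpath: Placing the connector jar in the appropriate subdirectory of the Hadoop installation may be enough for Hadoop to load the jar. However, to be certain that the jar is loaded, add HADOOP_CLASSPATH=$HADOOP_CLASSPATH:</path/to/gcs-connector-jar> to hadoop-env.sh in the Hadoop configuration directory.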

Answer 1: (score: 0)

This happens because the BigQuery connector version you are using, com.google.cloud.bigdataoss:bigquery-connector:0.10.2-hadoop2, is not compatible with the com.google.cloud:google-cloud-bigquery:1.23.0 library version.

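You need to upgrade com.google.cloud.bigdataoss:bigquery-connector to at least version 0.11.0 and keep it in sync with the versions of your other com.google.cloud.bigdataoss dependencies (in your case that would be version 0.12.0), i.e. they should all come from the same release listed here: https://github.com/GoogleCloudPlatform/bigdata-interop/releases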
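
As a rough sketch of what that alignment could look like in the pom from the question: the fragment below pins the connectors through two properties so they are bumped together. The property names are made up for illustration, and the `-hadoop2` suffix on 0.12.0 is assumed by analogy with the 0.10.2-hadoop2 artifact already in use, so check the exact coordinates against the releases page before relying on it.

```xml
<!-- Sketch only: keep the bigdataoss connectors on one release.
     Versions follow the answer above (bigquery-connector 0.12.0 next to
     gcs-connector 1.8.0); the property names are hypothetical. -->
<properties>
  <gcs.connector.version>1.8.0-hadoop2</gcs.connector.version>
  <!-- Assumed coordinate: verify that a -hadoop2 build exists for 0.12.0 -->
  <bigquery.connector.version>0.12.0-hadoop2</bigquery.connector.version>
</properties>

<dependencies>
  <dependency>
    <groupId>com.google.cloud.bigdataoss</groupId>
    <artifactId>bigquery-connector</artifactId>
    <version>${bigquery.connector.version}</version>
    <exclusions>
      <!-- Same exclusion as in the original pom -->
      <exclusion>
        <groupId>com.google.guava</groupId>
        <artifactId>guava-jdk5</artifactId>
      </exclusion>
    </exclusions>
  </dependency>
  <dependency>
    <groupId>com.google.cloud.bigdataoss</groupId>
    <artifactId>gcs-connector</artifactId>
    <version>${gcs.connector.version}</version>
  </dependency>
</dependencies>
```

After changing the versions, running `mvn dependency:tree -Dincludes=com.google.cloud.bigdataoss` should show which connector versions actually end up in the build.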