Detected Guava issue #1635 which indicates that a version of Guava less than 16.01 is in use

Date: 2016-04-26 23:58:43

Tags: apache-spark guava datastax emr

I am running a Spark job on EMR and using the DataStax connector to connect to a Cassandra cluster. I am facing a problem with the Guava jar; please see the details below. I am using the Cassandra setup below:

cqlsh 5.0.1 | Cassandra 3.0.1 | CQL spec 3.3.1 

I am running the Spark job on EMR 4.4 with the following Maven dependencies:

<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-streaming_2.10</artifactId>
    <version>1.5.0</version>
</dependency>

<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.10</artifactId>
    <version>1.5.0</version>
</dependency>

<dependency>
    <groupId>com.datastax.spark</groupId>
    <artifactId>spark-cassandra-connector_2.10</artifactId>
    <version>1.5.0</version>
</dependency>

<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-streaming-kinesis-asl_2.10</artifactId>
    <version>1.5.0</version>
</dependency>

I am hitting the following problem when I submit the Spark job:

java.lang.ExceptionInInitializerError
       at com.datastax.spark.connector.cql.DefaultConnectionFactory$.clusterBuilder(CassandraConnectionFactory.scala:35)
       at com.datastax.spark.connector.cql.DefaultConnectionFactory$.createCluster(CassandraConnectionFactory.scala:87)
       at com.datastax.spark.connector.cql.CassandraConnector$.com$datastax$spark$connector$cql$CassandraConnector$$createSession(CassandraConnector.scala:153)
       at com.datastax.spark.connector.cql.CassandraConnector$$anonfun$2.apply(CassandraConnector.scala:148)
       at com.datastax.spark.connector.cql.CassandraConnector$$anonfun$2.apply(CassandraConnector.scala:148)
       at com.datastax.spark.connector.cql.RefCountedCache.createNewValueAndKeys(RefCountedCache.scala:31)
      at com.datastax.spark.connector.cql.RefCountedCache.acquire(RefCountedCache.scala:56)
       at com.datastax.spark.connector.cql.CassandraConnector.openSession(CassandraConnector.scala:81)
       at ampush.event.process.core.CassandraServiceManagerImpl.getAdMetaInfo(CassandraServiceManagerImpl.java:158)
       at ampush.event.config.metric.processor.ScheduledEventAggregator$4.call(ScheduledEventAggregator.java:308)
       at ampush.event.config.metric.processor.ScheduledEventAggregator$4.call(ScheduledEventAggregator.java:290)
       at org.apache.spark.api.java.JavaRDDLike$$anonfun$foreachPartition$1.apply(JavaRDDLike.scala:222)
       at org.apache.spark.api.java.JavaRDDLike$$anonfun$foreachPartition$1.apply(JavaRDDLike.scala:222)
       at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$29.apply(RDD.scala:902)
       at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$29.apply(RDD.scala:902)
       at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1850)
       at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1850)
       at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
       at org.apache.spark.scheduler.Task.run(Task.scala:88)
       at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
       at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.IllegalStateException: Detected Guava issue #1635 which indicates that a version of Guava less than 16.01 is in use.  This introduces codec resolution issues and potentially other incompatibility issues in the driver.  Please upgrade to Guava 16.01 or later.
       at com.datastax.driver.core.SanityChecks.checkGuava(SanityChecks.java:62)
       at com.datastax.driver.core.SanityChecks.check(SanityChecks.java:36)
       at com.datastax.driver.core.Cluster.<clinit>(Cluster.java:67)
       ... 23 more

Please tell me how to manage the Guava dependencies here?

Thanks

6 Answers:

Answer 0 (score: 7)

Another solution: go to the spark/jars directory. Rename guava-14.0.1.jar and then copy in guava-19.0.jar (the original answer illustrated this with a screenshot of the directory); a rough command-line equivalent is sketched below.

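A minimal sketch of that change (paths are assumptions: SPARK_HOME/jars stands in for the spark/jars directory of the install, and guava-19.0.jar is assumed to have been downloaded separately):

    cd $SPARK_HOME/jars                        # the spark/jars directory mentioned above
    mv guava-14.0.1.jar guava-14.0.1.jar.bak   # rename the bundled Guava so it drops off the classpath
    cp /path/to/guava-19.0.jar .               # copy in the newer Guava; /path/to is a placeholder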

Answer 1 (score: 2)

Just add this to your POM's <dependencies> block, like so:

<dependency>
    <groupId>com.google.guava</groupId>
    <artifactId>guava</artifactId>
    <version>19.0</version>
</dependency>

(or whatever version > 16.0.1 you prefer)
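If an older Guava still wins out through a transitive dependency, it can help to confirm which version Maven actually resolves before looking at the cluster classpath. One way to check (a stock Maven goal; nothing project-specific is assumed) is:

    mvn dependency:tree -Dincludes=com.google.guava:guava

Anything other than the pinned version in that output points to a dependency that may still need an exclusion.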

Answer 2 (score: 2)

I ran into the same problem and resolved it by using the Maven Shade plugin to shade the Guava version that the Cassandra connector brings in.

I needed to explicitly exclude the Optional, Present and Absent classes, because I was running into problems with Spark trying to cast from the non-shaded Guava Present type to the shaded Optional type. I'm not sure whether this will cause any problems later on, but it seems to be working for me for now.

You can add this to the <plugins> section of your pom.xml:

<plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-shade-plugin</artifactId>
    <version>2.4.3</version>
    <executions>
        <execution>
            <phase>package</phase>
            <goals>
                <goal>shade</goal>
            </goals>
        </execution>
    </executions>

    <configuration>
        <minimizeJar>true</minimizeJar>
        <shadedArtifactAttached>true</shadedArtifactAttached>
        <shadedClassifierName>fat</shadedClassifierName>

        <relocations>
            <relocation>
                <pattern>com.google</pattern>
                <shadedPattern>shaded.guava</shadedPattern>
                <includes>
                    <include>com.google.**</include>
                </includes>

                <excludes>
                    <exclude>com.google.common.base.Optional</exclude>
                    <exclude>com.google.common.base.Absent</exclude>
                    <exclude>com.google.common.base.Present</exclude>
                </excludes>
            </relocation>
        </relocations>

        <filters>
            <filter>
                <artifact>*:*</artifact>
                <excludes>
                    <exclude>META-INF/*.SF</exclude>
                    <exclude>META-INF/*.DSA</exclude>
                    <exclude>META-INF/*.RSA</exclude>
                </excludes>
            </filter>
        </filters>

    </configuration>
</plugin>
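With shadedArtifactAttached set to true and the fat classifier, mvn package should produce an additional -fat.jar next to the regular artifact, with the com.google classes relocated under shaded.guava; that shaded jar is the one to hand to spark-submit.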

Answer 3 (score: 1)

I was able to fix this by adding the guava 16.0.1 jar externally and then specifying the classpath on Spark submit with the help of the configuration values below:

--conf "spark.driver.extraClassPath=/guava-16.0.1.jar" --conf "spark.executor.extraClassPath=/guava-16.0.1.jar"

Hope this helps anyone hitting a similar error!

Answer 4 (score: 1)

I faced the same problem while retrieving records from a Cassandra table using Spark (Java) via spark-submit.

Please check the guava jar version used by Hadoop and Spark in your cluster using the find command, and change it accordingly:

find / -name "guav*.jar"

Otherwise, add the guava jar externally during spark-submit for the driver and the executors via spark.driver.extraClassPath and spark.executor.extraClassPath respectively:

spark-submit --class com.my.spark.MySparkJob --master local --conf 'spark.yarn.executor.memoryOverhead=2048' --conf 'spark.cassandra.input.consistency.level=ONE' --conf 'spark.cassandra.output.consistency.level=ONE' --conf 'spark.dynamicAllocation.enabled=false' --conf "spark.driver.extraClassPath=lib/guava-19.0.jar" --conf "spark.executor.extraClassPath=lib/guava-19.0.jar" --total-executor-cores 15 --executor-memory 15g  --jars $(echo lib/*.jar | tr ' ' ',') target/my-sparkapp.jar

It worked for me. Hope you give it a try.

Answer 5 (score: 0)

Thanks to Adrian for his response.

The architecture I'm on is different from the others here, but the Guava problem is the same. I'm using Spark 2.2 with Mesosphere. In our development environment we use sbt-native-packager to generate our docker images, which are handed off to Mesos.

It turned out that we needed to provide a different Guava for the spark-submit executors than for the code we run on the driver. This is what worked for me.

build.sbt

....
libraryDependencies ++= Seq(
  "com.google.guava" % "guava" % "19.0" force(),
  "org.apache.hadoop" % "hadoop-aws" % "2.7.3" excludeAll (
    ExclusionRule(organization = "org.apache.hadoop", name = "hadoop-common"), //this is for s3a
    ExclusionRule(organization = "com.google.guava",  name= "guava" )),
  "org.apache.spark" %% "spark-core" % "2.1.0"   excludeAll (
    ExclusionRule("org.glassfish.jersey.bundles.repackaged", name="jersey-guava"),
    ExclusionRule(organization = "com.google.guava",  name= "guava" )) ,
  "com.github.scopt" %% "scopt" % "3.7.0"  excludeAll (
    ExclusionRule("org.glassfish.jersey.bundles.repackaged", name="jersey-guava"),
    ExclusionRule(organization = "com.google.guava",  name= "guava" )) ,
  "com.datastax.spark" %% "spark-cassandra-connector" % "2.0.6",
...
dockerCommands ++= Seq(
...
  Cmd("RUN rm /opt/spark/dist/jars/guava-14.0.1.jar"),
  Cmd("RUN wget -q http://central.maven.org/maven2/com/google/guava/guava/23.0/guava-23.0.jar  -O /opt/spark/dist/jars/guava-23.0.jar")
...

When I tried to replace guava 14 on the executors with guava 16.0.1 or 19, it still didn't work; spark-submit just died. The Guava in my fat jar, which is what the driver actually uses, I forced to 19, but for the spark-submit executors I had to replace it with 23. I did try replacing it with 16 and 19, but Spark just died there too.

Sorry for the digression, but this question came up in every one of my Google searches. I hope this helps other SBT/Mesos folks as well.