NoSuchMethod exception when reading a Cassandra table from PySpark

Asked: 2016-05-02 20:41:17

Tags: apache-spark apache-spark-sql cassandra-2.0 spark-cassandra-connector

I am trying to read data from a Cassandra keyspace in PySpark.

Here is my code:

from pyspark import SparkContext
from pyspark import SparkConf
from pyspark.sql import SQLContext

# Connection settings for the local Cassandra node
conf = SparkConf()
conf.setMaster("local[4]")
conf.setAppName("Spark Cassandra")
conf.set("spark.cassandra.connection.host", "127.0.0.1")

# sqlContext is pre-created by the pyspark shell
sqlContext.read \
    .format("org.apache.spark.sql.cassandra") \
    .options(table="kv", keyspace="tutorialspoint") \
    .load().show()

I am running this on a CentOS 6.7 VM with Spark 1.5, Hadoop 2.6.0, and Cassandra 2.1.13.

The pyspark console is started with the following command:

pyspark --packages com.datastax.spark:spark-cassandra-connector_2.10:1.5.0-M2

I have tried starting the pyspark console with different versions of the cassandra-connector package, but that did not help.

Here is the error message I get in the console when executing the read:


Py4JJavaError: An error occurred while calling o29.load.
: java.lang.NoSuchMethodError: com.google.common.reflect.TypeToken.isPrimitive()Z
    at com.datastax.driver.core.TypeCodec.<init>(TypeCodec.java:142)
    at com.datastax.driver.core.TypeCodec.<init>(TypeCodec.java:136)
    at com.datastax.driver.core.TypeCodec$BlobCodec.<init>(TypeCodec.java:609)
    at com.datastax.driver.core.TypeCodec$BlobCodec.<clinit>(TypeCodec.java:606)
    at com.datastax.driver.core.CodecRegistry.<clinit>(CodecRegistry.java:147)
    at com.datastax.driver.core.Configuration$Builder.build(Configuration.java:259)
    at com.datastax.driver.core.Cluster$Builder.getConfiguration(Cluster.java:1135)
    at com.datastax.driver.core.Cluster.<init>(Cluster.java:111)
    at com.datastax.driver.core.Cluster.buildFrom(Cluster.java:178)
    at com.datastax.driver.core.Cluster$Builder.build(Cluster.java:1152)
    at com.datastax.spark.connector.cql.DefaultConnectionFactory$.createCluster(CassandraConnectionFactory.scala:85)
    at com.datastax.spark.connector.cql.CassandraConnector$.com$datastax$spark$connector$cql$CassandraConnector$$createSession(CassandraConnector.scala:155)
    at com.datastax.spark.connector.cql.CassandraConnector$$anonfun$2.apply(CassandraConnector.scala:150)
    at com.datastax.spark.connector.cql.CassandraConnector$$anonfun$2.apply(CassandraConnector.scala:150)
    at com.datastax.spark.connector.cql.RefCountedCache.createNewValueAndKeys(RefCountedCache.scala:31)
    at com.datastax.spark.connector.cql.RefCountedCache.acquire(RefCountedCache.scala:56)
    at com.datastax.spark.connector.cql.CassandraConnector.openSession(CassandraConnector.scala:81)
    at com.datastax.spark.connector.cql.CassandraConnector.withSessionDo(CassandraConnector.scala:109)
    at com.datastax.spark.connector.cql.CassandraConnector.withClusterDo(CassandraConnector.scala:120)
    at com.datastax.spark.connector.cql.Schema$.fromCassandra(Schema.scala:241)
    at org.apache.spark.sql.cassandra.CassandraSourceRelation.<init>(CassandraSourceRelation.scala:47)
    at org.apache.spark.sql.cassandra.CassandraSourceRelation$.apply(CassandraSourceRelation.scala:184)
    at org.apache.spark.sql.cassandra.DefaultSource.createRelation(DefaultSource.scala:57)
    at org.apache.spark.sql.execution.datasources.ResolvedDataSource$.apply(ResolvedDataSource.scala:125)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:114)
    ...

1 Answer:

Answer 0 (score: 1):

This is caused by a Guava version conflict: the Spark Cassandra Connector and Hadoop pull in different Guava versions. See https://datastax-oss.atlassian.net/browse/SPARKC-365 and the PR that fixes it: https://github.com/datastax/spark-cassandra-connector/pull/968
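To see which Guava jar is actually winning on your classpath, you can ask the JVM directly from the pyspark shell. This is a minimal diagnostic sketch, not part of the original answer; it assumes the shell's pre-created sc and checks the class named in the NoSuchMethodError above.

# Diagnostic sketch: print where com.google.common.reflect.TypeToken was loaded from.
# Assumes it runs inside the pyspark shell, where `sc` and its Py4J gateway already exist.
jvm = sc._jvm
type_token = jvm.java.lang.Class.forName("com.google.common.reflect.TypeToken")
code_source = type_token.getProtectionDomain().getCodeSource()
# code_source can be None for classes loaded by the bootstrap class loader
print(code_source.getLocation().toString() if code_source is not None else "bootstrap classpath")

If the printed path points at an old Guava bundled with Hadoop, the usual workarounds are to put a single, newer Guava first on the driver and executor classpaths (for example via spark.driver.extraClassPath and spark.executor.extraClassPath) or to move to a connector release that includes the fix from the linked PR.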