Question

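I have a Cassandra 3 cluster, and I decided to use Spark 1.6 (prebuilt with Hadoop 2.6) to analyze the data stored in it. I want to implement some functionality in Python. To start, I ran the following command: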

./bin/spark-submit examples/src/main/python/cassandra_inputformat.py 192.168.100.251 test-keyspace test-table

This command failed with java.lang.ClassNotFoundException: org.apache.cassandra.hadoop.cql3.CqlPagingInputFormat.
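A ClassNotFoundException like this means no jar on the classpath contains the class. As a rough diagnostic, the sketch below scans a list of jars for a given class, relying only on the fact that a jar is a zip archive in which class a.b.C is stored as the entry a/b/C.class. The helper name jars_containing is made up for illustration and is not part of Spark or Cassandra.

```python
import zipfile


def jars_containing(class_name, jar_paths):
    """Return the jars from jar_paths that contain the fully qualified class.

    Hypothetical helper for illustration; not part of Spark or Cassandra.
    """
    # A class a.b.C is stored inside a jar as the zip entry a/b/C.class
    entry = class_name.replace(".", "/") + ".class"
    hits = []
    for path in jar_paths:
        with zipfile.ZipFile(path) as jar:
            if entry in jar.namelist():
                hits.append(path)
    return hits
```

For example, jars_containing("org.apache.cassandra.hadoop.cql3.CqlPagingInputFormat", glob.glob("lib/*.jar")) would list the jars under lib/ that ship the missing class, assuming the jars live there.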

To fix this, I ran:

./bin/spark-submit --jars lib/spark-examples-1.6.0-hadoop2.6.0.jar examples/src/main/python/cassandra_inputformat.py 192.168.100.251 test-keyspace test-table

Then I got the error java.lang.IncompatibleClassChangeError: Found interface org.apache.hadoop.mapreduce.JobContext, but class was expected.

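After some searching, I learned that this error occurs because the examples are built against Hadoop 1 jars, where JobContext is a class, while they are being run on Hadoop 2, where it is an interface.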

So I tried these solutions:

I tried the Spark 1.6 build prebuilt for Hadoop 1, which led me to an incompatibility between Hadoop 1 and Cassandra 3:

py4j.protocol.Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.newAPIHadoopRDD.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0 (TID 0, localhost): java.lang.RuntimeException: InvalidRequestException(why:unconfigured table schema_columnfamilies)

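I tried using spark-cassandra-connector, which still ran into an incompatibility with Cassandra 3. The error was similar to the one above, reporting that the table schema_columnfamilies is unconfigured.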
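For reference, this is roughly how a table would be read through the connector's DataFrame source in PySpark. It is a minimal sketch, assuming a connector release that supports Cassandra 3 is on the classpath and that sql_context is an existing SQLContext; the function name read_cassandra_table is made up for illustration.

```python
def read_cassandra_table(sql_context, keyspace, table):
    """Load a Cassandra table as a DataFrame via spark-cassandra-connector.

    Hypothetical helper: sql_context is assumed to be a pyspark.sql.SQLContext
    created in a job that has the connector on its classpath.
    """
    return (
        sql_context.read
        .format("org.apache.spark.sql.cassandra")  # connector's data source name
        .options(keyspace=keyspace, table=table)
        .load()
    )
```

The job would then be submitted with the connector pulled in through --packages and the contact point set through --conf spark.cassandra.connection.host=192.168.100.251. The exact connector coordinates (for instance com.datastax.spark:spark-cassandra-connector_2.10:1.6.0) are an assumption here and should be checked against the connector's version compatibility table.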

Connecting Spark to Cassandra 3

0 answers: