Connecting Spark to Cassandra 3

Time: 2016-02-09 10:02:14

Tags: python apache-spark cassandra

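I have a Cassandra 3 cluster and decided to use Spark 1.6 (prebuilt with Hadoop 2.6) to analyze the data stored in it. I want to implement some functions in Python. As a first step, I ran the following command: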

./bin/spark-submit examples/src/main/python/cassandra_inputformat.py 192.168.100.251 test-keyspace test-table

With this command I got java.lang.ClassNotFoundException: org.apache.cassandra.hadoop.cql3.CqlPagingInputFormat

To fix this, I ran:

./bin/spark-submit --jars lib/spark-examples-1.6.0-hadoop2.6.0.jar examples/src/main/python/cassandra_inputformat.py 192.168.100.251 test-keyspace test-table

Then I got the error java.lang.IncompatibleClassChangeError: Found interface org.apache.hadoop.mapreduce.JobContext, but class was expected

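After some searching, I learned that this error occurs because the examples ship with jars built against Hadoop 1, while running them here requires Hadoop 2.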

So I tried these solutions:

  1. I tried the Spark 1.6 build prebuilt for Hadoop 1, which led me to an incompatibility between Hadoop 1 and Cassandra 3:

    py4j.protocol.Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.newAPIHadoopRDD.
    : org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0 (TID 0, localhost): java.lang.RuntimeException: InvalidRequestException(why:unconfigured table schema_columnfamilies)
    
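  2. I tried the spark-cassandra-connector, which also ended in an incompatibility with Cassandra 3. The error was similar to the one above, reporting that the table schema_columnfamilies is unconfigured.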
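
For reference, the connector attempt in item 2 would typically look something like the sketch below from PySpark. This is not the code that was actually run. It assumes a spark-cassandra-connector build that matches the cluster is already on the classpath (for example through spark-submit's --packages or --jars options). The helper names `cassandra_read_options`, `load_table` and `main` are hypothetical, and the host, keyspace and table are the ones from the commands above.

```python
# Sketch only: assumes a spark-cassandra-connector build compatible with the
# cluster is on the classpath. Helper names are hypothetical.

# Name of the DataFrame source registered by the connector.
CASSANDRA_SOURCE = "org.apache.spark.sql.cassandra"


def cassandra_read_options(keyspace, table):
    """Return the options the connector's DataFrame source expects."""
    return {"keyspace": keyspace, "table": table}


def load_table(sql_context, keyspace, table):
    """Load one Cassandra table as a DataFrame through the connector."""
    return (sql_context.read
            .format(CASSANDRA_SOURCE)
            .options(**cassandra_read_options(keyspace, table))
            .load())


def main():
    # pyspark is imported here so the helpers above stay importable without Spark.
    from pyspark import SparkConf, SparkContext
    from pyspark.sql import SQLContext

    conf = (SparkConf()
            .setAppName("cassandra-read-sketch")
            # Connector setting that points Spark at a Cassandra contact node.
            .set("spark.cassandra.connection.host", "192.168.100.251"))
    sc = SparkContext(conf=conf)
    df = load_table(SQLContext(sc), "test-keyspace", "test-table")
    df.show(5)
    sc.stop()
```

Calling `main()` from a script passed to spark-submit would attempt the read. Whether it succeeds against Cassandra 3 depends on the connector and driver versions in use, which is exactly the open point of this question.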

0 answers:

No answers yet.