使用Spark hadoop API创建RDD以访问Cassandra DB

时间:2014-01-04 12:07:25

标签: scala hadoop cassandra cassandra-2.0 apache-spark

我正在运行一个节点cassandra 2.0.3和Apache Spark 2.0.3

我创建了一个scala程序来使用Spark hadoop API创建一个RDD来访问Cassandra DB。

另外应该在bashrc中为spaark设置什么环境变量,因为我在spark-env.sh中使用下面的配置

export SPARK_MASTER_IP="10.0.3.15"

export SPARK_MASTER_PORT="7077"

export SCALA_HOME="/home/Desktop/CD/scala-2.9.3"

export SPARK_WORKER_MEMORY=1g

export SPARK_WORKER_INSTANCES=1

export SPARK_WORKER_DIR="/home/Desktop/CD/spark-0.8.0-incubating/sparkdata"

我的示例scala代码如下

            val job=new Job()
    job.setInputFormatClass(classOf[ColumnFamilyInputFormat])
    val host: String = "localhost"
            val port: String = "9160"
    ConfigHelper.setInputInitialAddress(job.getConfiguration(), host)
        ConfigHelper.setInputRpcPort(job.getConfiguration(), port)
    ConfigHelper.setInputColumnFamily(job.getConfiguration(), "demodb", "emp")
    ConfigHelper.setInputPartitioner(job.getConfiguration(), "Murmur3Partitioner")
    CqlConfigHelper.setInputColumns(job.getConfiguration(), "empid,deptid,first_name,last_name")
            CqlConfigHelper.setInputWhereClauses(job.getConfiguration(),"empid=104")

    // Make a new Hadoop RDD
    val casRdd = sc.newAPIHadoopRDD(job.getConfiguration(),
                            classOf[CqlPagingInputFormat],
                            classOf[Map[String, ByteBuffer]],
                            classOf[Map[String, ByteBuffer]])

    println(casRdd.count())

但是,当我在Spark Master上运行此作业时,它无法完成作业并提供以下日志。

14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Loss was due to java.lang.RuntimeException
java.lang.RuntimeException
    at org.apache.cassandra.hadoop.cql3.CqlPagingRecordReader$RowIterator.executeQuery(CqlPagingRecordReader.java:661)
    at org.apache.cassandra.hadoop.cql3.CqlPagingRecordReader$RowIterator.<init>(CqlPagingRecordReader.java:297)
    at org.apache.cassandra.hadoop.cql3.CqlPagingRecordReader.initialize(CqlPagingRecordReader.java:163)
    at org.apache.spark.rdd.NewHadoopRDD$$anon$1.<init>(NewHadoopRDD.scala:86)
    at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:74)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:237)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:226)
    at org.apache.spark.scheduler.ResultTask.run(ResultTask.scala:99)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:158)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:724)
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Starting task 0.0:0 as TID 12 on executor 0: 10.0.0.100 (PROCESS_LOCAL)
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Serialized task 0.0:0 as 2144 bytes in 0 ms
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Lost TID 2 (task 0.0:2)
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Loss was due to java.lang.RuntimeException [duplicate 1]
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Starting task 0.0:2 as TID 13 on executor 0: 10.0.0.100 (PROCESS_LOCAL)
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Serialized task 0.0:2 as 2146 bytes in 1 ms
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Lost TID 1 (task 0.0:1)
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Loss was due to java.lang.RuntimeException [duplicate 2]
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Starting task 0.0:1 as TID 14 on executor 0: 10.0.0.100 (PROCESS_LOCAL)
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Serialized task 0.0:1 as 2144 bytes in 1 ms
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Lost TID 5 (task 0.0:5)
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Loss was due to java.lang.RuntimeException [duplicate 3]
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Starting task 0.0:5 as TID 15 on executor 0: 10.0.0.100 (PROCESS_LOCAL)
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Serialized task 0.0:5 as 2146 bytes in 0 ms
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Lost TID 11 (task 0.0:11)
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Loss was due to java.lang.RuntimeException [duplicate 4]
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Starting task 0.0:11 as TID 16 on executor 0: 10.0.0.100 (PROCESS_LOCAL)
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Serialized task 0.0:11 as 2146 bytes in 0 ms
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Lost TID 8 (task 0.0:8)
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Loss was due to java.lang.RuntimeException [duplicate 5]
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Starting task 0.0:8 as TID 17 on executor 0: 10.0.0.100 (PROCESS_LOCAL)
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Serialized task 0.0:8 as 2146 bytes in 1 ms
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Lost TID 3 (task 0.0:3)
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Loss was due to java.lang.RuntimeException [duplicate 6]
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Starting task 0.0:3 as TID 18 on executor 0: 10.0.0.100 (PROCESS_LOCAL)
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Serialized task 0.0:3 as 2146 bytes in 0 ms
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Lost TID 6 (task 0.0:6)
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Loss was due to java.lang.RuntimeException [duplicate 7]
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Starting task 0.0:6 as TID 19 on executor 0: 10.0.0.100 (PROCESS_LOCAL)
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Serialized task 0.0:6 as 2146 bytes in 1 ms
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Lost TID 9 (task 0.0:9)
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Loss was due to java.lang.RuntimeException [duplicate 8]
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Starting task 0.0:9 as TID 20 on executor 0: 10.0.0.100 (PROCESS_LOCAL)
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Serialized task 0.0:9 as 2146 bytes in 0 ms
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Lost TID 7 (task 0.0:7)
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Loss was due to java.lang.RuntimeException [duplicate 9]
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Starting task 0.0:7 as TID 21 on executor 0: 10.0.0.100 (PROCESS_LOCAL)
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Serialized task 0.0:7 as 2146 bytes in 0 ms
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Lost TID 10 (task 0.0:10)
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Loss was due to java.lang.RuntimeException [duplicate 10]
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Starting task 0.0:10 as TID 22 on executor 0: 10.0.0.100 (PROCESS_LOCAL)
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Serialized task 0.0:10 as 2146 bytes in 1 ms
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Lost TID 4 (task 0.0:4)
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Loss was due to java.lang.RuntimeException [duplicate 11]
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Starting task 0.0:4 as TID 23 on executor 0: 10.0.0.100 (PROCESS_LOCAL)
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Serialized task 0.0:4 as 2146 bytes in 0 ms
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Lost TID 12 (task 0.0:0)
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Loss was due to java.lang.RuntimeException [duplicate 12]
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Starting task 0.0:0 as TID 24 on executor 0: 10.0.0.100 (PROCESS_LOCAL)
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Serialized task 0.0:0 as 2144 bytes in 0 ms
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Lost TID 13 (task 0.0:2)
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Loss was due to java.lang.RuntimeException [duplicate 13]
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Starting task 0.0:2 as TID 25 on executor 0: 10.0.0.100 (PROCESS_LOCAL)
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Serialized task 0.0:2 as 2146 bytes in 1 ms
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Lost TID 14 (task 0.0:1)
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Loss was due to java.lang.RuntimeException [duplicate 14]
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Starting task 0.0:1 as TID 26 on executor 0: 10.0.0.100 (PROCESS_LOCAL)
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Serialized task 0.0:1 as 2144 bytes in 0 ms
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Lost TID 15 (task 0.0:5)
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Loss was due to java.lang.RuntimeException [duplicate 15]
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Starting task 0.0:5 as TID 27 on executor 0: 10.0.0.100 (PROCESS_LOCAL)
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Serialized task 0.0:5 as 2146 bytes in 1 ms
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Lost TID 16 (task 0.0:11)
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Loss was due to java.lang.RuntimeException [duplicate 16]
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Starting task 0.0:11 as TID 28 on executor 0: 10.0.0.100 (PROCESS_LOCAL)
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Serialized task 0.0:11 as 2146 bytes in 0 ms
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Lost TID 17 (task 0.0:8)
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Loss was due to java.lang.RuntimeException [duplicate 17]
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Starting task 0.0:8 as TID 29 on executor 0: 10.0.0.100 (PROCESS_LOCAL)
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Serialized task 0.0:8 as 2146 bytes in 0 ms
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Lost TID 18 (task 0.0:3)
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Loss was due to java.lang.RuntimeException [duplicate 18]
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Starting task 0.0:3 as TID 30 on executor 0: 10.0.0.100 (PROCESS_LOCAL)
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Serialized task 0.0:3 as 2146 bytes in 0 ms
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Lost TID 19 (task 0.0:6)
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Loss was due to java.lang.RuntimeException [duplicate 19]
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Starting task 0.0:6 as TID 31 on executor 0: 10.0.0.100 (PROCESS_LOCAL)
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Serialized task 0.0:6 as 2146 bytes in 0 ms
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Lost TID 20 (task 0.0:9)
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Loss was due to java.lang.RuntimeException [duplicate 20]
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Starting task 0.0:9 as TID 32 on executor 0: 10.0.0.100 (PROCESS_LOCAL)
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Serialized task 0.0:9 as 2146 bytes in 0 ms
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Lost TID 21 (task 0.0:7)
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Loss was due to java.lang.RuntimeException [duplicate 21]
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Starting task 0.0:7 as TID 33 on executor 0: 10.0.0.100 (PROCESS_LOCAL)
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Serialized task 0.0:7 as 2146 bytes in 0 ms
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Lost TID 22 (task 0.0:10)
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Loss was due to java.lang.RuntimeException [duplicate 22]
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Starting task 0.0:10 as TID 34 on executor 0: 10.0.0.100 (PROCESS_LOCAL)
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Serialized task 0.0:10 as 2146 bytes in 0 ms
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Lost TID 23 (task 0.0:4)
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Loss was due to java.lang.RuntimeException [duplicate 23]
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Starting task 0.0:4 as TID 35 on executor 0: 10.0.0.100 (PROCESS_LOCAL)
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Serialized task 0.0:4 as 2146 bytes in 0 ms
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Lost TID 24 (task 0.0:0)
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Loss was due to java.lang.RuntimeException [duplicate 24]
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Starting task 0.0:0 as TID 36 on executor 0: 10.0.0.100 (PROCESS_LOCAL)
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Serialized task 0.0:0 as 2144 bytes in 0 ms
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Lost TID 25 (task 0.0:2)
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Loss was due to java.lang.RuntimeException [duplicate 25]
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Starting task 0.0:2 as TID 37 on executor 0: 10.0.0.100 (PROCESS_LOCAL)
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Serialized task 0.0:2 as 2146 bytes in 0 ms
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Lost TID 26 (task 0.0:1)
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Loss was due to java.lang.RuntimeException [duplicate 26]
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Starting task 0.0:1 as TID 38 on executor 0: 10.0.0.100 (PROCESS_LOCAL)
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Serialized task 0.0:1 as 2144 bytes in 0 ms
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Lost TID 27 (task 0.0:5)
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Loss was due to java.lang.RuntimeException [duplicate 27]
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Starting task 0.0:5 as TID 39 on executor 0: 10.0.0.100 (PROCESS_LOCAL)
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Serialized task 0.0:5 as 2146 bytes in 0 ms
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Lost TID 29 (task 0.0:8)
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Loss was due to java.lang.RuntimeException [duplicate 28]
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Starting task 0.0:8 as TID 40 on executor 0: 10.0.0.100 (PROCESS_LOCAL)
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Serialized task 0.0:8 as 2146 bytes in 0 ms
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Lost TID 28 (task 0.0:11)
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Loss was due to java.lang.RuntimeException [duplicate 29]
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Starting task 0.0:11 as TID 41 on executor 0: 10.0.0.100 (PROCESS_LOCAL)
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Serialized task 0.0:11 as 2146 bytes in 1 ms
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Lost TID 30 (task 0.0:3)
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Loss was due to java.lang.RuntimeException [duplicate 30]
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Starting task 0.0:3 as TID 42 on executor 0: 10.0.0.100 (PROCESS_LOCAL)
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Serialized task 0.0:3 as 2146 bytes in 0 ms
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Lost TID 31 (task 0.0:6)
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Loss was due to java.lang.RuntimeException [duplicate 31]
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Starting task 0.0:6 as TID 43 on executor 0: 10.0.0.100 (PROCESS_LOCAL)
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Serialized task 0.0:6 as 2146 bytes in 1 ms
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Lost TID 32 (task 0.0:9)
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Loss was due to java.lang.RuntimeException [duplicate 32]
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Starting task 0.0:9 as TID 44 on executor 0: 10.0.0.100 (PROCESS_LOCAL)
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Serialized task 0.0:9 as 2146 bytes in 0 ms
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Lost TID 33 (task 0.0:7)
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Loss was due to java.lang.RuntimeException [duplicate 33]
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Starting task 0.0:7 as TID 45 on executor 0: 10.0.0.100 (PROCESS_LOCAL)
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Serialized task 0.0:7 as 2146 bytes in 0 ms
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Lost TID 34 (task 0.0:10)
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Loss was due to java.lang.RuntimeException [duplicate 34]
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Starting task 0.0:10 as TID 46 on executor 0: 10.0.0.100 (PROCESS_LOCAL)
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Serialized task 0.0:10 as 2146 bytes in 0 ms
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Lost TID 35 (task 0.0:4)
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Loss was due to java.lang.RuntimeException [duplicate 35]
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Starting task 0.0:4 as TID 47 on executor 0: 10.0.0.100 (PROCESS_LOCAL)
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Serialized task 0.0:4 as 2146 bytes in 0 ms
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Lost TID 36 (task 0.0:0)
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Loss was due to java.lang.RuntimeException [duplicate 36]
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Starting task 0.0:0 as TID 48 on executor 0: 10.0.0.100 (PROCESS_LOCAL)
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Serialized task 0.0:0 as 2144 bytes in 0 ms
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Lost TID 37 (task 0.0:2)
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Loss was due to java.lang.RuntimeException [duplicate 37]
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Starting task 0.0:2 as TID 49 on executor 0: 10.0.0.100 (PROCESS_LOCAL)
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Serialized task 0.0:2 as 2146 bytes in 0 ms
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Lost TID 38 (task 0.0:1)
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Loss was due to java.lang.RuntimeException [duplicate 38]
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Starting task 0.0:1 as TID 50 on executor 0: 10.0.0.100 (PROCESS_LOCAL)
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Serialized task 0.0:1 as 2144 bytes in 0 ms
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Lost TID 39 (task 0.0:5)
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Loss was due to java.lang.RuntimeException [duplicate 39]
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Starting task 0.0:5 as TID 51 on executor 0: 10.0.0.100 (PROCESS_LOCAL)
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Serialized task 0.0:5 as 2146 bytes in 0 ms
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Lost TID 40 (task 0.0:8)
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Loss was due to java.lang.RuntimeException [duplicate 40]
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Starting task 0.0:8 as TID 52 on executor 0: 10.0.0.100 (PROCESS_LOCAL)
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Serialized task 0.0:8 as 2146 bytes in 0 ms
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Lost TID 41 (task 0.0:11)
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Loss was due to java.lang.RuntimeException [duplicate 41]
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Starting task 0.0:11 as TID 53 on executor 0: 10.0.0.100 (PROCESS_LOCAL)
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Serialized task 0.0:11 as 2146 bytes in 0 ms
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Lost TID 42 (task 0.0:3)
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Loss was due to java.lang.RuntimeException [duplicate 42]
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Starting task 0.0:3 as TID 54 on executor 0: 10.0.0.100 (PROCESS_LOCAL)
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Serialized task 0.0:3 as 2146 bytes in 0 ms
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Lost TID 43 (task 0.0:6)
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Loss was due to java.lang.RuntimeException [duplicate 43]
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Starting task 0.0:6 as TID 55 on executor 0: 10.0.0.100 (PROCESS_LOCAL)
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Serialized task 0.0:6 as 2146 bytes in 1 ms
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Lost TID 44 (task 0.0:9)
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Loss was due to java.lang.RuntimeException [duplicate 44]
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Starting task 0.0:9 as TID 56 on executor 0: 10.0.0.100 (PROCESS_LOCAL)
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Serialized task 0.0:9 as 2146 bytes in 1 ms
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Lost TID 45 (task 0.0:7)
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Loss was due to java.lang.RuntimeException [duplicate 45]
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Starting task 0.0:7 as TID 57 on executor 0: 10.0.0.100 (PROCESS_LOCAL)
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Serialized task 0.0:7 as 2146 bytes in 1 ms
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Lost TID 46 (task 0.0:10)
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Loss was due to java.lang.RuntimeException [duplicate 46]
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Starting task 0.0:10 as TID 58 on executor 0: 10.0.0.100 (PROCESS_LOCAL)
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Serialized task 0.0:10 as 2146 bytes in 1 ms
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Lost TID 47 (task 0.0:4)
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Loss was due to java.lang.RuntimeException [duplicate 47]
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Starting task 0.0:4 as TID 59 on executor 0: 10.0.0.100 (PROCESS_LOCAL)
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Serialized task 0.0:4 as 2146 bytes in 0 ms
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Lost TID 48 (task 0.0:0)
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Loss was due to java.lang.RuntimeException [duplicate 48]
14/01/03 16:34:16 ERROR cluster.ClusterTaskSetManager: Task 0.0:0 failed more than 4 times; aborting job
14/01/03 16:34:16 INFO cluster.ClusterScheduler: Remove TaskSet 0.0 from pool 
14/01/03 16:34:16 INFO cluster.ClusterScheduler: Ignoring update from TID 51 because its task set is gone
14/01/03 16:34:16 INFO cluster.ClusterScheduler: Ignoring update from TID 49 because its task set is gone
14/01/03 16:34:16 INFO cluster.ClusterScheduler: Ignoring update from TID 50 because its task set is gone
14/01/03 16:34:16 INFO cluster.ClusterScheduler: Ignoring update from TID 52 because its task set is gone
14/01/03 16:34:16 INFO cluster.ClusterScheduler: Ignoring update from TID 53 because its task set is gone
14/01/03 16:34:16 INFO cluster.ClusterScheduler: Ignoring update from TID 54 because its task set is gone
14/01/03 16:34:16 INFO cluster.ClusterScheduler: Ignoring update from TID 52 because its task set is gone
14/01/03 16:34:16 INFO cluster.ClusterScheduler: Ignoring update from TID 51 because its task set is gone
14/01/03 16:34:16 INFO cluster.ClusterScheduler: Ignoring update from TID 55 because its task set is gone
14/01/03 16:34:16 INFO cluster.ClusterScheduler: Ignoring update from TID 53 because its task set is gone
14/01/03 16:34:16 INFO cluster.ClusterScheduler: Ignoring update from TID 56 because its task set is gone
14/01/03 16:34:16 INFO cluster.ClusterScheduler: Ignoring update from TID 54 because its task set is gone
14/01/03 16:34:16 INFO scheduler.DAGScheduler: Failed to run count at myown.scala:75
14/01/03 16:34:16 INFO cluster.ClusterScheduler: Ignoring update from TID 57 because its task set is gone
14/01/03 16:34:16 INFO cluster.ClusterScheduler: Ignoring update from TID 55 because its task set is gone
14/01/03 16:34:16 INFO cluster.ClusterScheduler: Ignoring update from TID 56 because its task set is gone
14/01/03 16:34:16 INFO cluster.ClusterScheduler: Ignoring update from TID 58 because its task set is gone
14/01/03 16:34:16 INFO cluster.ClusterScheduler: Ignoring update from TID 59 because its task set is gone
[error] (run-main) org.apache.spark.SparkException: Job failed: Task 0.0:0 failed more than 4 times
org.apache.spark.SparkException: Job failed: Task 0.0:0 failed more than 4 times
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:760)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:758)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:60)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
    at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:758)
    at org.apache.spark.scheduler.DAGScheduler.processEvent(DAGScheduler.scala:379)
    at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$run(DAGScheduler.scala:441)
    at org.apache.spark.scheduler.DAGScheduler$$anon$1.run(DAGScheduler.scala:149)
14/01/03 16:34:16 INFO cluster.ClusterScheduler: Ignoring update from TID 59 because its task set is gone
14/01/03 16:34:16 INFO cluster.ClusterScheduler: Ignoring update from TID 57 because its task set is gone
14/01/03 16:34:16 INFO cluster.ClusterScheduler: Ignoring update from TID 58 because its task set is gone

所以基本上我很困惑并且很难解决这个问题因为我不明白我的scala代码或者spark主从通信或者spark环境配置是否有问题。

请求指导我。

2 个答案:

答案 0 :(得分:2)

这个答案可能会帮助您实施:Spark with Cassandra input/output

Datastax宣布了Spark的官方Cassandra驱动程序。使用此解决方案,您无需实现Hadoop接口,因为这是Cassandra和Spark之间的直接桥梁。

答案 1 :(得分:0)

正如您所看到的那样here这是因为Cassandra的CQL RecordReader中抛出了一般的未处理异常。

尝试在启用调试日志的情况下运行Spark,您应该得到导致此问题的错误。

我的猜测问题在于您尝试在jobConfiguration中设置ColumnFamilyInputFormat并将classOf [CqlPagingInputFormat]传递给newHadoopApiRdd方法。但是htat只是猜测。

虽然we目前仅针对Cassandra 1.2.12发布了Calliope,但我们的客户已成功使用它与Cassandra 2.x无任何变化。 Calliope为您提供了一个更简单,更高级别的API,可以与Spark的Cassandra一起使用。我建议你试一试。