EMR 5.0 + Spark 2.0 + Cassandra connector: NullPointerException when trying to connect to Cassandra

Date: 2016-09-05 13:01:00

Tags: apache-spark cassandra spark-streaming spark-cassandra-connector

I am trying to deploy a Spark 2.0 streaming application on EMR 5 that connects to Cassandra. The Spark-Cassandra connector I am using is "com.datastax.spark" % "spark-cassandra-connector_2.11" % "2.0.0-M3".
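In build.sbt the dependency is declared along these lines (a sketch: the Scala/Spark versions and the "provided" scoping for EMR are assumptions, only the connector line is taken from above):

scalaVersion := "2.11.8"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "2.0.0" % "provided",
  "org.apache.spark" %% "spark-streaming" % "2.0.0" % "provided",
  "com.datastax.spark" % "spark-cassandra-connector_2.11" % "2.0.0-M3"
)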

The application runs standalone on my machine and connects to Cassandra successfully (it saves data). All the relevant Cassandra ports appear to be open in the cluster, yet I still hit the exception below.
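For reference, the driver is wired to Cassandra in the standard connector way, through the spark.cassandra.connection.host property. A minimal sketch, where the app name and seed-node address are placeholders rather than my real values:

import org.apache.spark.{SparkConf, SparkContext}

object ConnectSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("visits-streaming")                      // placeholder name
      .set("spark.cassandra.connection.host", "10.0.0.10") // placeholder Cassandra seed node
    val sc = new SparkContext(conf)
  }
}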

Here is the function getCassandraMappedTable:

import com.datastax.spark.connector._
import com.datastax.spark.connector.rdd.CassandraTableScanRDD

class VisitDaoImpl {
  // Overrides a method from a DAO trait not shown here; keyspace and
  // tableName are fields of this class.
  override def getCassandraMappedTable(): CassandraTableScanRDD[Visit] = {
    SparkContextHolder.sparkContext.cassandraTable[Visit](keyspace, tableName)
  }
}
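Downstream code consumes this RDD roughly as follows (a sketch: the predicate and values are placeholders, not the actual findLatestBetween query that appears in the stack trace below):

val dao = new VisitDaoImpl()
val visits = dao.getCassandraMappedTable()
  .where("normalized_domain = ?", "example.com") // predicate pushed down to Cassandra
  .collect()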

The relevant Visit case class:

import java.util.Date

case class Visit(visitorKey: String, normalizedDomain: String, timestamp: Date, visitId: String, batchId: Long) extends Serializable

import com.datastax.spark.connector.ColumnName
import com.datastax.spark.connector.mapper.DefaultColumnMapper

object Visit extends CassandraTable { // CassandraTable is a project trait (not shown)
  import Columns._

  // Maps Visit's fields to Cassandra column names; properties not listed here
  // (batchId) fall back to the connector's default name translation.
  implicit object Mapper extends DefaultColumnMapper[Visit](
    Map("visitorKey" -> VISITOR_KEY,
      "normalizedDomain" -> NORMALIZED_DOMAIN,
      "timestamp" -> TIMESTAMP,
      "visitId" -> VISIT_ID))

  val TABLE_NAME = "visit"

  case object Columns {
    val VISITOR_KEY = "visitor_key"
    val NORMALIZED_DOMAIN = "normalized_domain"
    val TIMESTAMP = "timestamp"
    val VISIT_ID = "visit_id"
  }

  val columnsNames: Seq[ColumnName] = toColumnNames(Columns) // helper from the CassandraTable trait
}
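With the implicit Mapper in Visit's companion object in scope, the connector can also write Visit instances back. A minimal write sketch, assuming a placeholder keyspace and that batchId maps to batch_id via the default translation:

import com.datastax.spark.connector._
import java.util.Date

// Hypothetical round trip: save one Visit through the mapper above.
val visit = Visit("visitor-1", "example.com", new Date(), "visit-1", 1L)
sc.parallelize(Seq(visit)).saveToCassandra("my_keyspace", Visit.TABLE_NAME)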

I can see no reasonable cause for the following exception:

org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 1069.0 failed 4 times, most recent failure: Lost task 0.3 in stage 1069.0 (TID 721, ip-10-0-0-111.eu-west-1.compute.internal): java.lang.NullPointerException
at com.datastax.spark.connector.SparkContextFunctions.cassandraTable$default$3(SparkContextFunctions.scala:52)
at com.naturalint.myproject.daoimpl.VisitDaoImpl.getCassandraMappedTable(VisitDaoImpl.scala:24)
at com.naturalint.myproject.daoimpl.VisitDaoImpl.findLatestBetween(VisitDaoImpl.scala:92)
at com.naturalint.myproject.servicesimpl.MyAlgo$$anonfun$processStream$1$$anonfun$apply$2.apply(MyAlgo.scala:122)
at com.naturalint.myproject.servicesimpl.MyAlgo$$anonfun$processStream$1$$anonfun$apply$2.apply(MyAlgo.scala:110)
at scala.collection.Iterator$class.foreach(Iterator.scala:893)
at org.apache.spark.util.CompletionIterator.foreach(CompletionIterator.scala:26)
at org.apache.spark.rdd.RDD$$anonfun$foreach$1$$anonfun$apply$27.apply(RDD.scala:875)
at org.apache.spark.rdd.RDD$$anonfun$foreach$1$$anonfun$apply$27.apply(RDD.scala:875)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1897)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1897)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
at org.apache.spark.scheduler.Task.run(Task.scala:85)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)

Any ideas?

Thanks, Eran.

0 Answers:

There are no answers yet.