Reading from and writing to Cassandra from a Spark worker throws an error

Date: 2016-07-05 16:08:46

Tags: scala apache-spark datastax-java-driver spark-cassandra-connector

I am using the DataStax Cassandra Java driver to write to Cassandra from the Spark workers. Code snippet:

    rdd.foreachPartition { record =>
      // Create one cluster and session per partition, on the worker itself
      val cluster = SimpleApp.connect_cluster(Spark.cassandraip)
      val session = cluster.connect()
      record.foreach { case (bin_key: (Int, Int), kpi_map_seq: Iterable[Map[String, String]]) =>
        kpi_map_seq.foreach { kpi_map =>
          update_tables(session, bin_key, kpi_map)
        }
      }
      session.close()
      cluster.close()
    }
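
Since the connector is already on the classpath for the reads, an alternative is to let it manage the sessions instead of opening a cluster per partition by hand. A minimal sketch, assuming `conf` is the SparkConf used to build the context and `update_tables` is the same helper as above:

    import com.datastax.spark.connector.cql.CassandraConnector

    // CassandraConnector is serializable and pools sessions per executor,
    // so no per-partition cluster/session setup or teardown is needed
    val connector = CassandraConnector(conf)
    rdd.foreachPartition { record =>
      connector.withSessionDo { session =>
        record.foreach { case (bin_key, kpi_map_seq) =>
          kpi_map_seq.foreach(kpi_map => update_tables(session, bin_key, kpi_map))
        }
      }
    }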

For the reads I am using the Spark Cassandra Connector (which uses the same driver internally):

    val bin_table = javaFunctions(Spark.sc).cassandraTable("keyspace", "bin_1")
      .select("bin").where("cell = ?", cellname) // assuming this will run on worker nodes
    println(s"get_bins_for_cell: count of bins for cell $cellname is ${bin_table.count()}")
    return bin_table
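
For reference, the same read can go through the connector's Scala implicits instead of javaFunctions; a sketch, assuming the same `Spark.sc` and `cellname`:

    import com.datastax.spark.connector._

    // Same driver underneath; select and where are pushed down to Cassandra
    val bin_table = Spark.sc.cassandraTable("keyspace", "bin_1")
      .select("bin")
      .where("cell = ?", cellname)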

Doing either of these on its own does not cause any problem. Doing them together throws the stack trace below.

My main aim is to avoid writing or reading directly from the Spark driver. It still seems to be something to do with the context; are two contexts being used?

16/07/06 06:21:29 WARN TaskSetManager: Lost task 0.0 in stage 4.0 (TID 22, euca-10-254-179-202.eucalyptus.internal): java.io.IOException: org.apache.spark.SparkException: Failed to get broadcast_5_piece0 of broadcast_5
        at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1222)
        at org.apache.spark.broadcast.TorrentBroadcast.readBroadcastBlock(TorrentBroadcast.scala:165)
        at org.apache.spark.broadcast.TorrentBroadcast._value$lzycompute(TorrentBroadcast.scala:64)
        at org.apache.spark.broadcast.TorrentBroadcast._value(TorrentBroadcast.scala:64)
        at org.apache.spark.broadcast.TorrentBroadcast.getValue(TorrentBroadcast.scala:88)
        at org.apache.spark.broadcast.Broadcast.value(Broadcast.scala:70)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62)
        at org.apache.spark.scheduler.Task.run(Task.scala:89)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)

1 Answer:

Answer 0 (score: 0):

The Spark context was being shut down after the Cassandra session was used, as below.

Example:

    def update_table_using_cassandra_driver() = {
      CassandraConnector(SparkWriter.conf).withSessionDo { session =>
        val statement_4: Statement = QueryBuilder.insertInto("keyspace", "table")
          .value("bin", my_tuple_value)
          .value("cell", my_val("CName"))
        session.executeAsync(statement_4)
        ...
      }
    }
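
One caveat worth noting (my assumption, not something confirmed here): executeAsync returns a ResultSetFuture, so withSessionDo can return before the write has actually completed. Blocking on the future, or collecting the futures and waiting on them, avoids losing writes when the session is released:

    // Sketch: block on the future so the write finishes before the session
    // is released; statement_4 as in the example above
    val future = session.executeAsync(statement_4)
    future.getUninterruptibly() // or collect the futures and wait in batches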

So the next time I called it in a loop I got the exception. It looks like a bug in the Cassandra driver; I have to check this. For the time being, the following workaround gets around the problem:

    for (a <- 1 to 1000) {
      val sc = new SparkContext(SparkWriter.conf)
      update_table_using_cassandra_driver()
      sc.stop()
      ...sleep(xxx)
    }
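
A cleaner arrangement than recreating the SparkContext on every iteration would be to keep a single context alive and let CassandraConnector reuse its pooled session across calls; a sketch under the same assumptions (SparkWriter.conf and update_table_using_cassandra_driver as above):

    val sc = new SparkContext(SparkWriter.conf) // one context for the whole run
    try {
      for (a <- 1 to 1000) {
        update_table_using_cassandra_driver() // connector reuses the pooled session
      }
    } finally {
      sc.stop() // stop only once, at the end
    }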