I am writing to Cassandra from Spark workers using the DataStax Cassandra Java driver. Code snippet:
rdd.foreachPartition { record =>
  val cluster = SimpleApp.connect_cluster(Spark.cassandraip)
  val session = cluster.connect()
  record.foreach { case (bin_key: (Int, Int), kpi_map_seq: Iterable[Map[String, String]]) =>
    kpi_map_seq.foreach { kpi_map: Map[String, String] =>
      update_tables(session, bin_key, kpi_map)
    }
  } // record.foreach
  session.close()
  cluster.close()
}
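As an aside, one way to avoid managing Cluster/Session objects by hand is to let the connector do it. A minimal sketch, assuming the Spark Cassandra Connector is on the classpath and reusing the update_tables helper from above:

import com.datastax.spark.connector.cql.CassandraConnector

// CassandraConnector is serializable, so it can be created on the driver and
// captured in the closure; it pools sessions per executor JVM instead of
// opening and closing a Cluster on every partition.
val connector = CassandraConnector(Spark.sc.getConf)
rdd.foreachPartition { partition =>
  connector.withSessionDo { session =>
    partition.foreach { case (bin_key, kpi_map_seq) =>
      kpi_map_seq.foreach(kpi_map => update_tables(session, bin_key, kpi_map))
    }
  }
}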
For reads I am using the Spark Cassandra Connector (which uses the same driver internally):
val bin_table = javaFunctions(Spark.sc).cassandraTable("keyspace", "bin_1")
  .select("bin").where("cell = ?", cellname) // assuming this will run on worker nodes
println(s"get_bins_for_cell: count of bins for cell $cellname is ${bin_table.count()}")
return bin_table
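For reference, the same read in the connector's Scala API rather than javaFunctions; a sketch using the keyspace, table, and column names from the snippet above:

import com.datastax.spark.connector._

// select() and where() are pushed down to Cassandra where possible, so only
// the "bin" column of the matching rows travels back to the executors.
val bin_table = Spark.sc.cassandraTable("keyspace", "bin_1")
  .select("bin")
  .where("cell = ?", cellname)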
Doing either of these on its own causes no problem. Doing them together throws the stack trace below.
My main aim is to avoid doing the write or read directly from the Spark driver. It seems it still has something to do with the context; are two contexts getting used?
16/07/06 06:21:29 WARN TaskSetManager: Lost task 0.0 in stage 4.0 (TID 22, euca-10-254-179-202.eucalyptus.internal): java.io.IOException: org.apache.spark.SparkException: Failed to get broadcast_5_piece0 of broadcast_5
at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1222)
at org.apache.spark.broadcast.TorrentBroadcast.readBroadcastBlock(TorrentBroadcast.scala:165)
at org.apache.spark.broadcast.TorrentBroadcast._value$lzycompute(TorrentBroadcast.scala:64)
at org.apache.spark.broadcast.TorrentBroadcast._value(TorrentBroadcast.scala:64)
at org.apache.spark.broadcast.TorrentBroadcast.getValue(TorrentBroadcast.scala:88)
at org.apache.spark.broadcast.Broadcast.value(Broadcast.scala:70)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Answer 0 (score: 0)
The Spark Context was getting closed after the Cassandra session was used, as in the example below:
def update_table_using_cassandra_driver() = {
  CassandraConnector(SparkWriter.conf).withSessionDo { session =>
    val statement_4: Statement = QueryBuilder.insertInto("keyspace", "table")
      .value("bin", my_tuple_value)
      .value("cell", my_val("CName"))
    session.executeAsync(statement_4)
    ...
  }
}
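For completeness, the imports this snippet relies on (DataStax Java driver 3.x package names; a sketch):

import com.datastax.driver.core.Statement
import com.datastax.driver.core.querybuilder.QueryBuilder
import com.datastax.spark.connector.cql.CassandraConnector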
So the next time I called it in the loop I got the exception. It looks like a bug in the Cassandra driver; I have to check this. For the time being I did the following to work around it:
for (a <- 1 to 1000) {
  val sc = new SparkContext(SparkWriter.conf)
  update_table_using_cassandra_driver()
  sc.stop()
  ...sleep(xxx)
}
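Recreating the SparkContext on every iteration is heavy. Since "Failed to get broadcast" usually means tasks are referencing state from a context that has already been stopped, a lighter variant worth trying (a sketch, not verified against this particular bug) is to keep a single context alive for the whole loop and stop it only at the end:

val sc = new SparkContext(SparkWriter.conf)
try {
  for (a <- 1 to 1000) {
    update_table_using_cassandra_driver()
    Thread.sleep(1000) // hypothetical pause; the original elides the duration
  }
} finally {
  sc.stop() // stop the context once, after all iterations
}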