我能够连接到火花就好了
library(sparklyr)
sc <- spark_connect(master = "yarn-client", spark_home = "/usr/lib/spark2")
我可以在Rstudio的连接窗口中看到default
数据库中的所有表格,但是,我的所有命令都失败并出现Failed to get broadcast_0_piece0 of broadcast_0
行的错误。
请注意,我可以运行tbl(sc, "tblname") %>% colnames
之类的命令。任何涉及数据操作的内容都不会运行,例如tbl(sc, "tblname") %>% sample_n(20)
。
关于我如何调试这个的任何想法?
以下是我尝试运行的代码:
library(sparklyr)
sc <- spark_connect(master = "yarn-client", spark_home = "/usr/lib/spark", version = 1.6)
t <- tbl(sc, "tbl_name")
t %>% sample_n(20) %>% head
以下是火花日志中显示的错误
18/04/20 17:38:28 ERROR TaskSetManager: Task 0 in stage 0.0 failed 4 times; aborting job
18/04/20 17:38:28 ERROR sparklyr: Backend (24866) failed calling collect on sparklyr.Utils
这是完整的日志:
Job aborted due to stage failure: Task 0 in stage 1.0 failed 4 times, most recent failure: Lost task 0.3 in stage 1.0 (TID 7, bigdataworker11.host.bo1.csnzoo.com, executor 1): java.io.IOException: org.apache.spark.SparkException: Failed to get broadcast_2_piece0 of broadcast_2
at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1214)
at org.apache.spark.broadcast.TorrentBroadcast.readBroadcastBlock(TorrentBroadcast.scala:167)
at org.apache.spark.broadcast.TorrentBroadcast._value$lzycompute(TorrentBroadcast.scala:65)
at org.apache.spark.broadcast.TorrentBroadcast._value(TorrentBroadcast.scala:65)
at org.apache.spark.broadcast.TorrentBroadcast.getValue(TorrentBroadcast.scala:89)
at org.apache.spark.broadcast.Broadcast.value(Broadcast.scala:70)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:242)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.spark.SparkException: Failed to get broadcast_2_piece0 of broadcast_2
at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$broadcast$TorrentBroadcast$$readBlocks$1$$anonfun$2.apply(TorrentBroadcast.scala:140)
at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$broadcast$TorrentBroadcast$$readBlocks$1$$anonfun$2.apply(TorrentBroadcast.scala:140)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$broadcast$TorrentBroadcast$$readBlocks$1.apply$mcVI$sp(TorrentBroadcast.scala:139)
at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$broadcast$TorrentBroadcast$$readBlocks$1.apply(TorrentBroadcast.scala:121)
at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$broadcast$TorrentBroadcast$$readBlocks$1.apply(TorrentBroadcast.scala:121)
at scala.collection.immutable.List.foreach(List.scala:318)
at org.apache.spark.broadcast.TorrentBroadcast.org$apache$spark$broadcast$TorrentBroadcast$$readBlocks(TorrentBroadcast.scala:121)
at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$readBroadcastBlock$1.apply(TorrentBroadcast.scala:177)
at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1207)
... 11 more