I'm trying to write streaming data from an Event Hub to blob storage from Azure Databricks. It runs for a while and then fails with the error below. Can you tell me what's causing it?
import org.apache.spark.sql.streaming.Trigger.ProcessingTime

val query =
  streamingDataFrame
    .writeStream
    .format("json")
    .option("path", "/mnt/blobimages/DatabricksSentimentPowerBI")
    .option("checkpointLocation", "/mnt/blobimages/sample/check2")
    .trigger(ProcessingTime("20 seconds"))
    .start()
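For context, streamingDataFrame is read from the Event Hub. Roughly like this (a minimal sketch: the connection string, column names, and the scoring UDF are placeholders rather than the actual notebook code; only the (String) => Double UDF signature is taken from the stack trace below):

// Sketch of how the streaming DataFrame is assumed to be built.
import org.apache.spark.eventhubs.{ConnectionStringBuilder, EventHubsConf, EventPosition}
import org.apache.spark.sql.functions.udf
import spark.implicits._

val connectionString = ConnectionStringBuilder("<event-hub-connection-string>") // placeholder
  .setEventHubName("<event-hub-name>")                                          // placeholder
  .build

val ehConf = EventHubsConf(connectionString)
  .setStartingPosition(EventPosition.fromEndOfStream)

// Stand-in for the (String) => Double UDF that the stack trace reports as failing
// ($anonfun$7); the real implementation is not shown here.
val scoreUdf = udf((text: String) => 0.0)

val streamingDataFrame = spark
  .readStream
  .format("eventhubs")
  .options(ehConf.toMap)
  .load()
  .withColumn("body", $"body".cast("string"))
  .withColumn("score", scoreUdf($"body"))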
at org.apache.spark.sql.execution.datasources.FileFormatWriter$.write(FileFormatWriter.scala:224)
at org.apache.spark.sql.execution.streaming.FileStreamSink.addBatch(FileStreamSink.scala:131)
at org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$org$apache$spark$sql$execution$streaming$MicroBatchExecution$$runBatch$5$$anonfun$apply$17.apply(MicroBatchExecution.scala:549)
at org.apache.spark.sql.execution.SQLExecution$$anonfun$withCustomExecutionEnv$1.apply(SQLExecution.scala:89)
at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:175)
at org.apache.spark.sql.execution.SQLExecution$.withCustomExecutionEnv(SQLExecution.scala:84)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:126)
at org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$org$apache$spark$sql$execution$streaming$MicroBatchExecution$$runBatch$5.apply(MicroBatchExecution.scala:547)
at org.apache.spark.sql.execution.streaming.ProgressReporter$class.reportTimeTaken(ProgressReporter.scala:379)
at org.apache.spark.sql.execution.streaming.StreamExecution.reportTimeTaken(StreamExecution.scala:60)
at org.apache.spark.sql.execution.streaming.MicroBatchExecution.org$apache$spark$sql$execution$streaming$MicroBatchExecution$$runBatch(MicroBatchExecution.scala:546)
at org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$runActivatedStream$1$$anonfun$apply$mcZ$sp$1.apply$mcV$sp(MicroBatchExecution.scala:204)
at org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$runActivatedStream$1$$anonfun$apply$mcZ$sp$1.apply(MicroBatchExecution.scala:172)
at org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$runActivatedStream$1$$anonfun$apply$mcZ$sp$1.apply(MicroBatchExecution.scala:172)
at org.apache.spark.sql.execution.streaming.ProgressReporter$class.reportTimeTaken(ProgressReporter.scala:379)
at org.apache.spark.sql.execution.streaming.StreamExecution.reportTimeTaken(StreamExecution.scala:60)
at org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$runActivatedStream$1.apply$mcZ$sp(MicroBatchExecution.scala:172)
at org.apache.spark.sql.execution.streaming.ProcessingTimeExecutor.execute(TriggerExecutor.scala:56)
at org.apache.spark.sql.execution.streaming.MicroBatchExecution.runActivatedStream(MicroBatchExecution.scala:166)
at org.apache.spark.sql.execution.streaming.StreamExecution.org$apache$spark$sql$execution$streaming$StreamExecution$$runStream(StreamExecution.scala:293)
at org.apache.spark.sql.execution.streaming.StreamExecution$$anon$1.run(StreamExecution.scala:203)
Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 239.0 failed 4 times, most recent failure: Lost task 0.3 in stage 239.0 (TID 458, 10.139.64.5, executor 0): org.apache.spark.SparkException: Task failed while writing rows.
at org.apache.spark.sql.execution.datasources.FileFormatWriter$.org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask(FileFormatWriter.scala:285)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply(FileFormatWriter.scala:197)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply(FileFormatWriter.scala:196)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:112)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:384)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.spark.SparkException: Failed to execute user defined function($anonfun$7: (string) => double)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.project_doConsume_0$(Unknown Source)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$10$$anon$1.hasNext(WholeStageCodegenExec.scala:620)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$SingleDirectoryWriteTask.execute(FileFormatWriter.scala:380)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask$3.apply(FileFormatWriter.scala:269)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask$3.apply(FileFormatWriter.scala:267)
at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1392)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$.org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask(FileFormatWriter.scala:272)
... 8 more
Caused by: java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
at java.util.ArrayList.rangeCheck(ArrayList.java:657)
at java.util.ArrayList.get(ArrayList.java:433)
at linece8d5970ee964032865eb55725903ecf40.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$anonfun$7.apply(command-624712114183209:117)
at linece8d5970ee964032865eb55725903ecf40.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$anonfun$7.apply(command-624712114183209:108)
... 17 more
Driver stacktrace:
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1747)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1735)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1734)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1734)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:962)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:962)
at scala.Option.foreach(Option.scala:257)
at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:962)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1970)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1918)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1906)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)
at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:759)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2141)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$.write(FileFormatWriter.scala:194)
... 20 more
Caused by: org.apache.spark.SparkException: Task failed while writing rows.
at org.apache.spark.sql.execution.datasources.FileFormatWriter$.org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask(FileFormatWriter.scala:285)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply(FileFormatWriter.scala:197)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply(FileFormatWriter.scala:196)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:112)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:384)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.spark.SparkException: Failed to execute user defined function($anonfun$7: (string) => double)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.project_doConsume_0$(Unknown Source)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$10$$anon$1.hasNext(WholeStageCodegenExec.scala:620)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$SingleDirectoryWriteTask.execute(FileFormatWriter.scala:380)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask$3.apply(FileFormatWriter.scala:269)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask$3.apply(FileFormatWriter.scala:267)
at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1392)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$.org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask(FileFormatWriter.scala:272)
... 8 more
Caused by: java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
at java.util.ArrayList.rangeCheck(ArrayList.java:657)
at java.util.ArrayList.get(ArrayList.java:433)
at linece8d5970ee964032865eb55725903ecf40.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$anonfun$7.apply(command-624712114183209:117)
at linece8d5970ee964032865eb55725903ecf40.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$anonfun$7.apply(command-624712114183209:108)