I'm trying to write streaming data from an Event Hub to blob storage from Azure Databricks. It runs for a while and then fails with the error below. Can you tell me what's causing it?
import org.apache.spark.sql.streaming.Trigger.ProcessingTime

val query =
  streamingDataFrame
    .writeStream
    .format("json")
    .option("path", "/mnt/blobimages/DatabricksSentimentPowerBI")
    .option("checkpointLocation", "/mnt/blobimages/sample/check2")
    .trigger(ProcessingTime("20 seconds"))
    .start()
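For context, streamingDataFrame is read from the Event Hub. Roughly like this (a minimal sketch: the connection string, column names, and the scoring UDF are placeholders rather than the actual notebook code; only the (String) => Double UDF signature is taken from the stack trace below):

// Sketch of how the streaming DataFrame is assumed to be built.
import org.apache.spark.eventhubs.{ConnectionStringBuilder, EventHubsConf, EventPosition}
import org.apache.spark.sql.functions.udf
import spark.implicits._

val connectionString = ConnectionStringBuilder("<event-hub-connection-string>") // placeholder
  .setEventHubName("<event-hub-name>")                                          // placeholder
  .build

val ehConf = EventHubsConf(connectionString)
  .setStartingPosition(EventPosition.fromEndOfStream)

// Stand-in for the (String) => Double UDF that the stack trace reports as failing
// ($anonfun$7); the real implementation is not shown here.
val scoreUdf = udf((text: String) => 0.0)

val streamingDataFrame = spark
  .readStream
  .format("eventhubs")
  .options(ehConf.toMap)
  .load()
  .withColumn("body", $"body".cast("string"))
  .withColumn("score", scoreUdf($"body"))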
at org.apache.spark.sql.execution.datasources.FileFormatWriter$.write(FileFormatWriter.scala:224)
at org.apache.spark.sql.execution.streaming.FileStreamSink.addBatch(FileStreamSink.scala:131)
at org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$org$apache$spark$sql$execution$streaming$MicroBatchExecution$$runBatch$5$$anonfun$apply$17.apply(MicroBatchExecution.scala:549)
at org.apache.spark.sql.execution.SQLExecution$$anonfun$withCustomExecutionEnv$1.apply(SQLExecution.scala:89)
at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:175)
at org.apache.spark.sql.execution.SQLExecution$.withCustomExecutionEnv(SQLExecution.scala:84)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:126)
at org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$org$apache$spark$sql$execution$streaming$MicroBatchExecution$$runBatch$5.apply(MicroBatchExecution.scala:547)
at org.apache.spark.sql.execution.streaming.ProgressReporter$class.reportTimeTaken(ProgressReporter.scala:379)
at org.apache.spark.sql.execution.streaming.StreamExecution.reportTimeTaken(StreamExecution.scala:60)
at org.apache.spark.sql.execution.streaming.MicroBatchExecution.org$apache$spark$sql$execution$streaming$MicroBatchExecution$$runBatch(MicroBatchExecution.scala:546)
at org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$runActivatedStream$1$$anonfun$apply$mcZ$sp$1.apply$mcV$sp(MicroBatchExecution.scala:204)
at org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$runActivatedStream$1$$anonfun$apply$mcZ$sp$1.apply(MicroBatchExecution.scala:172)
at org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$runActivatedStream$1$$anonfun$apply$mcZ$sp$1.apply(MicroBatchExecution.scala:172)
at org.apache.spark.sql.execution.streaming.ProgressReporter$class.reportTimeTaken(ProgressReporter.scala:379)
at org.apache.spark.sql.execution.streaming.StreamExecution.reportTimeTaken(StreamExecution.scala:60)
at org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$runActivatedStream$1.apply$mcZ$sp(MicroBatchExecution.scala:172)
at org.apache.spark.sql.execution.streaming.ProcessingTimeExecutor.execute(TriggerExecutor.scala:56)
at org.apache.spark.sql.execution.streaming.MicroBatchExecution.runActivatedStream(MicroBatchExecution.scala:166)
at org.apache.spark.sql.execution.streaming.StreamExecution.org$apache$spark$sql$execution$streaming$StreamExecution$$runStream(StreamExecution.scala:293)
at org.apache.spark.sql.execution.streaming.StreamExecution$$anon$1.run(StreamExecution.scala:203)
Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 239.0 failed 4 times, most recent failure: Lost task 0.3 in stage 239.0 (TID 458, 10.139.64.5, executor 0): org.apache.spark.SparkException: Task failed while writing rows.
at org.apache.spark.sql.execution.datasources.FileFormatWriter$.org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask(FileFormatWriter.scala:285)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply(FileFormatWriter.scala:197)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply(FileFormatWriter.scala:196)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:112)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:384)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.spark.SparkException: Failed to execute user defined function($anonfun$7: (string) => double)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.project_doConsume_0$(Unknown Source)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$10$$anon$1.hasNext(WholeStageCodegenExec.scala:620)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$SingleDirectoryWriteTask.execute(FileFormatWriter.scala:380)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask$3.apply(FileFormatWriter.scala:269)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask$3.apply(FileFormatWriter.scala:267)
at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1392)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$.org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask(FileFormatWriter.scala:272)
... 8 more
Caused by: java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
at java.util.ArrayList.rangeCheck(ArrayList.java:657)
at java.util.ArrayList.get(ArrayList.java:433)
at linece8d5970ee964032865eb55725903ecf40.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$anonfun$7.apply(command-624712114183209:117)
at linece8d5970ee964032865eb55725903ecf40.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$anonfun$7.apply(command-624712114183209:108)
... 17 more
Driver stacktrace:
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1747)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1735)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1734)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1734)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:962)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:962)
at scala.Option.foreach(Option.scala:257)
at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:962)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1970)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1918)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1906)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)
at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:759)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2141)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$.write(FileFormatWriter.scala:194)
... 20 more
Caused by: org.apache.spark.SparkException: Task failed while writing rows.
at org.apache.spark.sql.execution.datasources.FileFormatWriter$.org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask(FileFormatWriter.scala:285)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply(FileFormatWriter.scala:197)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply(FileFormatWriter.scala:196)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:112)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:384)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.spark.SparkException: Failed to execute user defined function($anonfun$7: (string) => double)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.project_doConsume_0$(Unknown Source)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$10$$anon$1.hasNext(WholeStageCodegenExec.scala:620)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$SingleDirectoryWriteTask.execute(FileFormatWriter.scala:380)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask$3.apply(FileFormatWriter.scala:269)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask$3.apply(FileFormatWriter.scala:267)
at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1392)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$.org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask(FileFormatWriter.scala:272)
... 8 more
Caused by: java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
at java.util.ArrayList.rangeCheck(ArrayList.java:657)
at java.util.ArrayList.get(ArrayList.java:433)
at linece8d5970ee964032865eb55725903ecf40.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$anonfun$7.apply(command-624712114183209:117)
at linece8d5970ee964032865eb55725903ecf40.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$anonfun$7.apply(command-624712114183209:108)