Unable to write streaming data from Event Hub to blob using Azure Databricks

Time: 2018-09-13 06:33:56

Tags: spark-streaming databricks

I am trying to write streaming data from an Event Hub into a blob from Azure Databricks. It fails after running for a while. The query and the error message are below. Can you tell me the cause?

import org.apache.spark.sql.streaming.ProcessingTime  // deprecated alias; Trigger.ProcessingTime in newer APIs

val query =
  streamingDataFrame
    .writeStream
    .format("json")
    .option("path", "/mnt/blobimages/DatabricksSentimentPowerBI")
    .option("checkpointLocation", "/mnt/blobimages/sample/check2")
    .trigger(ProcessingTime("20 seconds"))
    .start()
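For context, streamingDataFrame is not shown above; with the azure-eventhubs-spark connector it is presumably built along these lines (a minimal sketch, assuming that connector; the connection string, event hub name, and selected columns are placeholders, and the UDF-based transformation implicated in the stack trace is omitted):

import org.apache.spark.eventhubs.{ConnectionStringBuilder, EventHubsConf, EventPosition}
import org.apache.spark.sql.functions.col

// Placeholder connection details -- not from the question.
val connectionString = ConnectionStringBuilder("Endpoint=sb://<namespace>.servicebus.windows.net/;SharedAccessKeyName=<keyName>;SharedAccessKey=<key>")
  .setEventHubName("<eventHubName>")
  .build

val ehConf = EventHubsConf(connectionString)
  .setStartingPosition(EventPosition.fromEndOfStream)

// Event Hubs delivers each event's payload in the binary `body` column;
// cast it to a string before any downstream parsing or UDFs.
val streamingDataFrame = spark
  .readStream
  .format("eventhubs")
  .options(ehConf.toMap)
  .load()
  .select(col("body").cast("string").as("body"))

The stream runs for a while and then aborts with: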
    at org.apache.spark.sql.execution.datasources.FileFormatWriter$.write(FileFormatWriter.scala:224)
    at org.apache.spark.sql.execution.streaming.FileStreamSink.addBatch(FileStreamSink.scala:131)
    at org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$org$apache$spark$sql$execution$streaming$MicroBatchExecution$$runBatch$5$$anonfun$apply$17.apply(MicroBatchExecution.scala:549)
    at org.apache.spark.sql.execution.SQLExecution$$anonfun$withCustomExecutionEnv$1.apply(SQLExecution.scala:89)
    at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:175)
    at org.apache.spark.sql.execution.SQLExecution$.withCustomExecutionEnv(SQLExecution.scala:84)
    at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:126)
    at org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$org$apache$spark$sql$execution$streaming$MicroBatchExecution$$runBatch$5.apply(MicroBatchExecution.scala:547)
    at org.apache.spark.sql.execution.streaming.ProgressReporter$class.reportTimeTaken(ProgressReporter.scala:379)
    at org.apache.spark.sql.execution.streaming.StreamExecution.reportTimeTaken(StreamExecution.scala:60)
    at org.apache.spark.sql.execution.streaming.MicroBatchExecution.org$apache$spark$sql$execution$streaming$MicroBatchExecution$$runBatch(MicroBatchExecution.scala:546)
    at org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$runActivatedStream$1$$anonfun$apply$mcZ$sp$1.apply$mcV$sp(MicroBatchExecution.scala:204)
    at org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$runActivatedStream$1$$anonfun$apply$mcZ$sp$1.apply(MicroBatchExecution.scala:172)
    at org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$runActivatedStream$1$$anonfun$apply$mcZ$sp$1.apply(MicroBatchExecution.scala:172)
    at org.apache.spark.sql.execution.streaming.ProgressReporter$class.reportTimeTaken(ProgressReporter.scala:379)
    at org.apache.spark.sql.execution.streaming.StreamExecution.reportTimeTaken(StreamExecution.scala:60)
    at org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$runActivatedStream$1.apply$mcZ$sp(MicroBatchExecution.scala:172)
    at org.apache.spark.sql.execution.streaming.ProcessingTimeExecutor.execute(TriggerExecutor.scala:56)
    at org.apache.spark.sql.execution.streaming.MicroBatchExecution.runActivatedStream(MicroBatchExecution.scala:166)
    at org.apache.spark.sql.execution.streaming.StreamExecution.org$apache$spark$sql$execution$streaming$StreamExecution$$runStream(StreamExecution.scala:293)
    at org.apache.spark.sql.execution.streaming.StreamExecution$$anon$1.run(StreamExecution.scala:203)
Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 239.0 failed 4 times, most recent failure: Lost task 0.3 in stage 239.0 (TID 458, 10.139.64.5, executor 0): org.apache.spark.SparkException: Task failed while writing rows.
    at org.apache.spark.sql.execution.datasources.FileFormatWriter$.org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask(FileFormatWriter.scala:285)
    at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply(FileFormatWriter.scala:197)
    at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply(FileFormatWriter.scala:196)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
    at org.apache.spark.scheduler.Task.run(Task.scala:112)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:384)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.spark.SparkException: Failed to execute user defined function($anonfun$7: (string) => double)
    at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.project_doConsume_0$(Unknown Source)
    at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
    at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
    at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$10$$anon$1.hasNext(WholeStageCodegenExec.scala:620)
    at org.apache.spark.sql.execution.datasources.FileFormatWriter$SingleDirectoryWriteTask.execute(FileFormatWriter.scala:380)
    at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask$3.apply(FileFormatWriter.scala:269)
    at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask$3.apply(FileFormatWriter.scala:267)
    at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1392)
    at org.apache.spark.sql.execution.datasources.FileFormatWriter$.org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask(FileFormatWriter.scala:272)
    ... 8 more
Caused by: java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
    at java.util.ArrayList.rangeCheck(ArrayList.java:657)
    at java.util.ArrayList.get(ArrayList.java:433)
    at linece8d5970ee964032865eb55725903ecf40.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$anonfun$7.apply(command-624712114183209:117)
    at linece8d5970ee964032865eb55725903ecf40.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$anonfun$7.apply(command-624712114183209:108)
    ... 17 more

Driver stacktrace:
    at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1747)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1735)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1734)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
    at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1734)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:962)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:962)
    at scala.Option.foreach(Option.scala:257)
    at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:962)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1970)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1918)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1906)
    at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)
    at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:759)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:2141)
    at org.apache.spark.sql.execution.datasources.FileFormatWriter$.write(FileFormatWriter.scala:194)
    ... 20 more
Caused by: org.apache.spark.SparkException: Task failed while writing rows.
    at org.apache.spark.sql.execution.datasources.FileFormatWriter$.org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask(FileFormatWriter.scala:285)
    at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply(FileFormatWriter.scala:197)
    at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply(FileFormatWriter.scala:196)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
    at org.apache.spark.scheduler.Task.run(Task.scala:112)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:384)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.spark.SparkException: Failed to execute user defined function($anonfun$7: (string) => double)
    at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.project_doConsume_0$(Unknown Source)
    at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
    at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
    at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$10$$anon$1.hasNext(WholeStageCodegenExec.scala:620)
    at org.apache.spark.sql.execution.datasources.FileFormatWriter$SingleDirectoryWriteTask.execute(FileFormatWriter.scala:380)
    at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask$3.apply(FileFormatWriter.scala:269)
    at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask$3.apply(FileFormatWriter.scala:267)
    at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1392)
    at org.apache.spark.sql.execution.datasources.FileFormatWriter$.org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask(FileFormatWriter.scala:272)
    ... 8 more
Caused by: java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
    at java.util.ArrayList.rangeCheck(ArrayList.java:657)
    at java.util.ArrayList.get(ArrayList.java:433)
    at linece8d5970ee964032865eb55725903ecf40.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$anonfun$7.apply(command-624712114183209:117)
    at linece8d5970ee964032865eb55725903ecf40.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$anonfun$7.apply(command-624712114183209:108)
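The innermost cause shows the exception originating inside a user defined function of type (string) => double (the command-624712114183209 frames), where java.util.ArrayList.get(0) is called on an empty list. A minimal sketch of that failing pattern and a guarded variant, assuming the UDF reads the first element of a result list returned by some scoring call (scoreDocuments here is a hypothetical placeholder, not from the original notebook):

import java.util.{ArrayList => JArrayList}
import org.apache.spark.sql.functions.udf

// Hypothetical stand-in for whatever produces the result list in the real
// notebook (e.g. a per-record scoring call); it may return an empty list.
def scoreDocuments(text: String): JArrayList[Double] = new JArrayList[Double]()

// Failing pattern: reads element 0 without checking the list size, so an
// empty result raises java.lang.IndexOutOfBoundsException: Index: 0, Size: 0.
val getScore = udf { (text: String) =>
  scoreDocuments(text).get(0)
}

// Guarded variant: fall back to a default when the list is empty.
val getScoreSafe = udf { (text: String) =>
  val results = scoreDocuments(text)
  if (results.isEmpty) 0.0 else results.get(0)
}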

0 Answers:

No answers yet.