My scenario: read data from Elasticsearch, run some computations on it, and store the final result back into Elasticsearch.
I tested with a small amount of data and it worked, but switching to a large dataset always produces this error. I'm really confused.
Spark version: 1.6.1, Elasticsearch version: 2.3.1
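For context, the read/compute/write pipeline presumably looks something like the sketch below (the actual BothwayForPU.scala is not shown, so the index names, node address, and computation are placeholders). Since the failure happens in `RestClient.scroll` while iterating the ES RDD, the scroll-related es-hadoop settings (`es.scroll.size`, `es.scroll.keepalive`) are included, as they often matter once the data volume grows:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.elasticsearch.spark._ // adds esRDD / saveToEs to SparkContext and RDD

object BothwayForPU {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("BothwayForPU")
      .set("es.nodes", "10.10.150.231")  // assumed ES node (from the task failure log)
      .set("es.scroll.size", "1000")     // documents fetched per scroll request (default is small)
      .set("es.scroll.keepalive", "10m") // how long ES keeps each scroll context alive

    val sc = new SparkContext(conf)

    // Read from ES: each element is (document id, field map).
    val source = sc.esRDD("source-index/doc") // assumed index/type

    // Placeholder for the real computation.
    val result = source.map { case (id, fields) =>
      Map("id" -> id, "fieldCount" -> fields.size)
    }

    // Write the computed result back to ES.
    result.saveToEs("result-index/doc") // assumed target index
  }
}
```

This is only a sketch under those assumptions; it requires a running Spark and Elasticsearch cluster, so it is not runnable standalone.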
线程中的异常" main" org.apache.spark.SparkException:作业因阶段失败而中止:阶段0.0中的任务1失败4次,最近失败:阶段0.0中失去的任务1.3(TID 37,10.10.150.231):org.elasticsearch.hadoop.rest .EsHadoopInvalidRequest:null c2NhbjsxOzMxMzY0OlpFSWVjWnh5Ukxtd1diMUdoVXJINVE7MTt0b3RhbF9oaXRzOjQ2NzIwOw == 在org.elasticsearch.hadoop.rest.RestClient.checkResponse(RestClient.java:478) 在org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:436) 在org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:426) 在org.elasticsearch.hadoop.rest.RestClient.scroll(RestClient.java:496) 在org.elasticsearch.hadoop.rest.RestRepository.scroll(RestRepository.java:454) at org.elasticsearch.hadoop.rest.ScrollQuery.hasNext(ScrollQuery.java:86) at org.elasticsearch.spark.rdd.AbstractEsRDDIterator.hasNext(AbstractEsRDDIterator.scala:43) 在scala.collection.Iterator $$ anon $ 13.hasNext(Iterator.scala:371) 在scala.collection.Iterator $$ anon $ 11.hasNext(Iterator.scala:327) 在org.apache.spark.storage.MemoryStore.unrollSafely(MemoryStore.scala:284) 在org.apache.spark.CacheManager.putInBlockManager(CacheManager.scala:171) 在org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:78) 在org.apache.spark.rdd.RDD.iterator(RDD.scala:268) 在org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) 在org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306) 在org.apache.spark.rdd.RDD.iterator(RDD.scala:270) 在org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73) 在org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) 在org.apache.spark.scheduler.Task.run(Task.scala:89) 在org.apache.spark.executor.Executor $ TaskRunner.run(Executor.scala:214) 在java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor $ Worker.run(ThreadPoolExecutor.java:617) 在java.lang.Thread.run(Thread.java:745)
Driver stacktrace:
    at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1431)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1419)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1418)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
    at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1418)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:799)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:799)
    at scala.Option.foreach(Option.scala:236)
    at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:799)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1640)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1599)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1588)
    at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
    at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:620)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:1832)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:1845)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:1922)
    at org.elasticsearch.spark.rdd.EsSpark$.saveToEs(EsSpark.scala:67)
    at org.elasticsearch.spark.rdd.EsSpark$.saveToEs(EsSpark.scala:52)
    at org.elasticsearch.spark.package$SparkRDDFunctions.saveToEs(package.scala:37)
    at BothwayForPU$.main(BothwayForPU.scala:82)
    at BothwayForPU.main(BothwayForPU.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: org.elasticsearch.hadoop.rest.EsHadoopInvalidRequest: null
c2NhbjsxOzMxMzY0OlpFSWVjWnh5Ukxtd1diMUdoVXJINVE7MTt0b3RhbF9oaXRzOjQ2NzIwOw==
    at org.elasticsearch.hadoop.rest.RestClient.checkResponse(RestClient.java:478)
    at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:436)
    at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:426)
    at org.elasticsearch.hadoop.rest.RestClient.scroll(RestClient.java:496)
    at org.elasticsearch.hadoop.rest.RestRepository.scroll(RestRepository.java:454)
    at org.elasticsearch.hadoop.rest.ScrollQuery.hasNext(ScrollQuery.java:86)
    at org.elasticsearch.spark.rdd.AbstractEsRDDIterator.hasNext(AbstractEsRDDIterator.scala:43)
    at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
    at org.apache.spark.storage.MemoryStore.unrollSafely(MemoryStore.scala:284)
    at org.apache.spark.CacheManager.putInBlockManager(CacheManager.scala:171)
    at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:78)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:268)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
    at org.apache.spark.scheduler.Task.run(Task.scala:89)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)