Always getting "EsHadoopInvalidRequest: null" when using Spark and elasticsearch-hadoop

Time: 2016-06-24 08:42:12

Tags: elasticsearch apache-spark

My scenario: I read data from Elasticsearch, run some computations on it, and store the final results back in Elasticsearch.

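The job succeeds when I test it with a small amount of data, but it always fails with this error once I switch to a large amount of data. I am really confused.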

Spark version: 1.6.1
Elasticsearch version: 2.3.1

Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 0.0 failed 4 times, most recent failure: Lost task 1.3 in stage 0.0 (TID 37, 10.10.150.231): org.elasticsearch.hadoop.rest.EsHadoopInvalidRequest: null
c2NhbjsxOzMxMzY0OlpFSWVjWnh5Ukxtd1diMUdoVXJINVE7MTt0b3RhbF9oaXRzOjQ2NzIwOw==
    at org.elasticsearch.hadoop.rest.RestClient.checkResponse(RestClient.java:478)
    at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:436)
    at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:426)
    at org.elasticsearch.hadoop.rest.RestClient.scroll(RestClient.java:496)
    at org.elasticsearch.hadoop.rest.RestRepository.scroll(RestRepository.java:454)
    at org.elasticsearch.hadoop.rest.ScrollQuery.hasNext(ScrollQuery.java:86)
    at org.elasticsearch.spark.rdd.AbstractEsRDDIterator.hasNext(AbstractEsRDDIterator.scala:43)
    at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
    at org.apache.spark.storage.MemoryStore.unrollSafely(MemoryStore.scala:284)
    at org.apache.spark.CacheManager.putInBlockManager(CacheManager.scala:171)
    at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:78)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:268)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
    at org.apache.spark.scheduler.Task.run(Task.scala:89)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

Driver stacktrace:
    at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1431)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1419)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1418)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
    at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1418)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:799)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:799)
    at scala.Option.foreach(Option.scala:236)
    at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:799)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1640)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1599)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1588)
    at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
    at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:620)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:1832)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:1845)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:1922)
    at org.elasticsearch.spark.rdd.EsSpark$.saveToEs(EsSpark.scala:67)
    at org.elasticsearch.spark.rdd.EsSpark$.saveToEs(EsSpark.scala:52)
    at org.elasticsearch.spark.package$SparkRDDFunctions.saveToEs(package.scala:37)
    at BothwayForPU$.main(BothwayForPU.scala:82)
    at BothwayForPU.main(BothwayForPU.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: org.elasticsearch.hadoop.rest.EsHadoopInvalidRequest: null
c2NhbjsxOzMxMzY0OlpFSWVjWnh5Ukxtd1diMUdoVXJINVE7MTt0b3RhbF9oaXRzOjQ2NzIwOw==
    at org.elasticsearch.hadoop.rest.RestClient.checkResponse(RestClient.java:478)
    at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:436)
    at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:426)
    at org.elasticsearch.hadoop.rest.RestClient.scroll(RestClient.java:496)
    at org.elasticsearch.hadoop.rest.RestRepository.scroll(RestRepository.java:454)
    at org.elasticsearch.hadoop.rest.ScrollQuery.hasNext(ScrollQuery.java:86)
    at org.elasticsearch.spark.rdd.AbstractEsRDDIterator.hasNext(AbstractEsRDDIterator.scala:43)
    at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
    at org.apache.spark.storage.MemoryStore.unrollSafely(MemoryStore.scala:284)
    at org.apache.spark.CacheManager.putInBlockManager(CacheManager.scala:171)
    at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:78)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:268)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
    at org.apache.spark.scheduler.Task.run(Task.scala:89)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
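The Base64 token printed after `EsHadoopInvalidRequest: null` can be decoded to see what the failing request referred to. The following is only a small diagnostic sketch in Python, used here purely for illustration. The token is copied from the trace with the stray space before `==` removed. Reading the decoded fields as a scroll ID is an assumption about how Elasticsearch 2.x lays out scroll IDs, not something the trace itself states.

```python
import base64

# Token copied from the exception message (space before "==" removed).
token = "c2NhbjsxOzMxMzY0OlpFSWVjWnh5Ukxtd1diMUdoVXJINVE7MTt0b3RhbF9oaXRzOjQ2NzIwOw=="

# The token is plain Base64 over ASCII text.
decoded = base64.b64decode(token).decode("ascii")
print(decoded)  # scan;1;31364:ZEIecZxyRLmwWb1GhUrH5Q;1;total_hits:46720;

# Split into its semicolon-separated fields, dropping the empty trailing one.
# Assumed meaning (not confirmed by the trace): search type, number of
# search contexts, "contextId:nodeId", attribute count, attributes.
fields = [f for f in decoded.split(";") if f]
print(fields)
```

The decoded text suggests that the failing call was a follow-up page request on a scan-type scroll, which is consistent with `RestClient.scroll` appearing in the trace. One thing that may be worth checking is whether the scroll context had expired by the time the next page was requested. elasticsearch-hadoop exposes the `es.scroll.keepalive` and `es.scroll.size` settings for tuning this, though whether they are the cause here is not established.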

0 answers:

No answers