Elasticsearch-Hadoop connector for a Spark DataFrame

Posted: 2017-07-17 20:41:14

Tags: apache-spark-sql pyspark-sql elasticsearch-hadoop

I am trying to write a Spark DataFrame to Elasticsearch, but the save fails with the following error:

Py4JJavaError: An error occurred while calling o50.save.:
org.apache.spark.SparkException: Job aborted due to stage failure:
Task 0 in stage 3.0 failed 1 times, most recent failure: Lost task 0.0
in stage 3.0 (TID 8, localhost, executor driver): 
org.elasticsearch.hadoop.EsHadoopIllegalArgumentException: No data
nodes with HTTP-enabled available; node discovery is disabled and none
of nodes specified fit the criterion [XX.XXX.XX.X:XXXX]
at org.elasticsearch.hadoop.rest.InitializationUtils.filterNonDataNodesIfNeeded(InitializationUtils.java:152)
at org.elasticsearch.hadoop.rest.RestService.createWriter(RestService.java:549)
at org.elasticsearch.spark.rdd.EsRDDWriter.write(EsRDDWriter.scala:58)
at org.elasticsearch.spark.sql.EsSparkSQL$$anonfun$saveToEs$1.apply(EsSparkSQL.scala:94)
at org.elasticsearch.spark.sql.EsSparkSQL$$anonfun$saveToEs$1.apply(EsSparkSQL.scala:94)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:99)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:322)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
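The post does not include the write call itself. With the elasticsearch-spark-20 connector, a DataFrame save typically looks like the sketch below; the resource name `"index/docs"` and the helper functions are illustrative assumptions, not taken from the question:

```python
# Sketch of a typical DataFrame -> Elasticsearch save with the
# elasticsearch-spark-20 connector. The resource "index/docs" and the
# helper names are illustrative assumptions.

def es_write_options(nodes, port, user, password):
    """Connector options mirroring the --conf flags in the launch command."""
    return {
        "es.nodes": nodes,
        "es.port": port,
        "es.nodes.discovery": "false",
        "es.net.http.auth.user": user,
        "es.net.http.auth.pass": password,
    }

def save_to_es(df, resource, options):
    # df is a pyspark.sql.DataFrame; this requires a running Spark
    # session with the elasticsearch-spark package on the classpath.
    (df.write
       .format("org.elasticsearch.spark.sql")
       .options(**options)
       .mode("append")
       .save(resource))
```

The same options can also be supplied on the command line, prefixed with `spark.`, as in the launch command below.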

I launch PySpark with the following command:

sudo PYSPARK_DRIVER_PYTHON=jupyter-notebook $SPARK_HOME/bin/pyspark --packages org.elasticsearch:elasticsearch-spark-20_2.11:5.3.1 --conf spark.es.nodes="52.XXX.XX.XX" --conf spark.es.port="XXX" --conf spark.es.nodes.discovery=false --conf spark.es.net.http.auth.user="user" --conf spark.es.net.http.auth.pass="password"
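As I understand the connector's configuration, the `--conf spark.es.*` entries above are how es-hadoop settings are passed through the Spark configuration: the connector picks up keys under the `spark.es.` prefix as its own `es.*` options. A small sketch of that mapping (the concrete values are placeholders, as in the command):

```python
# elasticsearch-hadoop reads its settings ("es.nodes", "es.port", ...)
# from the Spark configuration, where they carry a "spark." prefix on
# the command line. Sketch of that mapping; values are placeholders.

def connector_settings(spark_conf):
    """Extract es-hadoop settings from spark.es.* entries (illustrative)."""
    return {k[len("spark."):]: v
            for k, v in spark_conf.items()
            if k.startswith("spark.es.")}

conf = {
    "spark.es.nodes": "52.XXX.XX.XX",
    "spark.es.port": "9200",            # assumed port for illustration
    "spark.es.nodes.discovery": "false",
    "spark.master": "local[*]",         # non-es entries are ignored
}
print(connector_settings(conf))
```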

I use Spark 2.1.0, Scala 2.11, Elasticsearch 2.4.5, and a Jupyter notebook.

Unfortunately, I then receive the following error:

Py4JJavaError: An error occurred while calling o50.save.: 
org.apache.spark.SparkException: Job aborted due to stage failure:
Task 3 in stage 3.0 failed 1 times, most recent failure: Lost task 3.0
in stage 3.0 (TID 11, localhost, executor driver):
org.elasticsearch.hadoop.rest.EsHadoopNoNodesLeftException: Connection
error (check network and/or proxy settings)- all nodes failed; tried
[[127.0.0.1:9200]] 
at org.elasticsearch.hadoop.rest.NetworkClient.execute(NetworkClient.java:150)
at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:461)
at org.elasticsearch.hadoop.rest.RestClient.executeNotFoundAllowed(RestClient.java:469)
at org.elasticsearch.hadoop.rest.RestClient.exists(RestClient.java:537)
at org.elasticsearch.hadoop.rest.RestClient.touch(RestClient.java:543)
at org.elasticsearch.hadoop.rest.RestRepository.touch(RestRepository.java:412)
at org.elasticsearch.hadoop.rest.RestService.initSingleIndex(RestService.java:580)
at org.elasticsearch.hadoop.rest.RestService.createWriter(RestService.java:568)
at org.elasticsearch.spark.rdd.EsRDDWriter.write(EsRDDWriter.scala:58)
at org.elasticsearch.spark.sql.EsSparkSQL$$anonfun$saveToEs$1.apply(EsSparkSQL.scala:94)
at org.elasticsearch.spark.sql.EsSparkSQL$$anonfun$saveToEs$1.apply(EsSparkSQL.scala:94)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:99)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:322)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)

When using spark.es.nodes.discovery=true, I get the connection error shown above: all nodes fail, and only [[127.0.0.1:9200]] is tried.
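One setting often relevant to this symptom (the connector ending up at 127.0.0.1 instead of the configured address) is the documented `es.nodes.wan.only` option, which restricts the connector to the nodes declared in `es.nodes` when the cluster advertises internal addresses. This is a hedged sketch, not a confirmed fix for this case; the values are illustrative:

```python
# When an Elasticsearch cluster advertises internal publish addresses
# (here 127.0.0.1) but is only reachable via a public IP, the es-hadoop
# "es.nodes.wan.only" setting tells the connector to contact only the
# nodes listed in "es.nodes". Values below are illustrative.

def wan_only_options(public_ip, port):
    return {
        "es.nodes": public_ip,
        "es.port": port,
        "es.nodes.wan.only": "true",     # also disables node discovery
        "es.nodes.discovery": "false",
    }

print(wan_only_options("52.XXX.XX.XX", "9200"))
```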

Can anyone help?

0 Answers

There are no answers yet.