How do I write to a remote Elasticsearch node with Spark?

Asked: 2019-06-11 15:59:52

Tags: apache-spark hadoop elasticsearch

I have this code:

import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.elasticsearch.spark._
val sc = new SparkContext(conf)
conf.set("es.index.auto.create", "true")
conf.set("es.nodes", "1.2.3.4")
val numbers = Map("one" -> 1, "two" -> 2, "three" -> 3)
val airports = Map("arrival" -> "Otopeni", "SFO" -> "San Fran")
sc.makeRDD(Seq(numbers, airports)).saveToEs("spark/docs")

But when I run it, it tries to connect to localhost:

sc.makeRDD(Seq(numbers, airports)).saveToEs("spark/docs")
19/06/11 11:56:16 ERROR rest.NetworkClient: Node [127.0.0.1:9200] failed (Connection refused (Connection refused)); no other nodes left - aborting...
19/06/11 11:56:16 ERROR rest.NetworkClient: Node [127.0.0.1:9200] failed (Connection refused (Connection refused)); no other nodes left - aborting...
19/06/11 11:56:16 ERROR executor.Executor: Exception in task 2.0 in stage 2.0 (TID 18)
org.elasticsearch.hadoop.EsHadoopIllegalArgumentException: Cannot detect ES version - typically this happens if the network/Elasticsearch cluster is not accessible or when targeting a WAN/Cloud instance without the proper setting 'es.nodes.wan.only'

How do I configure it to write to a remote ES server?

1 Answer:

Answer 0 (score: 0)

Please see the configuration documentation:

es.nodes.discovery (default true): whether to discover the nodes within the Elasticsearch cluster, or to use only the ones given in es.nodes for metadata queries. Note that this setting only applies during start-up; afterwards, when reading and writing, elasticsearch-hadoop uses the target index shards (and their hosting nodes) unless es.nodes.client.only is enabled.

Set es.nodes.discovery to false.
Example:

EsSpark.saveToEs(userTweetRDD, "twitter/test", Map("es.nodes" -> "xx.xx.xx.xxx", "es.cluster.name" -> "xxxx-xxxxx"))

Then add

"es.nodes.discovery" -> "false"

In your case, the example becomes:

import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.elasticsearch.spark._


val conf: SparkConf = new SparkConf().setAppName("MYESAPP")
  .setMaster("local") // "local" for local testing; use "yarn" if you are running on YARN

conf.set("es.index.auto.create", "true")
conf.set("es.nodes", "1.2.3.4")
conf.set("es.nodes.discovery", "false")



val sc = new SparkContext(conf)

val numbers = Map("one" -> 1, "two" -> 2, "three" -> 3)
val airports = Map("arrival" -> "Otopeni", "SFO" -> "San Fran")
sc.makeRDD(Seq(numbers, airports)).saveToEs("spark/docs")
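
If the cluster is only reachable over a WAN or is a cloud-hosted instance, the error message above also points at es.nodes.wan.only. As a rough sketch (whether you need this depends on your network setup; it is not part of the answer above), the setting would be added to the same SparkConf:

conf.set("es.nodes.wan.only", "true")  // talk only to the addresses in es.nodes; disables node discovery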