Our Elasticsearch runs behind a load balancer. The load balancer's URL is https://es.mycomp.com. I can post documents to it from Postman and curl (a rough example of the working call is shown after the stack trace below), so the firewall is open for my dev box. But when I post documents from Spark, I keep getting:
19/06/19 09:35:49 INFO HttpMethodDirector: I/O exception (java.net.ConnectException) caught when processing request: Connection refused: connect
19/06/19 09:35:49 INFO HttpMethodDirector: Retrying request
19/06/19 09:35:49 INFO HttpMethodDirector: I/O exception (java.net.ConnectException) caught when processing request: Connection refused: connect
19/06/19 09:35:49 INFO HttpMethodDirector: Retrying request
19/06/19 09:35:50 ERROR NetworkClient: Node [10.127.30.46:433] failed (Connection refused: connect); no other nodes left - aborting...
19/06/19 09:35:50 ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0)
org.elasticsearch.hadoop.EsHadoopIllegalArgumentException: Cannot detect ES version - typically this happens if the network/Elasticsearch cluster is not accessible or when targeting a WAN/Cloud instance without the proper setting 'es.nodes.wan.only'
at org.elasticsearch.hadoop.rest.InitializationUtils.discoverEsVersion(InitializationUtils.java:196)
at org.elasticsearch.hadoop.rest.RestService.createWriter(RestService.java:379)
at org.elasticsearch.spark.rdd.EsRDDWriter.write(EsRDDWriter.scala:40)
at org.elasticsearch.spark.rdd.EsSpark$$anonfun$doSaveToEs$1.apply(EsSpark.scala:84)
at org.elasticsearch.spark.rdd.EsSpark$$anonfun$doSaveToEs$1.apply(EsSpark.scala:84)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.elasticsearch.hadoop.rest.EsHadoopNoNodesLeftException: Connection error (check network and/or proxy settings)- all nodes failed; tried [[10.127.30.46:433]]
at org.elasticsearch.hadoop.rest.NetworkClient.execute(NetworkClient.java:142)
at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:434)
at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:414)
at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:418)
at org.elasticsearch.hadoop.rest.RestClient.get(RestClient.java:122)
at org.elasticsearch.hadoop.rest.RestClient.esVersion(RestClient.java:564)
at org.elasticsearch.hadoop.rest.InitializationUtils.discoverEsVersion(InitializationUtils.java:184)
... 10 more
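For comparison, this is roughly the kind of direct call that succeeds from the same dev box (the index, type, and document body here are just placeholders, not my real data):

curl -X POST -H 'Content-Type: application/json' https://es.mycomp.com/testindex/testdoc -d '{"message": "hello from curl"}'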
Here is my code:
import org.apache.spark.SparkConf

val conf = new SparkConf().setMaster("local[2]").setAppName("ESPost")
  .set("es.index.auto.create", "true")
  .set("es.nodes", "es.mycomp.com")
  .set("es.port", "443")
  .set("es.nodes.client.only", "true")
  .set("es.http.timeout", "5m")
  .set("es.scroll.size", "50")
We are using Elasticsearch 2.4, and here is my build configuration:
scalaVersion := "2.11.12"
val sparkVersion = "1.3.0"
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % sparkVersion,
  "org.apache.spark" %% "spark-sql" % sparkVersion,
  "org.elasticsearch" %% "elasticsearch-spark" % "2.4.5",
  "org.apache.spark" %% "spark-streaming" % sparkVersion
)