I am trying to read an Elasticsearch index into a Spark DataFrame from spark-shell (Spark version 1.5.2). I don't understand what a scroll id is, or what I need to do to query Elasticsearch from Spark.
spark-shell --jars /transfer/hdp/lib/elasticsearch-spark_2.10-2.3.2.jar
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.elasticsearch.spark._
import org.elasticsearch.spark.sql._
import org.apache.spark.SparkConf
import sqlContext._
import sqlContext.implicits._
// Stop current spark context to over-ride it
sc.stop()
// Create new spark config for Elastic Search
val config = new SparkConf()
config.set("es.nodes", "*elastic-search-host-name*")
config.set("es.resource", "spark_count/spark_count")
config.set("spark.serializer","org.apache.spark.serializer.KryoSerializer")
// Start new spark context
val sc = new SparkContext(config)
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
// Create dataframe for reading
val sparkDF = sqlContext.esDF("spark_count/spark_count")
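As a side note, esDF also accepts a URI query string as a second argument, which pushes the filter down to Elasticsearch instead of fetching the whole index. A minimal sketch against the same resource (the "color" field is an assumption about this index's mapping, taken from the schema below):

```scala
// Hedged sketch: esDF with a pushed-down URI query.
// Assumption: the index has a "color" field; adjust the query to your mapping.
val redDF = sqlContext.esDF("spark_count/spark_count", "?q=color:red")
```

This only changes which documents Elasticsearch returns; the resulting DataFrame behaves the same as the unfiltered one.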
// Print the schema (note: this step works)
sparkDF.printSchema()
root
|-- color: string (nullable = true)
|-- event_time: timestamp (nullable = true)
|-- event_type: string (nullable = true)
|-- new_column: string (nullable = true)
|-- spark_count: string (nullable = true)
|-- train: string (nullable = true)
// Display 20 records
sparkDF.show()
[Stage 0:> (0 + 0) / 5]16/06/20 13:30:56 ERROR TaskContextImpl: Error in TaskCompletionListener
org.elasticsearch.hadoop.rest.EsHadoopInvalidRequest: ActionRequestValidationException[Validation Failed: 1: no scroll ids specified;]
at org.elasticsearch.hadoop.rest.RestClient.checkResponse(RestClient.java:478)
at org.elasticsearch.hadoop.rest.RestClient.executeNotFoundAllowed(RestClient.java:449)
at org.elasticsearch.hadoop.rest.RestClient.deleteScroll(RestClient.java:512)
at org.elasticsearch.hadoop.rest.ScrollQuery.close(ScrollQuery.java:70)
...
Answer 0 (score: 0)
I just figured it out.
During development I was running an old version of Elasticsearch, 1.0. Although I could save data to Elasticsearch and display the index mapping, queries failed.
I tried it against Elasticsearch version 1.4 and it did work.
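Since the failure surfaced only when es-hadoop tried to clear the scroll, it can save time to confirm the cluster version before querying. Elasticsearch reports its version on the root endpoint, so a quick sketch from the same spark-shell session (host name and default port 9200 are assumptions; replace them with your own):

```scala
// Hedged sketch: fetch the root endpoint of the cluster and inspect
// the "version" -> "number" field in the returned JSON.
// Assumption: Elasticsearch is reachable on port 9200 at this host.
import scala.io.Source
val info = Source.fromURL("http://elastic-search-host-name:9200/").mkString
println(info)
```

If the reported version is older than what your elasticsearch-spark connector supports, reads can partially work (index creation, mapping discovery) while scroll-based queries fail, which matches the behavior above.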