Syntax for specifying a query file to spark-shell (using the elasticsearch-spark connector)

Asked: 2017-02-04 12:50:08

Tags: apache-spark elasticsearch

For the spark-shell command, I want to specify a file for the spark.es.query parameter:

$ $SPARK_HOME/bin/spark-shell --master local[4] \
    --jars ~/spark/jars/elasticsearch-spark-20_2.11-5.1.2.jar \
    --conf spark.es.nodes="localhost" --conf spark.es.resource="myindex/mytype" \
    --conf spark.es.query="/home/pat/spark/myquery.json"

In the shell:

scala> import org.elasticsearch.spark._
scala> val es_rdd = sc.esRDD("myindex/mytype")
scala> es_rdd.first()

The output I get:

17/02/04 07:41:31 ERROR TaskContextImpl: Error in TaskCompletionListener
org.elasticsearch.hadoop.EsHadoopIllegalArgumentException: Cannot determine 
specified query - doesn't appear to be URI or JSON based and location 
[/home/pat/spark/myquery.json] cannot be opened

Of course, the file exists at that path. Is this the right way to specify a query file?

1 Answer:

Answer 0 (score: 1)

You are getting this error because Spark and the es-connector expect the file path to be passed as a URI:
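A minimal sketch of the corrected invocation, assuming the local file should be referenced with a `file://` URI (the path and jar version are copied from the question, and `file:///home/pat/spark/myquery.json` is the hypothetical URI form of that path):

```shell
# Same command as in the question, but spark.es.query is given as a file:// URI
# rather than a bare filesystem path.
$SPARK_HOME/bin/spark-shell --master local[4] \
    --jars ~/spark/jars/elasticsearch-spark-20_2.11-5.1.2.jar \
    --conf spark.es.nodes="localhost" --conf spark.es.resource="myindex/mytype" \
    --conf spark.es.query="file:///home/pat/spark/myquery.json"
```

Alternatively, `es.query` also accepts an inline query, either a URI-style query string (e.g. `?q=field:value`) or a JSON query DSL document, which sidesteps the file lookup entirely.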
