How to represent an Elasticsearch DSL query in Spark using Scala?

Asked: 2016-01-05 07:38:06

Tags: scala elasticsearch apache-spark

How can an Elasticsearch query like the following be expressed in Scala?

Request

GET importsmethods/typeimportsmethods/_search?search_type=count
{
  "size": 0,
  "aggs": {
    "group_by_imports": {
      "terms": {
        "field": "tokens.importName"
      }
    }
  }
}

Response

{
   "took": 2064,
   "timed_out": false,
   "_shards": {
      "total": 5,
      "successful": 5,
      "failed": 0
   },
   "hits": {
      "total": 1297362,
      "max_score": 0,
      "hits": []
   },
   "aggregations": {
      "group_by_imports": {
         "doc_count_error_upper_bound": 4939,
         "sum_other_doc_count": 1960640,
         "buckets": [
            {
               "key": "java.util.list",
               "doc_count": 129986
            },
            {
               "key": "java.util.map",
               "doc_count": 103525
            }
         ]
      }
   }
}

Spark Code

import org.apache.spark.{SparkConf, SparkContext}
import org.elasticsearch.spark._  // provides sc.esRDD()

val conf = new SparkConf().setMaster("local[2]").setAppName("test")

conf.set("es.nodes", "localhost")
conf.set("es.port", "9200")
conf.set("es.index.auto.create", "true")
conf.set("es.resource", "importsmethods/typeimportsmethods/_search")
conf.set("es.query", """?search_type=count&ignore_unavailable=true {
  "size": 0,
  "aggs": {
    "group_by_imports": {
      "terms": {
        "field": "tokens.importName"
      }
    }
  }
}""")

val sc = new SparkContext(conf)
val importMethodsRDD = sc.esRDD()
val rddVal = importMethodsRDD.map(x => x._2)

rddVal.saveAsTextFile("../")

Exception

Exception in thread "main" org.elasticsearch.hadoop.EsHadoopIllegalArgumentException: index [importsmethods/typeimportsmethods/_search] missing and settings [es.field.read.empty.as.null] is set to false

1 Answer:

Answer 0 (score: 0)

You just need to fix the following line: es.resource should contain only index/type, without the _search endpoint:

conf.set("es.resource","importsmethods/typeimportsmethods")

Also, es.query does not need the query-string parameters, only the query DSL body:

conf.set("es.query", """{
  "size": 0,
  "aggs": {
    "group_by_imports": {
      "terms": {
        "field": "tokens.importName"
      }
    }
  }
}""")