Goal: for each keyword in a DataFrame column, fetch the JSON response from a search API, like this:
+-----------------+---------------------+
| searchKeyword   | Response            |
+-----------------+---------------------+
| bags            | [{"id":"4664"}..... |
| sheet           | [{"id":"976"}.....  |
| bottles         | [{"id":"1234"}..... |
| disposable bags | [{"id":"234"}.....  |
+-----------------+---------------------+
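I took a list of keywords and converted it into a DataFrame. I then call the API for these keywords inside mapPartitions, so that only one HTTP connection is created per partition.
However, when I run an action on the RDD, I get a "Connection pool shut down" error.
Here is the code that uses mapPartitions: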
val solrUrl = "http://%s:XXXXX/solr/%s/select?q=%s&fl=id,score&defType=edismax&wt=json"

def getHttpClient(): CloseableHttpClient = {
  val httpClient: CloseableHttpClient = HttpClients.createDefault();
  httpClient
}

def getResults(url:String, httpClient:org.apache.http.impl.client.CloseableHttpClient): String = {
  val httpResponse = httpClient.execute(new HttpGet(url))
  val entity = httpResponse.getEntity()
  println(entity)
  var content = ""
  if (entity != null) {
    val inputStream = entity.getContent()
    content = scala.io.Source.fromInputStream(inputStream).getLines.mkString
    inputStream.close
  }
  httpClient.getConnectionManager().shutdown()
  return content
}

val rddResults = searchTermsDf.rdd.mapPartitions(partition => {
  val connection = getHttpClient()
  val newPartition = partition.map(keyword => {
    val searchTerm = keyword.getString(0)
    var url = solrUrl.format(HOST_IP,searchTerm)
    getResults(url,connection)
  }).toList // consumes the iterator, so every request runs before close()
  //println(newPartition)
  connection.close()
  newPartition.iterator // create a new iterator
})

rddResults.foreach(println)
Am I doing something wrong? Any help would be appreciated.
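One thing I suspect, though I have not confirmed it: getResults shuts down the connection manager after every request, so the second keyword in a partition would hit a pool that is already closed. The sketch below isolates that lifecycle without Spark or HttpClient. FakeClient, fetchAndShutDown and processPartition are hypothetical names made up for illustration; FakeClient only mimics a client that fails once it has been closed.

```scala
object PartitionClientSketch {
  // Hypothetical stand-in for CloseableHttpClient: any call after close() fails.
  final class FakeClient {
    private var open = true
    def execute(url: String): String = {
      if (!open) throw new IllegalStateException("Connection pool shut down")
      s"response for $url"
    }
    def close(): Unit = { open = false }
  }

  // Same shape as the posted getResults: the client is shut down inside the
  // per-keyword call, so only the first keyword of a partition can succeed.
  def fetchAndShutDown(client: FakeClient, keyword: String): String = {
    val content = client.execute(keyword)
    client.close()
    content
  }

  // Alternative shape: one client per partition, reused for every keyword,
  // closed exactly once after the results have been materialized.
  def processPartition(keywords: Iterator[String]): Iterator[String] = {
    val client = new FakeClient
    try keywords.map(client.execute).toList.iterator
    finally client.close()
  }
}
```

If this reading is right, the equivalent change in my code would be to drop the shutdown() call from getResults and keep only the connection.close() at the end of mapPartitions.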