How to use a connection pool to PostgreSQL in Spark

Date: 2016-01-05 17:45:18

Tags: postgresql scala apache-spark

I have a Spark (v1.2.1) job that inserts the contents of an RDD into Postgres from Scala using postgresql.Driver:

import java.sql.{Connection, DriverManager}
import org.postgresql.util.PSQLException

rdd.foreachPartition(iter => {

        // connect to the Postgres database on localhost
        val driver = "org.postgresql.Driver"
        Class.forName(driver)
        val connection: Connection = DriverManager.getConnection(url, username, password)
        val statement = connection.createStatement()

        iter.foreach(row => {
            // build the INSERT statement for this row
            val mapRequest = Utils.getInsertMap(row)
            val query = Utils.getInsertRequest(squares_table, mapRequest)

            try { statement.execute(query) }
            catch {
                case pe: PSQLException => println("exception caught: " + pe)
            }
        })
        statement.close()
        connection.close()
})

In the code above, I open a new connection to Postgres for every partition of the RDD and close it afterwards. I think the correct approach is to use a connection pool to Postgres and take connections from it (as described here), but that is only pseudocode:

rdd.foreachPartition { partitionOfRecords =>
    // ConnectionPool is a static, lazily initialized pool of connections
    val connection = ConnectionPool.getConnection()
    partitionOfRecords.foreach(record => connection.send(record))
    ConnectionPool.returnConnection(connection)  // return to the pool for future reuse
}
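
For context, the ConnectionPool referenced in that pseudocode could be a singleton object that lazily creates one real pool per executor JVM, so every task on that executor borrows from the same pool. The following is only a minimal sketch, assuming the Apache Commons DBCP2 library (org.apache.commons.dbcp2.BasicDataSource) is on the classpath; the URL, user name, password and pool size here are illustrative placeholders:

import java.sql.Connection
import org.apache.commons.dbcp2.BasicDataSource

object ConnectionPool {
  // lazily initialized once per executor JVM
  private lazy val dataSource: BasicDataSource = {
    val ds = new BasicDataSource()
    ds.setDriverClassName("org.postgresql.Driver")
    ds.setUrl("jdbc:postgresql://localhost:5432/databaseName") // placeholder URL
    ds.setUsername("username")
    ds.setPassword("password")
    ds.setMaxTotal(8) // cap on concurrent connections per executor
    ds
  }

  def getConnection(): Connection = dataSource.getConnection()

  // BasicDataSource hands out pooled proxies, so close() returns the
  // connection to the pool instead of closing the physical socket
  def returnConnection(connection: Connection): Unit = connection.close()
}

With an object along these lines, the foreachPartition body above can be used as written: each executor initializes the pool once, and every partition borrows a connection and returns it when done.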

What is the correct way to connect to Postgres from Spark using a connection pool?

1 Answer:

Answer 0 (score: 0)

This code will work for Spark 2 or greater versions with Scala. First, you have to add the PostgreSQL JDBC driver.

If you are using Maven, it works this way: add this dependency to your pom file.

    <dependency>
        <groupId>postgresql</groupId>
        <artifactId>postgresql</artifactId>
        <version>9.1-901-1.jdbc4</version>
    </dependency>
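
If the project is built with sbt instead of Maven (an alternative not covered in the original answer), the same coordinates would be declared like this in build.sbt:

// build.sbt
libraryDependencies += "postgresql" % "postgresql" % "9.1-901-1.jdbc4"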

Then write this code in a Scala file:

import java.util.Properties

import org.apache.spark.sql.SparkSession

object PostgresConnection {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("DataFrame-Basic")
      .master("local[4]")
      .getOrCreate()

    // JDBC connection properties for the Postgres database
    val prop = new Properties
    prop.setProperty("driver", "org.postgresql.Driver")
    prop.setProperty("user", "username")
    prop.setProperty("password", "password")

    val url = "jdbc:postgresql://127.0.0.1:5432/databaseName"

    // read "table_name" from Postgres into a DataFrame and print the first rows
    val df = spark.read.jdbc(url, "table_name", prop)
    df.show(5)

    spark.stop()
  }
}
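
The snippet above reads from Postgres, while the question was about inserting the contents of an RDD. In Spark 2+, the write side can go through the same DataFrame JDBC API, which manages its own connections per partition. A hedged sketch, assuming the rows can be converted to a DataFrame; the squares_table name and the column names are illustrative, not from the original post:

import org.apache.spark.sql.{SaveMode, SparkSession}

object PostgresWrite {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("DataFrame-Write")
      .master("local[4]")
      .getOrCreate()
    import spark.implicits._

    val prop = new java.util.Properties
    prop.setProperty("driver", "org.postgresql.Driver")
    prop.setProperty("user", "username")
    prop.setProperty("password", "password")
    val url = "jdbc:postgresql://127.0.0.1:5432/databaseName"

    // illustrative data; in the original question the rows came from an RDD
    val df = Seq((1, 1), (2, 4), (3, 9)).toDF("x", "square")

    // append the DataFrame to the target table; Spark opens and reuses
    // JDBC connections per partition internally
    df.write.mode(SaveMode.Append).jdbc(url, "squares_table", prop)

    spark.stop()
  }
}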