How to properly save Spark RDD results to a MySQL database

Date: 2017-01-20 12:01:00

Tags: scala apache-spark

I am currently saving the results of a Spark RDD to a MySQL database.

Is there a better way to do this?

I tried the following, but it is very slow compared with my current approach:

import anorm._
import java.sql.Connection
import org.apache.spark.rdd.RDD
import scala.language.reflectiveCalls // for the structural type used by `using`

val dbUrl: String = ... // JDBC URL, e.g. "jdbc:mysql://host:3306/db?user=...&password=..."
val wordCounts: RDD[(String, Int)] = ...

// Register the MySQL JDBC driver and open a new connection.
def getDbConnection(dbUrl: String): Connection = {
  Class.forName("com.mysql.jdbc.Driver").newInstance()
  java.sql.DriverManager.getConnection(dbUrl)
}

// Loan pattern: hand the resource to `f` and close it even if `f` throws.
def using[X <: { def close(): Unit }, A](resource: X)(f: X => A): A =
  try f(resource)
  finally resource.close()

wordCounts.foreachPartition(iter => {
  // One connection per partition, not one per record.
  using(getDbConnection(dbUrl)) { implicit conn =>
    iter.foreach { case (word, count) =>
      // Bind the tuple values as SQL parameters; Anorm's SQL interpolator
      // turns $word and $count into placeholders, not string concatenation.
      SQL"insert into WordCount VALUES ($word, $count)".executeUpdate()
    }
  }
})
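
The code above still issues one INSERT, and therefore one network round-trip, per record, which usually explains the slowness. Below is a minimal sketch of a batched variant using a plain JDBC PreparedStatement, reusing the `getDbConnection` and `using` helpers from above; the batch size of 1000 is an assumption to tune, and the `WordCount` table is the one from the original snippet. Note that with MySQL Connector/J, batches are only rewritten into multi-row INSERTs when `rewriteBatchedStatements=true` is set on the JDBC URL.

wordCounts.foreachPartition { iter =>
  using(getDbConnection(dbUrl)) { conn =>
    // One PreparedStatement per partition; rows are sent to the server in
    // batches, so the driver makes far fewer round-trips than one per row.
    val stmt = conn.prepareStatement("insert into WordCount values (?, ?)")
    try {
      var n = 0
      iter.foreach { case (word, count) =>
        stmt.setString(1, word)
        stmt.setInt(2, count)
        stmt.addBatch()
        n += 1
        if (n % 1000 == 0) stmt.executeBatch() // flush every 1000 rows (assumed batch size)
      }
      stmt.executeBatch() // flush the final partial batch
    } finally stmt.close()
  }
}

Alternatively, if Spark SQL is available (Spark 1.4 or later), the built-in JDBC writer handles batching and connection management itself. A sketch assuming an existing SparkContext named `sc`, and that the column names "word" and "count" match the `WordCount` table's schema:

import java.util.Properties
import org.apache.spark.sql.{SQLContext, SaveMode}

val sqlContext = new SQLContext(sc)
import sqlContext.implicits._ // enables .toDF on the RDD

val props = new Properties()
props.setProperty("driver", "com.mysql.jdbc.Driver") // credentials go in props or the URL

wordCounts.toDF("word", "count")
  .write
  .mode(SaveMode.Append)     // append rows to the existing table
  .jdbc(dbUrl, "WordCount", props)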

0 Answers:

No answers