I am able to load data from HBase into Spark using the Phoenix JDBC driver and JdbcRDD with the code below. How can I use Spark SQL to convert the JdbcRDD into a SchemaRDD?
// SparkToJDBC.scala
import java.sql.{DriverManager, ResultSet}

import org.apache.spark.SparkContext
import org.apache.spark.rdd.JdbcRDD

object SparkToJDBC {

  def main(args: Array[String]) {
    val sc = new SparkContext("local", "phoenix")
    try {
      val rdd = new JdbcRDD(sc,
        () => {
          Class.forName("org.apache.phoenix.jdbc.PhoenixDriver").newInstance()
          DriverManager.getConnection("jdbc:phoenix:localhost", "", "")
        },
        "SELECT id, name FROM test.orders WHERE id >= ? AND id <= ?",
        1, 100, 3,  // lower bound, upper bound, number of partitions
        (r: ResultSet) => {
          processResultSet(r)
        }
      ).cache()
      println(rdd.count())
    } catch {
      case _: Throwable => println("Could not connect to database")
    }
    sc.stop()
  }

  // TODO: return a Row object as per the JdbcRDD doc; for now just print each column
  def processResultSet(rs: ResultSet) {
    val rsmd = rs.getMetaData()
    val numberOfColumns = rsmd.getColumnCount()
    var i = 1
    while (i <= numberOfColumns) {
      val s = rs.getString(i)
      System.out.print(s + " ")
      i += 1
    }
    println("")
  }
}
Answer 0 (score: 1)
Since JdbcRDD is an implementation of RDD, to create a SchemaRDD you only need to create a SQLContext and call a couple of methods:
import org.apache.spark.sql.SQLContext

val sc = new SparkContext("local", "phoenix")
val sqc = new SQLContext(sc)
try {
  val rdd = new JdbcRDD(sc,
    () => {
      Class.forName("org.apache.phoenix.jdbc.PhoenixDriver").newInstance()
      DriverManager.getConnection("jdbc:phoenix:localhost", "", "")
    },
    "SELECT id, name FROM test.orders WHERE id >= ? AND id <= ?",
    1, 100, 3,
    (r: ResultSet) => {
      processResultSet(r)
    }
  ).cache()

  val schemaRDD = sqc.createSchemaRDD(rdd)
}
With this code, rdd is an RDD[Unit], because you are not returning anything from processResultSet. That is why this code will fail: createSchemaRDD requires an element type T that extends Product (such as a tuple or a case class), so if you don't actually map your ResultSet to such a type you can't get a SchemaRDD.
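As a rough sketch of that mapping (assuming Spark 1.1+, where createSchemaRDD and registerTempTable are available; the Order case class, the column types, and the "orders" temp-table name are made up for the example), you could return a case class from the row-mapping function and then query it with Spark SQL:

import java.sql.{DriverManager, ResultSet}
import org.apache.spark.SparkContext
import org.apache.spark.rdd.JdbcRDD
import org.apache.spark.sql.SQLContext

// Hypothetical record type: the element type must extend Product so the schema can be inferred.
case class Order(id: Long, name: String)

val sc = new SparkContext("local", "phoenix")
val sqc = new SQLContext(sc)

val rdd = new JdbcRDD(sc,
  () => {
    Class.forName("org.apache.phoenix.jdbc.PhoenixDriver").newInstance()
    DriverManager.getConnection("jdbc:phoenix:localhost", "", "")
  },
  "SELECT id, name FROM test.orders WHERE id >= ? AND id <= ?",
  1, 100, 3,
  (r: ResultSet) => Order(r.getLong("id"), r.getString("name"))  // assumes id is numeric and name is a string
).cache()

val schemaRDD = sqc.createSchemaRDD(rdd)   // schema inferred from the Order case class
schemaRDD.registerTempTable("orders")      // hypothetical temp-table name
sqc.sql("SELECT name FROM orders WHERE id <= 10").collect().foreach(println)

Once the rows are mapped to a Product type, the rest is the standard SQLContext workflow.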
I hope this helps.