I am trying to run a Spark job locally that reads the contents of a MySQL table (also on my local machine) into a JdbcRDD. I pieced the following source code together from examples found online and customized it to read the element table and load all of its columns.
private static final JavaSparkContext sc = new JavaSparkContext(
        new SparkConf().setAppName("SparkJdbc").setMaster("local[*]"));
private static final String MYSQL_DRIVER = "com.mysql.jdbc.Driver";
private static final String MYSQL_CONNECTION_URL = "jdbc:mysql://localhost:3306/ddm";
private static final String MYSQL_USERNAME = "root";
private static final String MYSQL_PWD = "root";

public static void main(String[] args) {
    DbConnection dbConnection = new DbConnection(MYSQL_DRIVER,
            MYSQL_CONNECTION_URL, MYSQL_USERNAME, MYSQL_PWD);

    // Load data from MySQL: the two ?s are bound to the lower/upper bounds
    // (1000 and 1100) and the range is split across 10 partitions
    JdbcRDD<Object[]> jdbcRDD = new JdbcRDD<>(sc.sc(), dbConnection,
            "select * from element where elementid >= ? and elementid <= ?",
            1000, 1100, 10, new MapResult(),
            ClassManifestFactory$.MODULE$.fromClass(Object[].class));

    // Convert to JavaRDD
    JavaRDD<Object[]> javaRDD = JavaRDD.fromRDD(jdbcRDD,
            ClassManifestFactory$.MODULE$.fromClass(Object[].class));

    // Join first name and last name (the third and fourth columns of each row)
    List<String> employeeFullNameList = javaRDD.map(
            new Function<Object[], String>() {
                private static final long serialVersionUID = 1L;

                @Override
                public String call(final Object[] record) throws Exception {
                    return record[2] + " " + record[3];
                }
            }).collect();

    for (String fullName : employeeFullNameList) {
        System.out.println(fullName);
    }
}

// Connection factory handed to JdbcRDD; it is serialized and invoked on the executors
static class DbConnection extends AbstractFunction0<Connection> implements
        Serializable {
    private static final long serialVersionUID = 1L;
    private String driverClassName;
    private String connectionUrl;
    private String userName;
    private String password;

    public DbConnection(String driverClassName, String connectionUrl,
            String userName, String password) {
        this.driverClassName = driverClassName;
        this.connectionUrl = connectionUrl;
        this.userName = userName;
        this.password = password;
    }

    @Override
    public Connection apply() {
        try {
            Class.forName(driverClassName);
        } catch (ClassNotFoundException e) {
            // exception is swallowed here, so a missing driver only shows up later
        }

        Properties properties = new Properties();
        properties.setProperty("user", userName);
        properties.setProperty("password", password);

        Connection connection = null;
        try {
            connection = DriverManager.getConnection(connectionUrl,
                    properties);
        } catch (SQLException e) {
            // also swallowed, so apply() can return a null connection
        }
        return connection;
    }
}

// Maps each ResultSet row to an Object[] of its column values
static class MapResult extends AbstractFunction1<ResultSet, Object[]>
        implements Serializable {
    private static final long serialVersionUID = 1L;

    public Object[] apply(ResultSet row) {
        return JdbcRDD.resultSetToObjectArray(row);
    }
}
However, when I execute the code I get a NullPointerException:
15/08/01 08:27:23 ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0)
java.lang.NullPointerException
at org.apache.spark.rdd.JdbcRDD$$anon$1.<init>(JdbcRDD.scala:79)
at org.apache.spark.rdd.JdbcRDD.compute(JdbcRDD.scala:74)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:63)
at org.apache.spark.scheduler.Task.run(Task.scala:70)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
I looked up the JdbcRDD.scala source on GitHub; line 79 is where the SQL statement is prepared:
val stmt = conn.prepareStatement(sql, ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY)
So the statement above is what fails, which means the connection passed in must be null. I have supplied all of the required connection details, yet it still throws a NullPointerException. Can anyone point out where I am going wrong?
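A quick way to narrow this down is to exercise the same JDBC settings outside of Spark. The following is a minimal sketch (not part of the original program) that reuses the driver, URL and credentials from the question; if it throws or prints null, the problem is in the JDBC setup rather than in Spark:

public class ConnectionCheck {
    public static void main(String[] args) throws Exception {
        // fails fast with ClassNotFoundException if the MySQL driver jar is not on the classpath
        Class.forName("com.mysql.jdbc.Driver");
        java.sql.Connection conn = java.sql.DriverManager.getConnection(
                "jdbc:mysql://localhost:3306/ddm", "root", "root");
        System.out.println("connection = " + conn); // should print a non-null connection object
        conn.close();
    }
}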
Answer 0 (score: 0)
I had overlooked my import statements. After adding the line below, I was able to run the Spark program locally.
import java.sql.{PreparedStatement, Connection, ResultSet}
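The line above uses Scala's brace-grouped import syntax, which is not valid in a Java source file. For the Java class shown in the question, the equivalent would be the individual imports below; DriverManager, SQLException and Properties are included here only because the posted code also uses those classes:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.Properties;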