Error when running Spark locally with JdbcRDD

Date: 2015-08-01 03:14:15

Tags: mysql apache-spark local

I am trying to run a Spark job locally that reads the contents of a MySQL table (on my local machine) into a JdbcRDD. I put together the source code below from online examples and customized it to read the element table and load all of its columns.

private static final JavaSparkContext sc = new JavaSparkContext(
        new SparkConf().setAppName("SparkJdbc").setMaster("local[*]"));

private static final String MYSQL_DRIVER = "com.mysql.jdbc.Driver";
private static final String MYSQL_CONNECTION_URL = "jdbc:mysql://localhost:3306/ddm";
private static final String MYSQL_USERNAME = "root";
private static final String MYSQL_PWD = "root";

public static void main(String[] args) {
    DbConnection dbConnection = new DbConnection(MYSQL_DRIVER,
            MYSQL_CONNECTION_URL, MYSQL_USERNAME, MYSQL_PWD);

    // Load data from MySQL
    JdbcRDD<Object[]> jdbcRDD = new JdbcRDD<>(sc.sc(), dbConnection,
            "select * from element where elementid >= ? and elementid <= ?",
            1000, 1100, 10, new MapResult(),
            ClassManifestFactory$.MODULE$.fromClass(Object[].class));

    // Convert to JavaRDD
    JavaRDD<Object[]> javaRDD = JavaRDD.fromRDD(jdbcRDD,
            ClassManifestFactory$.MODULE$.fromClass(Object[].class));

    // Join first name and last name
    List<String> employeeFullNameList = javaRDD.map(
            new Function<Object[], String>() {

                private static final long serialVersionUID = 1L;

                @Override
                public String call(final Object[] record) throws Exception {
                    return record[2] + " " + record[3];
                }
            }).collect();

    for (String fullName : employeeFullNameList) {
        System.out.println(fullName);
    }
}

static class DbConnection extends AbstractFunction0<Connection> implements
        Serializable {

    private static final long serialVersionUID = 1L;
    private String driverClassName;
    private String connectionUrl;
    private String userName;
    private String password;

    public DbConnection(String driverClassName, String connectionUrl,
            String userName, String password) {
        this.driverClassName = driverClassName;
        this.connectionUrl = connectionUrl;
        this.userName = userName;
        this.password = password;
    }

    @Override
    public Connection apply() {
        try {
            Class.forName(driverClassName);
        } catch (ClassNotFoundException e) {
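            // NOTE: exception is swallowed here; a missing JDBC driver class is silently ignored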
        }

        Properties properties = new Properties();
        properties.setProperty("user", userName);
        properties.setProperty("password", password);

        Connection connection = null;
        try {
            connection = DriverManager.getConnection(connectionUrl,
                    properties);
        } catch (SQLException e) {
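            // NOTE: exception is swallowed here; if the connection fails, 'connection' stays null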
        }

        return connection;
    }
}

static class MapResult extends AbstractFunction1<ResultSet, Object[]>
        implements Serializable {
    private static final long serialVersionUID = 1L;

    public Object[] apply(ResultSet row) {
        return JdbcRDD.resultSetToObjectArray(row);
    }
} 

However, when I execute the code, I get a NullPointerException.

15/08/01 08:27:23 ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0)
java.lang.NullPointerException
at org.apache.spark.rdd.JdbcRDD$$anon$1.<init>(JdbcRDD.scala:79)
at org.apache.spark.rdd.JdbcRDD.compute(JdbcRDD.scala:74)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:63)
at org.apache.spark.scheduler.Task.run(Task.scala:70)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)

I looked up the JdbcRDD.scala source on GitHub, and line 79 is where the SQL statement is prepared:

 val stmt = conn.prepareStatement(sql, ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY)

So the statement above is what fails. I have supplied all the required connection details, yet it still throws a NullPointerException. Can anyone point out where I went wrong?
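For reference, the sql string and the ResultSet constants on that line cannot be null, so the NullPointerException points at conn, i.e. the Connection produced by the getConnection function. Below is a minimal sketch (an assumption for debugging, not a confirmed fix) of the same DbConnection.apply() rewritten to rethrow instead of swallowing its exceptions, so a driver or connection failure would be reported directly rather than showing up later as a NullPointerException:

    // Variant of DbConnection.apply() from above: any failure is rethrown,
    // so apply() can never silently return a null Connection.
    @Override
    public Connection apply() {
        try {
            Class.forName(driverClassName);

            Properties properties = new Properties();
            properties.setProperty("user", userName);
            properties.setProperty("password", password);

            return DriverManager.getConnection(connectionUrl, properties);
        } catch (ClassNotFoundException | SQLException e) {
            // Makes the real cause visible in the Spark task failure.
            throw new RuntimeException(
                    "Could not open JDBC connection to " + connectionUrl, e);
        }
    }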

1 Answer:

Answer 0 (score: 0)

I had overlooked my import statements. After adding the line below, I was able to run the Spark program locally.

import java.sql.{PreparedStatement, Connection, ResultSet}
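
That is the Scala import syntax; for the plain Java classes shown in the question, the equivalent imports would presumably be:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;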