Apache Spark Mysql连接找不到合适的jdbc驱动程序

时间:2015-08-28 21:50:33

标签: mysql jdbc apache-spark

我正在使用Apache Spark来分析查询日志。我已经遇到了设置火花的一些困难。现在我使用独立群集来处理查询。

首先,我在java中使用示例代码来计算运行良好的单词。但是当我尝试将它连接到MySQL服务器时出现问题。我使用的是ubuntu 14.04 LTS,64位。 Spark版本1.4.1,Mysql 5.1。

这是我的代码,当我使用Master Url而不是[Local *]时,我收到错误找不到合适的驱动程序。我已经包含了日志。

import java.io.Serializable;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SQLContext;

public class LoadFromDb implements Serializable {

    private static final org.apache.log4j.Logger LOGGER = org.apache.log4j.Logger.getLogger(LoadFromDb.class);

    private static final String MYSQL_DRIVER = "com.mysql.jdbc.Driver";
    private static final String MYSQL_USERNAME = "spark";
    private static final String MYSQL_PWD = "spark123";
    private static final String MYSQL_CONNECTION_URL =
            "jdbc:mysql://localhost/productsearch_userinfo?user=" + MYSQL_USERNAME + "&password=" + MYSQL_PWD;

    private static final JavaSparkContext sc =
            new JavaSparkContext(new SparkConf().setAppName("SparkJdbcDs").setMaster("spark://shawon-H67MA-USB3-B3:7077"));

    private static final SQLContext sqlContext = new SQLContext(sc);

    public static void main(String[] args) {
        //Data source options
        Map<String, String> options = new HashMap<>();
        options.put("driver", MYSQL_DRIVER);
        options.put("url", MYSQL_CONNECTION_URL);
        options.put("dbtable",
                    "query");
        //options.put("partitionColumn", "sessionID");
       // options.put("lowerBound", "10001");
        //options.put("upperBound", "499999");
        //options.put("numPartitions", "10");

        //Load MySQL query result as DataFrame
        DataFrame jdbcDF = sqlContext.load("jdbc", options);

        //jdbcDF.show();
        jdbcDF.select("id","queryText").show();




    }
}

任何示例项目都会提供很多帮助。日志是: 使用Spark的默认log4j配置文件:org / apache / spark / log4j-defaults.properties

15/08/29 03:38:26 INFO SparkContext: Running Spark version 1.4.1
15/08/29 03:38:26 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/08/29 03:38:27 WARN Utils: Your hostname, shawon-H67MA-USB3-B3 resolves to a loopback address: 127.0.0.1; using 192.168.1.102 instead (on interface eth0)
15/08/29 03:38:27 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
15/08/29 03:38:27 INFO SecurityManager: Changing view acls to: shawon
15/08/29 03:38:27 INFO SecurityManager: Changing modify acls to: shawon
15/08/29 03:38:27 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(shawon); users with modify permissions: Set(shawon)
15/08/29 03:38:27 INFO Slf4jLogger: Slf4jLogger started
15/08/29 03:38:27 INFO Remoting: Starting remoting
15/08/29 03:38:27 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriver@192.168.1.102:60742]
15/08/29 03:38:27 INFO Utils: Successfully started service 'sparkDriver' on port 60742.
15/08/29 03:38:27 INFO SparkEnv: Registering MapOutputTracker
15/08/29 03:38:27 INFO SparkEnv: Registering BlockManagerMaster
15/08/29 03:38:27 INFO DiskBlockManager: Created local directory at /tmp/spark-85b7b4c4-ed50-4ccf-97fc-25b14ab404b1/blockmgr-57acbba4-d7d4-4557-9e6c-e1acf97d4c88
15/08/29 03:38:27 INFO MemoryStore: MemoryStore started with capacity 473.3 MB
15/08/29 03:38:27 INFO HttpFileServer: HTTP File server directory is /tmp/spark-85b7b4c4-ed50-4ccf-97fc-25b14ab404b1/httpd-a5e6844d-ac3a-4da2-822c-1b98d0a287c4
15/08/29 03:38:27 INFO HttpServer: Starting HTTP Server
15/08/29 03:38:27 INFO Utils: Successfully started service 'HTTP file server' on port 55199.
15/08/29 03:38:27 INFO SparkEnv: Registering OutputCommitCoordinator
15/08/29 03:38:28 INFO Utils: Successfully started service 'SparkUI' on port 4040.
15/08/29 03:38:28 INFO SparkUI: Started SparkUI at http://192.168.1.102:4040
15/08/29 03:38:28 INFO AppClient$ClientActor: Connecting to master akka.tcp://sparkMaster@shawon-H67MA-USB3-B3:7077/user/Master...
15/08/29 03:38:28 INFO SparkDeploySchedulerBackend: Connected to Spark cluster with app ID app-20150829033828-0000
15/08/29 03:38:28 INFO AppClient$ClientActor: Executor added: app-20150829033828-0000/0 on worker-20150829033238-192.168.1.102-36976 (192.168.1.102:36976) with 4 cores
15/08/29 03:38:28 INFO SparkDeploySchedulerBackend: Granted executor ID app-20150829033828-0000/0 on hostPort 192.168.1.102:36976 with 4 cores, 512.0 MB RAM
15/08/29 03:38:28 INFO AppClient$ClientActor: Executor updated: app-20150829033828-0000/0 is now RUNNING
15/08/29 03:38:28 INFO AppClient$ClientActor: Executor updated: app-20150829033828-0000/0 is now LOADING
15/08/29 03:38:28 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 58874.
15/08/29 03:38:28 INFO NettyBlockTransferService: Server created on 58874
15/08/29 03:38:28 INFO BlockManagerMaster: Trying to register BlockManager
15/08/29 03:38:28 INFO BlockManagerMasterEndpoint: Registering block manager 192.168.1.102:58874 with 473.3 MB RAM, BlockManagerId(driver, 192.168.1.102, 58874)
15/08/29 03:38:28 INFO BlockManagerMaster: Registered BlockManager
15/08/29 03:38:28 INFO SparkDeploySchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.0
15/08/29 03:38:30 INFO SparkContext: Starting job: show at LoadFromDb.java:43
15/08/29 03:38:30 INFO DAGScheduler: Got job 0 (show at LoadFromDb.java:43) with 1 output partitions (allowLocal=false)
15/08/29 03:38:30 INFO DAGScheduler: Final stage: ResultStage 0(show at LoadFromDb.java:43)
15/08/29 03:38:30 INFO DAGScheduler: Parents of final stage: List()
15/08/29 03:38:30 INFO DAGScheduler: Missing parents: List()
15/08/29 03:38:30 INFO DAGScheduler: Submitting ResultStage 0 (MapPartitionsRDD[1] at show at LoadFromDb.java:43), which has no missing parents
15/08/29 03:38:30 INFO MemoryStore: ensureFreeSpace(4304) called with curMem=0, maxMem=496301506
15/08/29 03:38:30 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 4.2 KB, free 473.3 MB)
15/08/29 03:38:30 INFO MemoryStore: ensureFreeSpace(2274) called with curMem=4304, maxMem=496301506
15/08/29 03:38:30 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 2.2 KB, free 473.3 MB)
15/08/29 03:38:30 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 192.168.1.102:58874 (size: 2.2 KB, free: 473.3 MB)
15/08/29 03:38:30 INFO SparkContext: Created broadcast 0 from broadcast at DAGScheduler.scala:874
15/08/29 03:38:30 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 0 (MapPartitionsRDD[1] at show at LoadFromDb.java:43)
15/08/29 03:38:30 INFO TaskSchedulerImpl: Adding task set 0.0 with 1 tasks
15/08/29 03:38:30 INFO SparkDeploySchedulerBackend: Registered executor: AkkaRpcEndpointRef(Actor[akka.tcp://sparkExecutor@192.168.1.102:56580/user/Executor#1344522225]) with ID 0
15/08/29 03:38:30 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, 192.168.1.102, PROCESS_LOCAL, 1171 bytes)
15/08/29 03:38:30 INFO BlockManagerMasterEndpoint: Registering block manager 192.168.1.102:56904 with 265.4 MB RAM, BlockManagerId(0, 192.168.1.102, 56904)
15/08/29 03:38:31 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 192.168.1.102:56904 (size: 2.2 KB, free: 265.4 MB)
15/08/29 03:38:31 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, 192.168.1.102): java.sql.SQLException: No suitable driver found for jdbc:mysql://localhost/productsearch_userinfo?user=spark&password=spark123
    at java.sql.DriverManager.getConnection(DriverManager.java:596)
    at java.sql.DriverManager.getConnection(DriverManager.java:187)
    at org.apache.spark.sql.jdbc.JDBCRDD$$anonfun$getConnector$1.apply(JDBCRDD.scala:185)
    at org.apache.spark.sql.jdbc.JDBCRDD$$anonfun$getConnector$1.apply(JDBCRDD.scala:177)
    at org.apache.spark.sql.jdbc.JDBCRDD$$anon$1.<init>(JDBCRDD.scala:359)
    at org.apache.spark.sql.jdbc.JDBCRDD.compute(JDBCRDD.scala:350)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:63)
    at org.apache.spark.scheduler.Task.run(Task.scala:70)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)

15/08/29 03:38:31 INFO TaskSetManager: Starting task 0.1 in stage 0.0 (TID 1, 192.168.1.102, PROCESS_LOCAL, 1171 bytes)
15/08/29 03:38:31 INFO TaskSetManager: Lost task 0.1 in stage 0.0 (TID 1) on executor 192.168.1.102: java.sql.SQLException (No suitable driver found for jdbc:mysql://localhost/productsearch_userinfo?user=spark&password=spark123) [duplicate 1]
15/08/29 03:38:31 INFO TaskSetManager: Starting task 0.2 in stage 0.0 (TID 2, 192.168.1.102, PROCESS_LOCAL, 1171 bytes)
15/08/29 03:38:31 INFO TaskSetManager: Lost task 0.2 in stage 0.0 (TID 2) on executor 192.168.1.102: java.sql.SQLException (No suitable driver found for jdbc:mysql://localhost/productsearch_userinfo?user=spark&password=spark123) [duplicate 2]
15/08/29 03:38:31 INFO TaskSetManager: Starting task 0.3 in stage 0.0 (TID 3, 192.168.1.102, PROCESS_LOCAL, 1171 bytes)
15/08/29 03:38:31 INFO TaskSetManager: Lost task 0.3 in stage 0.0 (TID 3) on executor 192.168.1.102: java.sql.SQLException (No suitable driver found for jdbc:mysql://localhost/productsearch_userinfo?user=spark&password=spark123) [duplicate 3]
15/08/29 03:38:31 ERROR TaskSetManager: Task 0 in stage 0.0 failed 4 times; aborting job
15/08/29 03:38:31 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool 
15/08/29 03:38:31 INFO TaskSchedulerImpl: Cancelling stage 0
15/08/29 03:38:31 INFO DAGScheduler: ResultStage 0 (show at LoadFromDb.java:43) failed in 1.680 s
15/08/29 03:38:31 INFO DAGScheduler: Job 0 failed: show at LoadFromDb.java:43, took 1.840969 s
Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 3, 192.168.1.102): java.sql.SQLException: No suitable driver found for jdbc:mysql://localhost/productsearch_userinfo?user=spark&password=spark123
    at java.sql.DriverManager.getConnection(DriverManager.java:596)
    at java.sql.DriverManager.getConnection(DriverManager.java:187)
    at org.apache.spark.sql.jdbc.JDBCRDD$$anonfun$getConnector$1.apply(JDBCRDD.scala:185)
    at org.apache.spark.sql.jdbc.JDBCRDD$$anonfun$getConnector$1.apply(JDBCRDD.scala:177)
    at org.apache.spark.sql.jdbc.JDBCRDD$$anon$1.<init>(JDBCRDD.scala:359)
    at org.apache.spark.sql.jdbc.JDBCRDD.compute(JDBCRDD.scala:350)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:63)
    at org.apache.spark.scheduler.Task.run(Task.scala:70)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)

Driver stacktrace:
    at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1273)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1264)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1263)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
    at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1263)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:730)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:730)
    at scala.Option.foreach(Option.scala:236)
    at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:730)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1457)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1418)
    at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
15/08/29 03:38:31 INFO SparkContext: Invoking stop() from shutdown hook
15/08/29 03:38:31 INFO SparkUI: Stopped Spark web UI at http://192.168.1.102:4040
15/08/29 03:38:31 INFO DAGScheduler: Stopping DAGScheduler
15/08/29 03:38:31 INFO SparkDeploySchedulerBackend: Shutting down all executors
15/08/29 03:38:31 INFO SparkDeploySchedulerBackend: Asking each executor to shut down
15/08/29 03:38:31 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
15/08/29 03:38:31 INFO Utils: path = /tmp/spark-85b7b4c4-ed50-4ccf-97fc-25b14ab404b1/blockmgr-57acbba4-d7d4-4557-9e6c-e1acf97d4c88, already present as root for deletion.
15/08/29 03:38:31 INFO MemoryStore: MemoryStore cleared
15/08/29 03:38:31 INFO BlockManager: BlockManager stopped
15/08/29 03:38:32 INFO BlockManagerMaster: BlockManagerMaster stopped
15/08/29 03:38:32 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
15/08/29 03:38:32 INFO SparkContext: Successfully stopped SparkContext
15/08/29 03:38:32 INFO Utils: Shutdown hook called
15/08/29 03:38:32 INFO Utils: Deleting directory /tmp/spark-85b7b4c4-ed50-4ccf-97fc-25b14ab404b1
15/08/29 03:38:32 INFO RemoteActorRefProvider$RemotingTerminator: Shutting down remote daemon.

1 个答案:

答案 0 :(得分:0)

我遇到了与Oracle数据库连接相同的问题。

我做了4件事,但我认为前2件修好了:

  1. 运行应用程序时,将驱动程序jar添加到classpath:
  2. %YOUR_SPARK_HOME%/bin/spark-submit --jars file://c:/jdbcDrivers/ojdbc7.jar

    1. 在连接属性中添加驱动程序(在您的情况下为选项对象):

          dbProps.put("driver", "oracle.jdbc.driver.OracleDriver");
      
    2. 在%YOUR_SPARK_HOME%/ conf / spark-defaults.conf文件下添加:

      spark.driver.extraClassPath = file:// C:/jdbcDrivers/ojdbc7.jar

    3. 在同样的conf下还添加:

      spark.executor.extraClassPath = file:// C:/jdbcDrivers/ojdbc7.jar

    4. 我已经获得了与DB的连接,但目前正在处理一些问题以插入Oracle。