Importing data from Oracle with Spark - java.lang.ClassNotFoundException: oracle.jdbc.driver.OracleDriver

Date: 2017-05-01 03:36:35

Tags: python oracle hadoop apache-spark pyspark

While trying to read data from an Oracle database with Spark on AWS EMR, I get this error message:

java.lang.ClassNotFoundException: oracle.jdbc.driver.OracleDriver

Can anyone who has run into this problem tell me how they solved it?

pyspark --driver-class-path /home/hadoop/ojdbc7.jar --jars /home/hadoop/ojdbc7.jar

from pyspark import SparkContext, SparkConf
from pyspark.sql import SQLContext, HiveContext

sqlContext = SQLContext(sc)

df = (sqlContext.read.format("jdbc")
      .options(url="jdbc:oracle:thin:user/pass@//10.200.100.142:1521/BMD",
               driver="oracle.jdbc.driver.OracleDriver",
               dbtable="S_0COORDER_TEXT_D")
      .load())
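As a sanity check, it can help to assemble the JDBC URL from its parts before handing it to Spark, since a typo in the host, port, or service name produces confusing errors. A minimal sketch using the values from the question above:

```python
# Assemble the Oracle thin-driver JDBC URL from its parts
# (values taken from the question above; substitute your own).
user = "user"
password = "pass"
host = "10.200.100.142"
port = 1521
service = "BMD"

url = f"jdbc:oracle:thin:{user}/{password}@//{host}:{port}/{service}"
print(url)  # jdbc:oracle:thin:user/pass@//10.200.100.142:1521/BMD
```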

3 Answers:

Answer 0 (score: 1)

You haven't mentioned which version of Spark you are using, but you can try the following.

Import the jar on both the driver and the executors. To do so, edit conf/spark-defaults.conf and add the two lines below:

spark.driver.extraClassPath /home/hadoop/ojdbc7.jar
spark.executor.extraClassPath /home/hadoop/ojdbc7.jar


Alternatively, you can try passing these when submitting the job, as in this example:

--conf spark.driver.extraClassPath=/home/hadoop/ojdbc7.jar
--conf spark.executor.extraClassPath=/home/hadoop/ojdbc7.jar

Answer 1 (score: 1)

Add the following lines to your_spark_home_path/conf/spark-defaults.conf; '/opt/modules/extraClass/' is the directory where I keep the extra jars:

spark.driver.extraClassPath = /opt/modules/extraClass/ojdbc7.jar
spark.executor.extraClassPath = /opt/modules/extraClass/ojdbc7.jar

Or you can simply add ojdbc7.jar to your_spark_home_path/jars.
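The "drop it into Spark's jars folder" step can be scripted; this is only a sketch, and both paths are assumptions to adapt to your installation:

```python
import os
import shutil

# Sketch: copy the Oracle JDBC jar into $SPARK_HOME/jars so that the driver
# and the executors both pick it up. Both paths are placeholders -- adjust.
spark_home = os.environ.get("SPARK_HOME", "/opt/modules/spark")
jar = "/opt/modules/extraClass/ojdbc7.jar"
dest = os.path.join(spark_home, "jars", os.path.basename(jar))

# shutil.copy2(jar, dest)  # uncomment once the paths exist on your machine
print(dest)
```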

Answer 2 (score: 1)

I ran into exactly the same problem on an AWS EMR cluster (emr-5.31.0).

Setting spark.driver.extraClassPath and spark.executor.extraClassPath in spark-defaults.conf or in SparkSession.builder.config(), and pointing spark-submit --jars at the location of ojdbc6.jar, did not work.

I finally made it work by passing the Maven coordinates through spark.jars.packages, and then I also had to set spark.driver.extraClassPath and spark.executor.extraClassPath to $HOME/.ivy2/jars/*:

import os
from pyspark.sql import SparkSession

spark_packages_list = [
    'io.delta:delta-core_2.11:0.6.1',
    'com.oracle.database.jdbc:ojdbc6:11.2.0.4',
]
spark_packages = ",".join(spark_packages_list)

home = os.getenv("HOME")

spark = (
    SparkSession
    .builder
    .config("spark.jars.packages", spark_packages)
    .config("spark.driver.extraClassPath", f"{home}/.ivy2/jars/*")
    .config("spark.executor.extraClassPath", f"{home}/.ivy2/jars/*")
    .getOrCreate()
)

Then read the data as follows (change the parameters accordingly):

host = "111.111.111.111"
port = "1234"
schema = "YourSchema"
URL = f"jdbc:oracle:thin:@{host}:{port}/{schema}"

with open(f"{home}/username.file", "r") as f:
    username = f.read().strip()

with open(f"{home}/password.file", "r") as f:
    password = f.read().strip()

query = "SELECT * FROM YourTable"

df = (spark.read.format("jdbc")
    .option("url", URL)
    .option("query", query)
    .option("user", username)
    .option("password", password)
    .load()
)

df.printSchema()
df.show()

OR

properties = {
    "user": username,
    "password": password,
}

df = spark.read.jdbc(
    url=URL,
    table="YourTable",
    properties=properties,
)

df.printSchema()
df.show()
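One more JDBC-reader detail that is easy to miss: wherever a table name is accepted, Spark also accepts a parenthesized subquery with an alias, which pushes the filtering down to Oracle. A sketch (the alias `t` and the WHERE clause are arbitrary examples, not from the answers above):

```python
# Spark's JDBC source treats "(subquery) alias" like a table name, so the
# filtering below would run inside Oracle rather than in Spark. Sketch only.
subquery = "SELECT * FROM YourTable WHERE ROWNUM <= 1000"
dbtable = f"({subquery}) t"
print(dbtable)
# df = spark.read.jdbc(url=URL, table=dbtable, properties=properties)
```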