How to load --jars with pyspark on a standalone Spark cluster in client mode

Date: 2017-08-27 14:37:44

Tags: python mysql apache-spark jdbc pyspark

I am using Python 2.7 with a Spark standalone cluster in client mode.

I want to use JDBC for MySQL and found that I need to load the connector with the --jars argument. I have the JDBC jar locally and managed to load it from the pyspark console as described here.

When I write a Python script in my IDE using pyspark, I cannot load the extra jar mysql-connector-java-5.1.26.jar, and I keep getting a

    No suitable driver

error.

How can I load additional jar files when running a Python script in client mode, against a standalone cluster with a remote master?

Edit: added some code

Here is the basic code I am using. I use pyspark with a SparkContext inside Python, i.e. I do not call spark-submit directly and do not understand how to use spark-submit parameters in this case...

def createSparkContext(masterAdress = algoMaster):
    """
    :return: return a spark context that is suitable for my configs 
     note the ip for the master 
     app name is not that important, just to show off 
    """
    from pyspark.mllib.util import MLUtils
    from pyspark import SparkConf
    from pyspark import SparkContext
    import os


    SUBMIT_ARGS = "--driver-class-path /var/nfs/general/mysql-connector-java-5.1.43 pyspark-shell"
    #SUBMIT_ARGS = "--packages com.databricks:spark-csv_2.11:1.2.0 pyspark-shell"
    os.environ["PYSPARK_SUBMIT_ARGS"] = SUBMIT_ARGS
    conf = SparkConf()
    #conf.set("spark.driver.extraClassPath", "var/nfs/general/mysql-connector-java-5.1.43")
    conf.setMaster(masterAdress)
    conf.setAppName('spark-basic')
    conf.set("spark.executor.memory", "2G")
    #conf.set("spark.executor.cores", "4")
    conf.set("spark.driver.memory", "3G")
    conf.set("spark.driver.cores", "3")
    #conf.set("spark.driver.extraClassPath", "/var/nfs/general/mysql-connector-java-5.1.43")
    sc = SparkContext(conf=conf)
    print sc._conf.get("spark.executor.extraClassPath")

    return sc


from pyspark.sql import SQLContext

sql = SQLContext(sc)
df = sql.read.format('jdbc').options(url='jdbc:mysql://ip:port?user=user&password=pass', dbtable='(select * from tablename limit 100) as tablename').load()
print df.head()
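For context, the same script could instead be launched through spark-submit, where jar-loading flags are passed on the command line. This is a hedged sketch only: the master host, port, and script name (`my_script.py`) are placeholders, and the jar path is the one mentioned in the question.

```shell
# Hypothetical spark-submit invocation (host, port, and script name
# are placeholders; the jar path is taken from the question):
spark-submit \
  --master spark://<master-ip>:7077 \
  --jars /var/nfs/general/mysql-connector-java-5.1.26.jar \
  my_script.py
```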

Thanks

1 Answer:

Answer 0 (score: 2)

When you create the SparkContext from Python, your SUBMIT_ARGS are passed to spark-submit. You should use --jars instead of --driver-class-path: --jars ships the jar to both the driver and the executors, while --driver-class-path only affects the driver.
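A minimal sketch of that change, using the jar path from the question (note the .jar extension, which the path in the question's code appears to be missing; adjust it to your environment). It must run before any SparkContext is created:

```python
import os

# Set PYSPARK_SUBMIT_ARGS before the SparkContext is created.
# --jars ships the connector to both the driver and the executors.
# The jar path is assumed from the question; adjust to your filesystem.
jar = "/var/nfs/general/mysql-connector-java-5.1.43.jar"
os.environ["PYSPARK_SUBMIT_ARGS"] = "--jars {} pyspark-shell".format(jar)
```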

Edit

Your problem is actually much simpler than it looks: you are missing the driver parameter in your options.

sql = SQLContext(sc)
df = sql.read.format('jdbc').options(
    url='jdbc:mysql://ip:port', 
    user='user',
    password='pass',
    driver="com.mysql.jdbc.Driver",
    dbtable='(select * from tablename limit 100) as tablename'
).load()

You can also pass user and password as separate parameters, as shown above, instead of embedding them in the URL.