Question

I have been experimenting with Spark using spark-shell. All my data is in SQL.

I used to include external jars with the --jars flag, like this:

/bin/spark-shell --jars /path/to/mysql-connector-java-5.1.23-bin.jar --master spark://sparkmaster.com:7077

I also added the jar to the classpath by editing the bin/compute-classpath.sh file, and everything ran successfully with this configuration.

Now, when I run a standalone job through the jobserver, I get the following error message:

result: {
    "message" : "com.mysql.jdbc.Driver",
    "errorClass" : "java.lang.ClassNotFoundException",
    "stack" :[.......]
}

I have included the jar file in my local.conf file, as shown below:

context-settings { ..... dependent-jar-uris = ["file:///absolute/path/to/the/jarfile"] ...... }

Answer 1

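All dependencies should either be included in your spark-jobserver application JAR (for example, by building an "uber-jar") or be present on the classpath of the Spark executors. I recommend configuring the classpath: it is faster and needs less disk space, because the third-party library dependencies do not have to be copied to each worker every time the application runs.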

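Here are the steps to configure the worker (executor) classpath on Spark 1.3.1:

Copy the third-party JAR(s) to each Spark worker and to the Spark master
Place the JAR(s) in the same directory on each host (e.g. /home/ec2-user/lib)
Add the following line to the /root/spark/conf/spark-defaults.conf file on the Spark master: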

spark.executor.extraClassPath /root/ephemeral-hdfs/conf:/home/ec2-user/lib/name-of-your-jar-file.jar

Here is an example of my own modification to use the Stanford NLP library:

spark.executor.extraClassPath /root/ephemeral-hdfs/conf:/home/ec2-user/lib/stanford-corenlp-3.4.1.jar:/home/ec2-user/lib/stanford-corenlp-3.4.1-models.jar
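The configuration step above can be sketched as a small shell helper. This is only an illustration, not part of Spark or spark-jobserver: the function names join_classpath and set_executor_classpath are made up for this example, and it assumes the JARs have already been copied to the same directory on every host (for instance with scp).

```shell
#!/bin/sh
# Hypothetical helpers (not part of Spark): build a ':'-separated
# classpath and write it into a spark-defaults.conf style file.

# join_classpath ENTRY...
# Prints all arguments joined with ':'.
join_classpath() {
    out=""
    for entry in "$@"; do
        if [ -z "$out" ]; then
            out="$entry"
        else
            out="$out:$entry"
        fi
    done
    printf '%s\n' "$out"
}

# set_executor_classpath CONF_FILE ENTRY...
# Drops any existing spark.executor.extraClassPath line from CONF_FILE,
# then appends a fresh one built from the given entries.
set_executor_classpath() {
    conf_file="$1"
    shift
    value="$(join_classpath "$@")"
    if [ -f "$conf_file" ]; then
        # grep -v exits non-zero when nothing is left to print; ignore that.
        grep -v '^spark\.executor\.extraClassPath[[:space:]]' "$conf_file" \
            > "$conf_file.tmp" || true
        mv "$conf_file.tmp" "$conf_file"
    fi
    printf 'spark.executor.extraClassPath %s\n' "$value" >> "$conf_file"
}
```

For the setup described in this answer, the call would look like: set_executor_classpath /root/spark/conf/spark-defaults.conf /root/ephemeral-hdfs/conf /home/ec2-user/lib/name-of-your-jar-file.jar. Replacing the old line instead of appending a second one keeps the file unambiguous if the helper is run more than once.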

Answer 2

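Your workers may not have /path/to/mysql-connector-java-5.1.23-bin.jar. You can either copy the required dependency to all Spark workers or bundle the submitting jar with the required dependencies. I used Maven to build the jar. The scope of the dependencies must be runtime.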
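One way to check whether a bundled jar really contains the driver is to look for the class file in the jar's listing. The sketch below is an assumption-laden illustration: class_to_entry and listing_has_class are hypothetical names, and the listing is expected on standard input, for example from unzip -l or jar tf.

```shell
#!/bin/sh
# Hypothetical helpers: check a jar listing for a given Java class.

# class_to_entry CLASS_NAME
# Converts a class name such as com.mysql.jdbc.Driver into the path it
# would have inside a jar: com/mysql/jdbc/Driver.class
class_to_entry() {
    printf '%s.class\n' "$(printf '%s' "$1" | tr '.' '/')"
}

# listing_has_class CLASS_NAME
# Reads a jar listing on stdin and succeeds if some line contains the
# class file path. This is a fixed-string substring match, so it also
# works on the column-formatted output of 'unzip -l'.
listing_has_class() {
    entry="$(class_to_entry "$1")"
    grep -F -q -- "$entry"
}
```

A possible use is: unzip -l /path/to/your-job.jar | listing_has_class com.mysql.jdbc.Driver && echo "driver is bundled". Here /path/to/your-job.jar is a placeholder for the jar you submit. If the check fails, the driver has to come from the executor classpath instead.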

Answer 3

curl --data-binary @/PATH/jobs_jar_2.10-1.0.jar 192.168.0.115:8090/jars/job_to_be_registered

To post the dependency jar:

curl -d "" 'http://192.168.0.115:8090/contexts/new_context?dependent-jar-uris=file:///path/dependent.jar'

This works for jobserver 1.6.1
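The two curl calls above can be wrapped in a small script so that the server address and names are given once. This is a sketch only: the function names are invented for this example, it handles a single dependency URI, and it uses only the /jars and /contexts endpoints shown in this answer. Setting DRY_RUN=1 prints the commands instead of running them.

```shell
#!/bin/sh
# Hypothetical wrappers around the two curl calls shown above.

# jar_upload_url SERVER APP_NAME
jar_upload_url() {
    printf 'http://%s/jars/%s\n' "$1" "$2"
}

# context_url SERVER CONTEXT_NAME DEP_JAR_URI
context_url() {
    printf 'http://%s/contexts/%s?dependent-jar-uris=%s\n' "$1" "$2" "$3"
}

# register_job SERVER APP_NAME JOB_JAR CONTEXT_NAME DEP_JAR_URI
# Uploads the job jar, then creates a context that loads the dependency.
# With DRY_RUN=1 the curl commands are printed and nothing is sent.
register_job() {
    upload="$(jar_upload_url "$1" "$2")"
    context="$(context_url "$1" "$4" "$5")"
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf 'curl --data-binary @%s %s\n' "$3" "$upload"
        printf "curl -d '' '%s'\n" "$context"
    else
        curl --data-binary "@$3" "$upload" &&
            curl -d "" "$context"
    fi
}
```

With the values from this answer, the call would be: register_job 192.168.0.115:8090 job_to_be_registered /PATH/jobs_jar_2.10-1.0.jar new_context file:///path/dependent.jar. The file: URI has to point to a path that exists on the machines that load the jar.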

spark jobserver ERROR classnotfoundexception
