spark.jars is not adding the jar to the classpath

Time: 2018-03-21 04:36:43

Tags: hadoop apache-spark apache-spark-sql spark-streaming

I am trying to add my custom jar to a Spark job using the "spark.jars" property. Although I can see in the logs that the jar has been added, when I check the jars added to the classpath I cannot find it. Below are the options I have tried:

1) spark.jars

2) spark.driver.extraLibraryPath

3) spark.executor.extraLibraryPath

4) setJars(Seq[String])

But none of them adds the jar. I am using Spark 2.2.0 on HDP and the file is stored locally. Please let me know what I might be doing wrong.

Update: the first option works for me. spark.jars is adding the jar, as it shows up in the Spark UI.
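For reference, here is a minimal sketch (in Scala) of how options 1 and 4 are typically set programmatically; the object name, application name, and jar path are placeholders, not values from the question:

import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

object CustomJarJob {
  def main(args: Array[String]): Unit = {
    // Register a local jar with Spark so it is shipped to the executors.
    // "/local/path/to/custom.jar" is a placeholder path on the driver machine.
    val conf = new SparkConf()
      .setAppName("custom-jar-job")                    // placeholder application name
      .set("spark.jars", "/local/path/to/custom.jar")  // option 1: the spark.jars property
      .setJars(Seq("/local/path/to/custom.jar"))       // option 4: setJars(Seq[String]) writes the same spark.jars setting
    val spark = SparkSession.builder().config(conf).getOrCreate()

    // ... job logic would go here ...
    spark.stop()
  }
}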

2 answers:

Answer 0: (score: 0)

Check the documentation for submitting jobs; adding extra non-runtime jars is covered at the bottom.

You can either add the jars to spark.jars in the SparkConf, or specify them at runtime when you submit the job.

./bin/spark-submit \
  --class <main-class> \
  --master <master-url> \
  --deploy-mode <deploy-mode> \
  --conf <key>=<value> \
  ... # other options
  <application-jar> \
  [application-arguments]

So try spark-submit --master yarn --jars the_jar_i_need.jar my_script.py

For example, I have a pyspark script kafak_consumer.py that needs a jar, spark-streaming-kafka-0-8-assembly_2.11-2.1.1.jar.

To run it, the command is

spark-submit --master yarn --jars spark-streaming-kafka-0-8-assembly_2.11-2.1.1.jar kafak_consumer.py
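If you also want to confirm which jars Spark actually registered at runtime, one possible check (a sketch, not part of the original answer) is SparkContext.listJars(), which lists the jars added via spark.jars, --jars, or addJar:

import org.apache.spark.sql.SparkSession

object ListJarsCheck {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("list-jars-check").getOrCreate()
    // listJars() returns the jars registered with this application;
    // if the custom jar is not listed here, Spark never picked it up at all.
    spark.sparkContext.listJars().foreach(println)
    spark.stop()
  }
}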

Answer 1: (score: 0)

If you need an external jar to be available to the executors, you can try spark.executor.extraClassPath. According to the documentation it shouldn't be necessary, but it has helped me in the past:

Extra classpath entries to prepend to the classpath of executors. This exists primarily for backwards-compatibility with older versions of Spark. Users typically should not need to set this option.

Documentation: https://spark.apache.org/docs/latest/configuration.html
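For illustration, a short sketch of setting this property in a SparkConf (the object name, application name, and jar path are placeholders; note that extraClassPath entries are not shipped by Spark, so the jar must already exist at that path on every executor node):

import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

object ExtraClassPathExample {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("extra-classpath-example")  // placeholder application name
      // Prepend the jar to each executor's classpath. Spark does not copy the file:
      // /opt/jars/custom.jar (a placeholder) must already be present on every executor.
      .set("spark.executor.extraClassPath", "/opt/jars/custom.jar")
    val spark = SparkSession.builder().config(conf).getOrCreate()

    // ... job logic would go here ...
    spark.stop()
  }
}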