I am trying to add my custom jar to a Spark job using the "spark.jars" property. Although I can see in the logs that the jar is being added, when I check the jars on the classpath I cannot find it. Below are the options I have tried:
1) spark.jars
2) spark.driver.extraLibraryPath
3) spark.executor.extraLibraryPath
4) setJars(Seq[String])
None of them adds the jar, though. I am using Spark 2.2.0 on HDP, and the jar file is stored locally. Please let me know what I might be doing wrong.
Update: the first option worked for me. spark.jars does add the jar, since it shows up in the Spark UI.
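For illustration, here is a minimal Scala sketch of the spark.jars approach, plus one way to verify what Spark actually registered; the app name and jar path are placeholders, not from the original post:

import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

// Sketch only: the jar path is a placeholder.
val conf = new SparkConf()
  .setAppName("jar-loading-test")
  .setMaster("local[*]") // for a quick local test; drop this when submitting to YARN
  .set("spark.jars", "/path/to/my-custom.jar") // comma-separated list of jars
  // equivalent programmatic form:
  // .setJars(Seq("/path/to/my-custom.jar"))

val spark = SparkSession.builder().config(conf).getOrCreate()

// listJars() reports the jars Spark has registered for distribution,
// which is one way to confirm the jar was actually picked up.
spark.sparkContext.listJars().foreach(println)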
Answer 0 (score: 0)
Check the documentation for submitting jobs; adding extra non-runtime jars is covered at the bottom.
You can add the jars to the SparkConf via:
spark.jars

or specify them at runtime.
So try:
./bin/spark-submit \
--class <main-class> \
--master <master-url> \
--deploy-mode <deploy-mode> \
--conf <key>=<value> \
... # other options
<application-jar> \
[application-arguments]
For a pyspark script, the pattern is:

spark-submit --master yarn --jars the_jar_i_need.jar my_script.py

For example, I have a pyspark script, kafak_consumer.py, that needs a jar, spark-streaming-kafka-0-8-assembly_2.11-2.1.1.jar. To run it, the command is:

spark-submit --master yarn --jars spark-streaming-kafka-0-8-assembly_2.11-2.1.1.jar kafak_consumer.py
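As a further check (not from the original answer), you can run a tiny job that tries to load a class from the jar on each executor; com.example.MyClass is a placeholder, and this assumes a running SparkSession named spark, as in spark-shell:

// Sketch only: replace com.example.MyClass with a class from your jar.
val results = spark.sparkContext.parallelize(1 to 4, 4).map { _ =>
  try { Class.forName("com.example.MyClass"); "loaded" }
  catch { case _: ClassNotFoundException => "missing" }
}.collect()

results.foreach(println) // "missing" means the jar never reached the executor classpath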
Answer 1 (score: 0)
If you need an external jar available to the executors, you can try spark.executor.extraClassPath. According to the documentation it shouldn't be necessary, but it helped me in the past:
Extra classpath entries to prepend to the classpath of executors. This exists primarily for backwards-compatibility with older versions of Spark. Users typically should not need to set this option.
Documentation: https://spark.apache.org/docs/latest/configuration.html
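As an illustration of that property, a minimal sketch (the path is a placeholder; note that extraClassPath does not ship the file, so the jar must already exist at that path on every executor node, e.g. pre-installed or distributed via --jars):

import org.apache.spark.SparkConf

// Sketch only: the path is a placeholder and must be valid on every executor node.
val conf = new SparkConf()
  .set("spark.executor.extraClassPath", "/opt/jars/my-custom.jar")

The same setting can be passed at submit time with --conf spark.executor.extraClassPath=/opt/jars/my-custom.jar.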