如何在spark-submit中正确使用--exclude-packages选项?

时间:2019-04-10 23:56:32

标签: spark-submit

我有一个基于Spark的流应用程序,该应用程序使用命令行中的spark-submit命令在AWS EMR上运行。我通过使用spark-submit的--packages选项来包含一些依赖项。但是,我也想在spark-submit解决依赖关系时排除一个依赖关系。为此,我尝试使用spark-submit的--exclude-packages选项,但存在问题。我在实际应用程序中看到的错误与以下命令产生的错误相同(也在AWS EMR上运行):

spark-submit --class org.apache.spark.examples.SparkPi --master yarn --deploy-mode cluster --driver-memory 1024m --executor-memory 1024m --executor-cores 1 --num-executors 1  --packages org.apache.spark:spark-streaming-kafka-0-10_2.11:2.3.2,org.apache.spark:spark-streaming-kinesis-asl_2.11:2.3.2 --exclude-packages com.amazonaws:amazon-kinesis-client:1.7.3  /usr/lib/spark/examples/jars/spark-examples_2.11-2.3.2.jar 10

我看到的错误如下:

Exception in thread "main" java.lang.IllegalArgumentException: requirement failed: Provided Maven Coordinates must be in the form 'groupId:artifactId:version'. The coordinate provided is: com.amazonaws:amazon-kinesis-client:1.7.3:*
    at scala.Predef$.require(Predef.scala:224)
    at org.apache.spark.deploy.SparkSubmitUtils$$anonfun$extractMavenCoordinates$1.apply(SparkSubmit.scala:1015)
    at org.apache.spark.deploy.SparkSubmitUtils$$anonfun$extractMavenCoordinates$1.apply(SparkSubmit.scala:1013)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
    at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
    at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
    at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
    at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:186)
    at org.apache.spark.deploy.SparkSubmitUtils$.extractMavenCoordinates(SparkSubmit.scala:1013)
    at org.apache.spark.deploy.SparkSubmitUtils$.createExclusion(SparkSubmit.scala:1324)
    at org.apache.spark.deploy.SparkSubmitUtils$$anonfun$resolveMavenCoordinates$1.apply(SparkSubmit.scala:1298)
    at org.apache.spark.deploy.SparkSubmitUtils$$anonfun$resolveMavenCoordinates$1.apply(SparkSubmit.scala:1297)
    at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
    at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:35)
    at org.apache.spark.deploy.SparkSubmitUtils$.resolveMavenCoordinates(SparkSubmit.scala:1297)
    at org.apache.spark.deploy.DependencyUtils$.resolveMavenDependencies(DependencyUtils.scala:53)
    at org.apache.spark.deploy.SparkSubmit$.doPrepareSubmitEnvironment(SparkSubmit.scala:364)
    at org.apache.spark.deploy.SparkSubmit$.prepareSubmitEnvironment(SparkSubmit.scala:250)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:171)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:137)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

我确实认为我正确给出了amazon-kinesis-client的Maven坐标,因为,首先,它对我来说看起来是正确的,其次,如果我从命令中删除--exclude-packages并添加了Amazon-Maven的Maven坐标, kinesis-client就--packages选项而言,该命令运行正常,请参见以下运行正常的命令:

spark-submit --class org.apache.spark.examples.SparkPi --master yarn --deploy-mode cluster --driver-memory 1024m --executor-memory 1024m --executor-cores 1 --num-executors 1  --packages org.apache.spark:spark-streaming-kafka-0-10_2.11:2.3.2,org.apache.spark:spark-streaming-kinesis-asl_2.11:2.3.2,com.amazonaws:amazon-kinesis-client:1.7.3 /usr/lib/spark/examples/jars/spark-examples_2.11-2.3.2.jar 10

所以我不确定使用--exclude-packages选项时我在做什么错。这可能是提交火花的错误吗?以前有人遇到过这个问题吗?

P.S。我确实在JIRA中搜索了Spark项目,以查找与上述问题相关的所有打开/关闭的问题,但未找到任何结果。

2 个答案:

答案 0 :(得分:0)

尝试:*:amazon-kinesis-client:*

为了进一步理解如何使用spark-submit,最好的资源实际上是源代码:https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala

答案 1 :(得分:0)

使用未定义版本的 --exclude-packages com.amazonaws:amazon-kinesis-client