Versioning in a Spark build.sbt file

Date: 2018-09-17 16:15:56

Tags: scala apache-spark version

I am having trouble understanding the multiple version numbers in the build.sbt file of a Spark program:

1. version
2. scalaVersion
3. spark version?
4. revision number.

There are also compatibility constraints between these versions. Can you explain how to decide on these versions for my project?

1 Answer:

Answer 0 (score: 1)

I hope the following build.sbt lines and their comments are enough to answer your question.

// The version of your project itself.
// You can change this value whenever you want,
// e.g. every time you make a production release.
version := "0.1.0"

// The Scala version your project compiles against.
// With this Spark version you can only use a 2.11.x Scala version.
// Also, because Spark ships its own Scala at runtime,
// I recommend you use the exact same one; you can check which one
// your Spark instance uses in the spark-shell (see the example below the snippet).
scalaVersion := "2.11.12"

// The Spark version the project compiles against.
// You won't generate an uber jar with Spark included;
// instead you deploy your jar to a Spark cluster instance.
// This version must match the remote one, unless you want weird bugs...
val SparkVersion = "2.3.1"
// Note that I use a val for the Spark version
// to make it easier to include several Spark modules in my project.
// This way, if I want/have to change the Spark version,
// I only have to modify one line,
// and I avoid strange errors caused by changing some versions but not others.
// Also note the 'Provided' modifier at the end of each dependency:
// it tells SBT not to include the Spark bits in the generated jar,
// neither in the package nor in the assembly task.
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % SparkVersion % Provided,
  "org.apache.spark" %% "spark-sql" % SparkVersion % Provided
)

// Exclude Scala from the assembly jar, because spark already includes it.
assemblyOption in assembly := (assemblyOption in assembly).value.copy(includeScala = false)
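
As mentioned in the comments above, you can read both versions from a running spark-shell on the cluster you will deploy to. The calls below are just a quick sketch; the values in the comments are examples, not necessarily what you will see:

// Run these inside spark-shell on the target cluster.
// `spark` is the SparkSession the shell creates for you.
spark.version                        // e.g. "2.3.1"          -> use this as SparkVersion
scala.util.Properties.versionString  // e.g. "version 2.11.8" -> pick a matching scalaVersion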

You should also pay attention to the SBT version, i.e. the version of SBT the project itself is built with. You set it in the project/build.properties file:

sbt.version=1.2.3

Note: I use the sbt-assembly plugin to generate a jar that includes all the dependencies except Spark and Scala. This is useful if you use other libraries, e.g. the MongoSparkConnector.
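
For completeness, enabling sbt-assembly looks roughly like this; the plugin version below is only an example, pick the release that matches your SBT version:

// project/plugins.sbt
// Provides the `assembly` task and the `assemblyOption` setting used above.
// The 0.14.x line works with SBT 1.x; the exact number here is illustrative.
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.9")

With the plugin enabled, running sbt assembly produces the fat jar (under target/scala-2.11/) that you then submit to the cluster with spark-submit.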