Understanding build.sbt with the sbt-spark-package plugin

Date: 2019-02-20 23:27:27

Tags: scala apache-spark sbt spark-graphx sbt-plugin

I am new to Scala and sbt build files. According to the introductory tutorials, adding the Spark dependencies to a Scala project should work directly through the sbt-spark-package plugin, but I get the following error:

[error] (run-main-0) java.lang.NoClassDefFoundError: org/apache/spark/SparkContext

Please point me to resources to learn more about what could be causing this error, as I would like to understand the process more thoroughly.

Code:

import org.apache.spark.sql.SparkSession

trait SparkSessionWrapper {

  // Lazily build a single local SparkSession shared by the application
  lazy val spark: SparkSession = {
    SparkSession
      .builder()
      .master("local")
      .appName("spark citation graph")
      .getOrCreate()
  }

  val sc = spark.sparkContext

}


import org.apache.spark.graphx.GraphLoader

object Test extends SparkSessionWrapper {

  def main(args: Array[String]): Unit = {
    println("Testing, testing, testing, testing...")

    val filePath = "Desktop/citations.txt"
    val citeGraph = GraphLoader.edgeListFile(sc, filePath)
    println(citeGraph.vertices.take(1).mkString(", "))
  }
}

plugins.sbt

resolvers += "bintray-spark-packages" at "https://dl.bintray.com/spark-packages/maven/"

addSbtPlugin("org.spark-packages" % "sbt-spark-package" % "0.2.6")

build.sbt - works. Why does adding libraryDependencies explicitly make it run/work?

spName := "yewno/citation_graph"

version := "0.1"

scalaVersion := "2.11.12"

sparkVersion := "2.2.0"

sparkComponents ++= Seq("core", "sql", "graphx")

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "2.2.0",
  "org.apache.spark" %% "spark-sql" % "2.2.0",
  "org.apache.spark" %% "spark-graphx" % "2.2.0"
)

build.sbt - does not work. I would expect it to compile and run correctly

spName := "yewno/citation_graph"

version := "0.1"

scalaVersion := "2.11.12"

sparkVersion := "2.2.0"

sparkComponents ++= Seq("core", "sql", "graphx")

Bonus points for an explanation + links to resources to learn more about the sbt build process, jar files, and anything else that can help me get up to speed quickly!

1 Answer:

Answer 0 (score: 1)

The sbt-spark-package plugin adds the Spark dependencies in the provided scope:

sparkComponentSet.map { component =>
  "org.apache.spark" %% s"spark-$component" % sparkVersion.value % "provided"
}.toSeq
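
For the build.sbt in the question (sparkVersion 2.2.0 with the core, sql and graphx components), that is roughly equivalent to writing the dependencies out by hand in the provided scope, e.g.:

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core"   % "2.2.0" % "provided",
  "org.apache.spark" %% "spark-sql"    % "2.2.0" % "provided",
  "org.apache.spark" %% "spark-graphx" % "2.2.0" % "provided"
)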

We can confirm this by running show libraryDependencies from the sbt shell:

[info] * org.scala-lang:scala-library:2.11.12
[info] * org.apache.spark:spark-core:2.2.0:provided
[info] * org.apache.spark:spark-sql:2.2.0:provided
[info] * org.apache.spark:spark-graphx:2.2.0:provided

The provided scope means:

  The dependency will be part of compilation and test, but excluded from the runtime.

That is why sbt run throws java.lang.NoClassDefFoundError: org/apache/spark/SparkContext.
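
To see the effect directly, you can compare the compile and runtime classpaths from the sbt shell (sbt 1.x slash syntax shown; on sbt 0.13 the equivalents are show compile:fullClasspath and show runtime:fullClasspath). The Spark jars should appear in the first but not the second, and the second is what sbt run uses by default:

show Compile / fullClasspath
show Runtime / fullClasspath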

If we really want to include the provided dependencies on the run classpath, then, as suggested by @douglaz:

run in Compile := Defaults.runTask(fullClasspath in Compile, mainClass in (Compile, run), runner in (Compile, run)).evaluated
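
On newer sbt versions (1.x), the same override would be written with slash syntax, roughly:

Compile / run := Defaults.runTask(
  Compile / fullClasspath,
  Compile / run / mainClass,
  Compile / run / runner
).evaluated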