I'm trying to run sbt assembly on a Spark project. sbt compile and package work, but when I try sbt assembly I get the following error:
object spark is not a member of package org.apache
I have included the spark-core and spark-sql libraries, and sbt-assembly is in my plugins file. Why does assembly produce these errors?
build.sbt:
name := "redis-record-loader"
scalaVersion := "2.11.8"
val sparkVersion = "2.3.1"
val scalatestVersion = "3.0.3"
val scalatest = "org.scalatest" %% "scalatest" % scalatestVersion
libraryDependencies ++=
  Seq(
    "com.amazonaws" % "aws-java-sdk-s3" % "1.11.347",
    "com.typesafe" % "config" % "1.3.1",
    "net.debasishg" %% "redisclient" % "3.0",
    "org.slf4j" % "slf4j-log4j12" % "1.7.12",
    "org.apache.commons" % "commons-lang3" % "3.0" % "test,it",
    "org.apache.hadoop" % "hadoop-aws" % "2.8.1" % Provided,
    "org.apache.spark" %% "spark-core" % sparkVersion % Provided,
    "org.apache.spark" %% "spark-sql" % sparkVersion % Provided,
    "org.mockito" % "mockito-core" % "2.21.0" % Test,
    scalatest
  )
val integrationTestsKey = "it"
val integrationTestLibs = scalatest % integrationTestsKey
lazy val IntegrationTestConfig = config(integrationTestsKey) extend Test
lazy val root = project.in(file("."))
  .configs(IntegrationTestConfig)
  .settings(inConfig(IntegrationTestConfig)(Defaults.testSettings): _*)
  .settings(libraryDependencies ++= Seq(integrationTestLibs))

test in assembly := Seq(
  (test in Test).value,
  (test in IntegrationTestConfig).value
)

assemblyMergeStrategy in assembly := {
  case PathList("META-INF", xs @ _*) => MergeStrategy.discard
  case x => MergeStrategy.first
}
plugins.sbt:
logLevel := Level.Warn
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.6")
Full error message:
/com/elsevier/bos/RedisRecordLoaderIntegrationSpec.scala:11: object spark is not a member of package org.apache
[error] import org.apache.spark.sql.{DataFrame, SaveMode, SparkSession}
[error] ^
[error] /Users/jones8/Work/redis-record-loader/src/it/scala/com/elsevier/bos/RedisRecordLoaderIntegrationSpec.scala:26: not found: type SparkSession
[error] implicit val spark: SparkSession = SparkSession.builder
[error] ^
[error] /Users/jones8/Work/redis-record-loader/src/it/scala/com/elsevier/bos/RedisRecordLoaderIntegrationSpec.scala:26: not found: value SparkSession
[error] implicit val spark: SparkSession = SparkSession.builder
[error] ^
[error] /Users/jones8/Work/redis-record-loader/src/it/scala/com/elsevier/bos/RedisRecordLoaderIntegrationSpec.scala:51: not found: type DataFrame
[error] val testDataframe0: DataFrame = testData0.toDF()
[error] ^
[error] /Users/jones8/Work/redis-record-loader/src/it/scala/com/elsevier/bos/RedisRecordLoaderIntegrationSpec.scala:51: value toDF is not a member of Seq[(String, String)]
[error] val testDataframe0: DataFrame = testData0.toDF()
[error] ^
[error] /Users/jones8/Work/redis-record-loader/src/it/scala/com/elsevier/bos/RedisRecordLoaderIntegrationSpec.scala:52: not found: type DataFrame
[error] val testDataframe1: DataFrame = testData1.toDF()
[error] ^
[error] /Users/jones8/Work/redis-record-loader/src/it/scala/com/elsevier/bos/RedisRecordLoaderIntegrationSpec.scala:52: value toDF is not a member of Seq[(String, String)]
[error] val testDataframe1: DataFrame = testData1.toDF()
[error] ^
[error] missing or invalid dependency detected while loading class file 'RedisRecordLoader.class'.
[error] Could not access term spark in package org.apache,
[error] because it (or its dependencies) are missing. Check your build definition for
[error] missing or conflicting dependencies. (Re-run with `-Ylog-classpath` to see the problematic classpath.)
[error] A full rebuild may help if 'RedisRecordLoader.class' was compiled against an incompatible version of org.apache.
[error] missing or invalid dependency detected while loading class file 'RedisRecordLoader.class'.
[error] Could not access type SparkSession in value org.apache.sql,
[error] because it (or its dependencies) are missing. Check your build definition for
[error] missing or conflicting dependencies. (Re-run with `-Ylog-classpath` to see the problematic classpath.)
[error] A full rebuild may help if 'RedisRecordLoader.class' was compiled against an incompatible version of org.apache.sql.
[error] 9 errors found
Answer 0 (score: 0)
I can't say for certain that this is your problem, but I doubt the AWS SDK and hadoop-aws versions will work together. You need the hadoop-aws version to exactly match the hadoop-common JAR on your classpath (it is, after all, a synchronously released project), and the AWS SDK release it was built against was 1.10. The AWS SDK has a habit of (a) breaking its API on every release, (b) aggressively pushing new Jackson versions even when they are incompatible, and (c) causing regressions in the hadoop-aws code.
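A minimal sketch of what that alignment could look like in build.sbt, assuming 1.10.6 is the SDK version hadoop-aws 2.8.x was built against (check the hadoop-aws 2.8.1 POM to confirm the exact number):

// keep hadoop-aws and the AWS SDK on versions that were released together
val hadoopVersion = "2.8.1"
val awsSdkVersion = "1.10.6"  // assumed match for hadoop-aws 2.8.1; verify via its POM

libraryDependencies ++= Seq(
  "org.apache.hadoop" % "hadoop-aws" % hadoopVersion % Provided,
  // the single-module S3 artifact that hadoop-aws 2.8.x itself depends on
  "com.amazonaws" % "aws-java-sdk-s3" % awsSdkVersion % Provided
)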
If you really want to use S3A, your best bet is Hadoop 2.9, which is built against a shaded 1.11.x version of the SDK.
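If you go that route, a sketch of the corresponding dependencies, assuming 1.11.199 is the bundle version the hadoop-aws 2.9.x POM declares (use whatever version it actually lists):

val hadoopVersion = "2.9.1"

libraryDependencies ++= Seq(
  "org.apache.hadoop" % "hadoop-aws" % hadoopVersion % Provided,
  // aws-java-sdk-bundle shades the SDK and its Jackson, so it cannot clash
  // with the Jackson version Spark already puts on the classpath
  "com.amazonaws" % "aws-java-sdk-bundle" % "1.11.199" % Provided
)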