Environment:
OS: CentOS 7
Java: 1.8
Spark: 2.4.5
Hadoop: 2.7.7
Scala: 2.12.11
Hardware: 3 computers
I built a simple Scala application. My code is:
import org.apache.spark.rdd.RDD
import org.apache.spark.{SparkConf, SparkContext}

object wordCount {
  def main(args: Array[String]): Unit = {
    val conf: SparkConf = new SparkConf().setAppName("WordCount")
    val context: SparkContext = new SparkContext(conf)
    val lines: RDD[String] = context.textFile(args(0))
    val words: RDD[String] = lines.flatMap(_.split(" "))
    val tuples: RDD[(String, Int)] = words.map((_, 1))
    val sumed: RDD[(String, Int)] = tuples.reduceByKey(_ + _)
    val sorted: RDD[(String, Int)] = sumed.sortBy(_._2, false)
    sorted.saveAsTextFile(args(1))
    context.stop()
  }
}
My build.sbt file is:
name := "SparkScalaTest2"
version := "0.1"
scalaVersion := "2.12.11"
libraryDependencies += "org.apache.spark" %% "spark-core" % "2.4.5"
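As I understand it, the %% operator makes sbt append the Scala binary version to the artifact name, so with scalaVersion := "2.12.11" the dependency above should resolve to the explicit form below; the commented-out variant is only a hypothetical sketch of what the build would look like if it had to target a Spark distribution built against Scala 2.11 (an assumption on my part, not something I have verified):

// Equivalent explicit form of the dependency when scalaVersion := "2.12.11"
libraryDependencies += "org.apache.spark" % "spark-core_2.12" % "2.4.5"

// Hypothetical variant for a cluster whose Spark was built against Scala 2.11:
// scalaVersion := "2.11.12"
// libraryDependencies += "org.apache.spark" % "spark-core_2.11" % "2.4.5"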
My directory layout is as follows:
$ find .
.
./build.sbt
./src
./src/main
./src/main/scala
./src/main/scala/wordCount.scala
I then packaged a jar with sbt package and submitted the application to Spark with the following command:
spark-submit \
--class wordCount \
--master spark://master:7077 \
--executor-memory 512M \
--total-executor-cores 3 \
/home/spark/IdeaProjects/SparkScalaTest2/target/scala-2.12/sparkscalatest2_2.12-0.1.jar \
hdfs://master:9000/data/text.txt \
hdfs://master:9000/result/wordCount
However, I got an error in the bash shell:
20/07/20 10:11:18 INFO spark.SparkContext: Created broadcast 0 from textFile at wordCount.scala:39
Exception in thread "main" java.lang.BootstrapMethodError: java.lang.NoClassDefFoundError: scala/runtime/java8/JFunction2$mcIII$sp
at wordCount$.main(wordCount.scala:45)
at wordCount.main(wordCount.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:845)
at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:920)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:929)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.NoClassDefFoundError: scala/runtime/java8/JFunction2$mcIII$sp
... 14 more
Caused by: java.lang.ClassNotFoundException: scala.runtime.java8.JFunction2$mcIII$sp
at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
... 14 more
The stderr log page of the executor allocated to the application shows the following error:
20/07/20 10:11:48 ERROR executor.CoarseGrainedExecutorBackend: RECEIVED SIG
I searched the Internet and found that the Scala version might be the cause of this problem. But the official Spark website (http://spark.apache.org/docs/2.4.5/index.html) says:
Spark runs on Java 8, Python 2.7+/3.4+ and R 3.1+. For the Scala API, Spark 2.4.5 uses Scala 2.12. You will need to use a compatible Scala version (2.12.x).
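(For reference, the Scala version that the installed Spark distribution itself was built against can be checked from its own banner or from its bundled jars; a quick check, assuming SPARK_HOME points at the installation directory:)

# Prints a banner that includes a line like "Using Scala version 2.x.y ..."
spark-submit --version
# The version suffix of the bundled Scala runtime jar also shows it
ls $SPARK_HOME/jars/scala-library-*.jar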
I don't know why this happens. In any case, I tried installing scala-2.11.12 as suggested by spark-shell and set the environment variables. After submitting the application, no error appeared in the bash shell, but the stderr log page of the executor allocated to the application shows the following error:
20/07/21 14:13:12 INFO executor.Executor: Finished task 1.0 in stage 4.0 (TID 7). 1459 bytes result sent to driver
20/07/21 14:13:12 INFO executor.CoarseGrainedExecutorBackend: Driver commanded a shutdown
20/07/21 14:13:12 ERROR executor.CoarseGrainedExecutorBackend: RECEIVED SIGNAL TERM
tdown
Despite the error, the application seems to have run successfully, and this time I got the correct output.
While looking at the project structure, I found the directories ./project/target/scala-2.12 and ./target/scala-2.11. Why do these two directories suggest different Scala versions? Is this the cause of my problem? How can I fix it?
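(In case it matters, I believe the Scala version that the main build compiles against can be read back from sbt itself; a quick check from the project directory:)

# Prints the value of the scalaVersion setting used for ./target
sbt scalaVersion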