I run Spark with the following Docker command:
docker run -it \
  -p 8088:8088 -p 8042:8042 -p 50070:50070 \
  -v "$(PWD)"/log4j.properties:/usr/local/spark/conf/log4j.properties \
  -v "$(PWD)":/app -h sandbox sequenceiq/spark:1.6.0 bash
Running spark-submit --version reports version 1.6.0. My spark-submit command is as follows:
spark-submit --class io.jobi.GithubDay \
  --master local[*] \
  --name "Daily Github Push Counter" \
  /app/min-spark_2.11-1.0.jar \
  "file:///app/data/github-archive/*.json" \
  "/app/data/ghEmployees.txt" \
  "file:///app/data/emp-gh-push-output" "json"
build.sbt
name := """min-spark"""
version := "1.0"
scalaVersion := "2.11.7"
lazy val sparkVersion = "1.6.0"
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % sparkVersion % "provided",
  "org.apache.spark" %% "spark-sql" % sparkVersion % "provided"
)
// Change this to another test framework if you prefer
libraryDependencies += "org.scalatest" %% "scalatest" % "2.2.4" % "test"
GithubDay.scala
package io.jobi
import org.apache.spark.sql.SQLContext
import org.apache.spark.{SparkConf, SparkContext}
import scala.io.Source.fromFile
/**
* Created by hammer on 7/15/16.
*/
object GithubDay {
  def main(args: Array[String]): Unit = {
    println("Application arguments: ")
    args.foreach(println)

    val conf = new SparkConf()
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)
    try {
      println("args(0): " + args(0))
      val ghLog = sqlContext.read.json(args(0))
      val pushes = ghLog.filter("type = 'PushEvent'")
      val grouped = pushes.groupBy("actor.login").count()
      val ordered = grouped.orderBy(grouped("count").desc)

      val employees = Set() ++ (
        for {
          line <- fromFile(args(1)).getLines()
        } yield line.trim
      )
      val bcEmployees = sc.broadcast(employees)

      import sqlContext.implicits._
      println("register function")
      val isEmployee = sqlContext.udf.register("SetContainsUdf", (u: String) => bcEmployees.value.contains(u))
      println("registered udf")
      val filtered = ordered.filter(isEmployee($"login"))
      println("applied filter")
      filtered.write.format(args(3)).save(args(2))
    } finally {
      sc.stop()
    }
  }
}
I build with sbt clean package, but the output at runtime is:
Application arguments:
file:///app/data/github-archive/*.json
/app/data/ghEmployees.txt
file:///app/data/emp-gh-push-output
json
args(0): file:///app/data/github-archive/*.json
imported implicits
defined isEmp
register function
Exception in thread "main" java.lang.NoSuchMethodError: scala.reflect.api.JavaUniverse.runtimeMirror(Ljava/lang/ClassLoader;)Lscala/reflect/api/JavaUniverse$JavaMirror;
at io.jobi.GithubDay$.main(GithubDay.scala:53)
at io.jobi.GithubDay.main(GithubDay.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
From what I've read, a NoSuchMethodError is the result of a version incompatibility, but I'm building against 1.6.0 and deploying to 1.6.0, so I don't understand what is going on.
Answer (score: 2):
Unless you compile Spark yourself, the out-of-the-box 1.6.0 build is compiled with Scala 2.10.x. This is stated in the docs (it says 1.6.2, but it applies to 1.6.0 as well):
Spark runs on Java 7+, Python 2.6+ and R 3.1+. For the Scala API, Spark 1.6.2 uses Scala 2.10. You will need to use a compatible Scala version (2.10.x).
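One quick way to confirm what the container actually ships is to start spark-shell inside it and print the Scala version on the classpath; a minimal check (the exact 2.10.x patch version may differ):

scala> scala.util.Properties.versionString
res0: String = version 2.10.5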
You want:
scalaVersion := "2.10.6"
A hint is that the error occurs in a Scala class: scala.reflect.api.JavaUniverse.runtimeMirror(Ljava/lang/ClassLoader;)
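For reference, the question's build.sbt with the Scala version aligned to the prebuilt Spark distribution would look roughly like this (a sketch; only scalaVersion changes):

name := """min-spark"""

version := "1.0"

// Prebuilt Spark 1.6.x is compiled against Scala 2.10.x, so the application must use a matching Scala version.
scalaVersion := "2.10.6"

lazy val sparkVersion = "1.6.0"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % sparkVersion % "provided",
  "org.apache.spark" %% "spark-sql" % sparkVersion % "provided"
)

libraryDependencies += "org.scalatest" %% "scalatest" % "2.2.4" % "test"

After sbt clean package, the artifact name changes to min-spark_2.10-1.0.jar (sbt appends the Scala binary version to the name), so the jar path in the spark-submit command needs to be updated to match.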