I installed Spark on a single machine following this tutorial.
This is my current application:
/* SimpleApp.scala */
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf
import breeze.linalg.linspace
import org.apache.spark.mllib.linalg.Vector
import org.apache.spark.mllib.linalg.distributed.RowMatrix
import org.apache.spark.mllib.random._
import org.apache.spark.rdd.RDD
import breeze._
import org.apache.spark.mllib.linalg.{Matrix, Matrices, Vectors, Vector}
import org.apache.commons.math3.random.RandomDataGenerator
object SimpleApp {
  /*
  def make_y(x: RowMatrix) =
  {
    val xx = x map(u=>u*u)
    val s = xx(::, 0) + xx(::, 1) map(u=>u + 0.000000001)
    breeze.numerics.sin(s)/s
  }
  */
  def main(args: Array[String]) {
    val conf = new SparkConf().setAppName("Simple Application")
    val sc = new SparkContext(conf)
    val numPartitions = 1
    val n = 100
    val p = 2
    val n_nodes = 3
    val x0 = Vector(Vectors.dense(breeze.linalg.linspace(-3, 3, n).toArray),
                    Vectors.dense(breeze.linalg.linspace(-3, 3, n).toArray))
    val xRows = sc.parallelize(x0)
    val xDist = new RowMatrix(xRows, n, p)
    //val y = make_y(xDist)
    val unif = breeze.stats.distributions.Uniform(-1, 1)
    val w0 = unif.samplesVector(p * n_nodes).toArray
    //val w0 = breeze.linalg.linspace(-1, 1, p).toArray ++ breeze.linalg.linspace(-1, 1, p).toArray ++ breeze.linalg.linspace(-1, 1, p).toArray
    val w: Matrix = Matrices.dense(p, n_nodes, w0)
    val xw = xDist.multiply(w)
    println("xw shape = ", xw.numRows, xw.numCols)
    println("FINISH!")
  }
}
I compile the code with: sbt assembly
When I try to submit the code to Spark, I get an error:
donbeo@donbeo-OptiPlex-790:~/Applications/spark-1.1.0$ ./bin/spark-submit --class "SimpleApp" --master local[4] /home/donbeo/Documents/scala_code/simpleApp/target/scala-2.10/simple-app-assembly.jar
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
15/02/09 16:11:29 WARN Utils: Your hostname, donbeo-OptiPlex-790 resolves to a loopback address: 127.0.1.1; using 149.157.140.205 instead (on interface eth0)
15/02/09 16:11:29 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
15/02/09 16:11:29 INFO SecurityManager: Changing view acls to: donbeo,
15/02/09 16:11:29 INFO SecurityManager: Changing modify acls to: donbeo,
15/02/09 16:11:29 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(donbeo, ); users with modify permissions: Set(donbeo, )
15/02/09 16:11:29 INFO Slf4jLogger: Slf4jLogger started
15/02/09 16:11:29 INFO Remoting: Starting remoting
15/02/09 16:11:29 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriver@149.157.140.205:59075]
15/02/09 16:11:29 INFO Remoting: Remoting now listens on addresses: [akka.tcp://sparkDriver@149.157.140.205:59075]
15/02/09 16:11:29 INFO Utils: Successfully started service 'sparkDriver' on port 59075.
15/02/09 16:11:29 INFO SparkEnv: Registering MapOutputTracker
15/02/09 16:11:30 INFO SparkEnv: Registering BlockManagerMaster
15/02/09 16:11:30 INFO DiskBlockManager: Created local directory at /tmp/spark-local-20150209161130-d6e2
15/02/09 16:11:30 INFO Utils: Successfully started service 'Connection manager for block manager' on port 48412.
15/02/09 16:11:30 INFO ConnectionManager: Bound socket to port 48412 with id = ConnectionManagerId(149.157.140.205,48412)
15/02/09 16:11:30 INFO MemoryStore: MemoryStore started with capacity 265.4 MB
15/02/09 16:11:30 INFO BlockManagerMaster: Trying to register BlockManager
15/02/09 16:11:30 INFO BlockManagerMasterActor: Registering block manager 149.157.140.205:48412 with 265.4 MB RAM
15/02/09 16:11:30 INFO BlockManagerMaster: Registered BlockManager
15/02/09 16:11:30 INFO HttpFileServer: HTTP File server directory is /tmp/spark-8b51b92a-cc95-4c8b-9575-470e877f3e0c
15/02/09 16:11:30 INFO HttpServer: Starting HTTP Server
15/02/09 16:11:30 INFO Utils: Successfully started service 'HTTP file server' on port 41924.
15/02/09 16:11:30 INFO Utils: Successfully started service 'SparkUI' on port 4040.
15/02/09 16:11:30 INFO SparkUI: Started SparkUI at http://149.157.140.205:4040
15/02/09 16:11:30 INFO SparkContext: Added JAR file:/home/donbeo/Documents/scala_code/simpleApp/target/scala-2.10/simple-app-assembly.jar at http://149.157.140.205:41924/jars/simple-app-assembly.jar with timestamp 1423498290575
15/02/09 16:11:30 INFO AkkaUtils: Connecting to HeartbeatReceiver: akka.tcp://sparkDriver@149.157.140.205:59075/user/HeartbeatReceiver
15/02/09 16:11:30 WARN BLAS: Failed to load implementation from: com.github.fommil.netlib.NativeSystemBLAS
15/02/09 16:11:30 WARN BLAS: Failed to load implementation from: com.github.fommil.netlib.NativeRefBLAS
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/commons/math3/random/RandomGenerator
at breeze.stats.distributions.Uniform$.apply$default$3(Uniform.scala:10)
at SimpleApp$.main(SimpleApp.scala:41)
at SimpleApp.main(SimpleApp.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:328)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:75)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: org.apache.commons.math3.random.RandomGenerator
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
... 10 more
donbeo@donbeo-OptiPlex-790:~/Applications/spark-1.1.0$
EDIT 1: This is how I run the code:
./bin/spark-submit --class "SimpleApp" --master local[4] /home/donbeo/Documents/scala_code/simpleApp/target/scala-2.10/simple-app-assembly.jar
EDIT 2:
This is my build file:
import AssemblyKeys._
import sbtassembly.Plugin._

name := "Simple Project"

version := "1.0"

scalaVersion := "2.10.4"

libraryDependencies += "org.apache.spark" %% "spark-core" % "1.1.0" % "provided"

libraryDependencies += "org.apache.spark" %% "spark-mllib" % "1.1.0" % "provided"

libraryDependencies += "org.apache.commons" % "commons-math3" % "3.1.1"

libraryDependencies ++= Seq(
  // other dependencies here
  "org.scalanlp" %% "breeze" % "0.10",
  // native libraries are not included by default. add this if you want them (as of 0.7)
  // native libraries greatly improve performance, but increase jar sizes.
  "org.scalanlp" %% "breeze-natives" % "0.10"
)

resolvers ++= Seq(
  // other resolvers here
  // if you want to use snapshot builds (currently 0.11-SNAPSHOT), use this.
  "Sonatype Snapshots" at "https://oss.sonatype.org/content/repositories/snapshots/",
  "Sonatype Releases" at "https://oss.sonatype.org/content/repositories/releases/"
)

// This statement includes the assembly plugin capabilities
assemblySettings

// Configure the jar name used with the assembly plugin
jarName in assembly := "simple-app-assembly.jar"

// A special option to exclude Scala itself from our assembly jar, since Spark
// already bundles Scala.
assemblyOption in assembly := (assemblyOption in assembly).value.copy(includeScala = false)
How can I fix this?
Answer 0 (score: 0)
It seems that sbt resolves the library dependencies but does not add them to the main package, so my answer may not be the best solution, but this is how I solved a similar problem.
I downloaded the jars of the dependencies I had defined in sbt into a lib directory inside the Spark directory,
and I add them explicitly whenever I need to submit the application:
./bin/spark-submit \
--class ... \
--master ... \
--jars $(echo ./lib/*.jar | tr ' ' ',') \
target/scala-2.10/application.jar
I think you can connect the dots with your own class name, application jar, and so on.
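A related tip, as a rough sketch only (assuming a standard sbt project): instead of downloading the jars by hand, you can ask sbt to copy every resolved dependency jar into lib_managed/ by adding the setting below to your build file.

// Optional build.sbt setting (sketch): have sbt copy all resolved dependency jars
// into lib_managed/ so they can be handed to spark-submit via --jars.
retrieveManaged := true

The exact sub-layout under lib_managed/ depends on the sbt version, but those jars can then be passed to --jars in the same way as the lib directory above.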
I hope this helps!