Spark and Breeze random number generator (ClassNotFoundException)

Asked: 2015-02-09 16:15:47

Tags: scala sbt apache-spark dependency-management

I installed Spark on a single machine following this tutorial:

Here is my current application:

    /* SimpleApp.scala */
    import org.apache.spark.SparkContext
    import org.apache.spark.SparkContext._
    import org.apache.spark.SparkConf
    import org.apache.spark.mllib.linalg.{Matrix, Matrices, Vectors, Vector}
    import org.apache.spark.mllib.linalg.distributed.RowMatrix
    import org.apache.spark.mllib.random._
    import org.apache.spark.rdd.RDD
    import org.apache.commons.math3.random.RandomDataGenerator
    import breeze._
    import breeze.linalg.linspace

    object SimpleApp {

      /*
      def make_y(x: RowMatrix) =
        {
          val xx = x map(u=>u*u)
          val s = xx(::, 0) + xx(::, 1) map(u=>u + 0.000000001)
          breeze.numerics.sin(s)/s
        }
      */

      def main(args: Array[String]) {

        val conf = new SparkConf().setAppName("Simple Application")
        val sc = new SparkContext(conf)

        val numPartitions = 1
        val n = 100
        val p = 2
        val n_nodes = 3

        // two dense rows of n evenly spaced points in [-3, 3]
        val x0 = Vector(Vectors.dense(breeze.linalg.linspace(-3, 3, n).toArray),
                        Vectors.dense(breeze.linalg.linspace(-3, 3, n).toArray))
        val xRows = sc.parallelize(x0)
        val xDist = new RowMatrix(xRows, n, p)

        //val y = make_y(xDist)

        // draw the initial weights uniformly from [-1, 1];
        // this is the line the stack trace below points at
        val unif = breeze.stats.distributions.Uniform(-1, 1)
        val w0 = unif.samplesVector(p * n_nodes).toArray

        //val w0 = breeze.linalg.linspace(-1, 1, p).toArray ++ breeze.linalg.linspace(-1, 1, p).toArray ++ breeze.linalg.linspace(-1, 1, p).toArray
        val w: Matrix = Matrices.dense(p, n_nodes, w0)

        val xw = xDist.multiply(w)

        println("xw shape = ", xw.numRows, xw.numCols)
        println("FINISH!")
      }
    }

I compile the code with `sbt assembly`. When I try to submit it to Spark, I get the following error:

donbeo@donbeo-OptiPlex-790:~/Applications/spark-1.1.0$ ./bin/spark-submit --class "SimpleApp" --master local[4] /home/donbeo/Documents/scala_code/simpleApp/target/scala-2.10/simple-app-assembly.jar 
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
15/02/09 16:11:29 WARN Utils: Your hostname, donbeo-OptiPlex-790 resolves to a loopback address: 127.0.1.1; using 149.157.140.205 instead (on interface eth0)
15/02/09 16:11:29 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
15/02/09 16:11:29 INFO SecurityManager: Changing view acls to: donbeo,
15/02/09 16:11:29 INFO SecurityManager: Changing modify acls to: donbeo,
15/02/09 16:11:29 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(donbeo, ); users with modify permissions: Set(donbeo, )
15/02/09 16:11:29 INFO Slf4jLogger: Slf4jLogger started
15/02/09 16:11:29 INFO Remoting: Starting remoting
15/02/09 16:11:29 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriver@149.157.140.205:59075]
15/02/09 16:11:29 INFO Remoting: Remoting now listens on addresses: [akka.tcp://sparkDriver@149.157.140.205:59075]
15/02/09 16:11:29 INFO Utils: Successfully started service 'sparkDriver' on port 59075.
15/02/09 16:11:29 INFO SparkEnv: Registering MapOutputTracker
15/02/09 16:11:30 INFO SparkEnv: Registering BlockManagerMaster
15/02/09 16:11:30 INFO DiskBlockManager: Created local directory at /tmp/spark-local-20150209161130-d6e2
15/02/09 16:11:30 INFO Utils: Successfully started service 'Connection manager for block manager' on port 48412.
15/02/09 16:11:30 INFO ConnectionManager: Bound socket to port 48412 with id = ConnectionManagerId(149.157.140.205,48412)
15/02/09 16:11:30 INFO MemoryStore: MemoryStore started with capacity 265.4 MB
15/02/09 16:11:30 INFO BlockManagerMaster: Trying to register BlockManager
15/02/09 16:11:30 INFO BlockManagerMasterActor: Registering block manager 149.157.140.205:48412 with 265.4 MB RAM
15/02/09 16:11:30 INFO BlockManagerMaster: Registered BlockManager
15/02/09 16:11:30 INFO HttpFileServer: HTTP File server directory is /tmp/spark-8b51b92a-cc95-4c8b-9575-470e877f3e0c
15/02/09 16:11:30 INFO HttpServer: Starting HTTP Server
15/02/09 16:11:30 INFO Utils: Successfully started service 'HTTP file server' on port 41924.
15/02/09 16:11:30 INFO Utils: Successfully started service 'SparkUI' on port 4040.
15/02/09 16:11:30 INFO SparkUI: Started SparkUI at http://149.157.140.205:4040
15/02/09 16:11:30 INFO SparkContext: Added JAR file:/home/donbeo/Documents/scala_code/simpleApp/target/scala-2.10/simple-app-assembly.jar at http://149.157.140.205:41924/jars/simple-app-assembly.jar with timestamp 1423498290575
15/02/09 16:11:30 INFO AkkaUtils: Connecting to HeartbeatReceiver: akka.tcp://sparkDriver@149.157.140.205:59075/user/HeartbeatReceiver
15/02/09 16:11:30 WARN BLAS: Failed to load implementation from: com.github.fommil.netlib.NativeSystemBLAS
15/02/09 16:11:30 WARN BLAS: Failed to load implementation from: com.github.fommil.netlib.NativeRefBLAS
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/commons/math3/random/RandomGenerator
    at breeze.stats.distributions.Uniform$.apply$default$3(Uniform.scala:10)
    at SimpleApp$.main(SimpleApp.scala:41)
    at SimpleApp.main(SimpleApp.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:328)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:75)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: org.apache.commons.math3.random.RandomGenerator
    at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
    ... 10 more
donbeo@donbeo-OptiPlex-790:~/Applications/spark-1.1.0$ 
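
For what it's worth, the stack trace shows the failure happens before any Spark work starts: `Uniform$.apply$default$3` is the default value of `Uniform`'s implicit `RandBasis` parameter, which is backed by commons-math3's `RandomGenerator`. Here is a minimal sketch (an illustrative addition, assuming Breeze 0.10) that triggers the same error on any runtime classpath missing commons-math3:

    // Repro.scala -- hypothetical minimal reproduction, not part of the original app.
    // Constructing a Breeze Uniform distribution builds its default RandBasis, which
    // wraps org.apache.commons.math3.random.RandomGenerator; if commons-math3 is
    // missing at runtime, this single line throws the same NoClassDefFoundError.
    object Repro {
      def main(args: Array[String]): Unit = {
        val unif = breeze.stats.distributions.Uniform(-1, 1) // fails here without commons-math3
        println(unif.draw())
      }
    }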

Edit 1: This is how I run the code:

./bin/spark-submit  --class "SimpleApp"  --master local[4]  /home/donbeo/Documents/scala_code/simpleApp/target/scala-2.10/simple-app-assembly.jar

Edit 2:

This is my build file:

import AssemblyKeys._

import sbtassembly.Plugin._

name := "Simple Project"

version := "1.0"

scalaVersion := "2.10.4"

libraryDependencies += "org.apache.spark" %% "spark-core" % "1.1.0" % "provided"

libraryDependencies += "org.apache.spark"  %% "spark-mllib"  % "1.1.0" % "provided"

libraryDependencies += "org.apache.commons" % "commons-math3" % "3.1.1"


libraryDependencies  ++= Seq(
            // other dependencies here
            "org.scalanlp" %% "breeze" % "0.10",
            // native libraries are not included by default. add this if you want them (as of 0.7)
            // native libraries greatly improve performance, but increase jar sizes.
            "org.scalanlp" %% "breeze-natives" % "0.10"
)

resolvers ++= Seq(
            // other resolvers here
            // if you want to use snapshot builds (currently 0.11-SNAPSHOT), use this.
            "Sonatype Snapshots" at "https://oss.sonatype.org/content/repositories/snapshots/",
            "Sonatype Releases" at "https://oss.sonatype.org/content/repositories/releases/"
)



// This statement includes the assembly plugin capabilities
assemblySettings

// Configure the jar name used with the assembly plug-in
jarName in assembly := "simple-app-assembly.jar"

// A special option to exclude Scala itself from our assembly jar, since Spark
// already bundles Scala.
assemblyOption in assembly := (assemblyOption in assembly).value.copy(includeScala = false)

How can I fix this?

1 Answer:

Answer 0 (score: 0)

It seems that sbt resolves the library dependencies but does not add them to the main package. My answer may therefore not be the optimal solution, but this is how I worked around a similar problem (a build-side alternative is sketched after the steps below).

  1. I downloaded the jar files of the dependencies defined in sbt into a `lib` directory under the Spark installation directory.

  2. When submitting the application, I explicitly added those jars:

    ./bin/spark-submit \
    --class ... \
    --master  ... \
    --jars $(echo ./lib/*.jar | tr ' ' ',') \
    target/scala-2.10/application.jar
    
  3. I think you can connect the dots from here using your own class name, application jar, and so on.

    I hope this helps!
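
As a build-side alternative (a sketch under assumptions, not a tested fix): since the build already uses the old `sbtassembly.Plugin`/`AssemblyKeys` API, one can instead try to make sure commons-math3 actually lands in the fat jar, for example by bumping it to the version Breeze 0.10 pulls in (3.2 is an assumption here) and discarding duplicate META-INF entries, a common reason assembly merges go wrong:

    // build.sbt additions (hypothetical, matching the old sbt-assembly API used above)
    import AssemblyKeys._
    import sbtassembly.Plugin._

    // Assumed version bump so the bundled commons-math3 matches what Breeze 0.10 expects.
    libraryDependencies += "org.apache.commons" % "commons-math3" % "3.2"

    // Drop duplicate META-INF entries instead of letting the merge fail or pick
    // the wrong file; every other path keeps the default strategy.
    mergeStrategy in assembly <<= (mergeStrategy in assembly) { old =>
      {
        case PathList("META-INF", xs @ _*) => MergeStrategy.discard
        case x                             => old(x)
      }
    }

Discarding all of META-INF is deliberately blunt; a real build may need finer-grained rules (for example keeping service files), but it is the usual starting point with this plugin version.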