apache spark standalone cluster - spark-submit - ConnectException: Call From ubuntu/127.0.1.1 to ubuntu:9000 failed

Asked: 2018-06-03 15:05:31

Tags: scala apache-spark

When I write the code in IntelliJ with Spark 2.3.0, set master("local"), and run it from IntelliJ, it gives me output. But, 1) if I start a Spark standalone cluster (single node) (./start-master.sh, which gives me the Spark master URL, and ./start-slaves.sh) and set sparkSession.master("spark master url"), I face this problem:

    18/06/03 20:48:49 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, 192.168.43.2, executor 0): java.lang.ClassCastException: cannot assign instance of scala.collection.immutable.List$SerializationProxy to field org.apache.spark.rdd.RDD.org$apache$spark$rdd$RDD$$dependencies_ of type scala.collection.Seq in instance of org.apache.spark.rdd.MapPartitionsRDD
        at java.io.ObjectStreamClass$FieldReflector.setObjFieldValues(ObjectStreamClass.java:2133)
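For context: this ClassCastException on the executors is the usual symptom of the application classes not being available on the executor classpath when the driver runs inside the IDE against a standalone master. A minimal sketch of one way the session could be built in that scenario, shipping the assembled jar to the executors via spark.jars (the jar path is an assumption, pointing at the artifact that `sbt assembly` would produce):

    // Sketch only: driver runs in IntelliJ, master is the standalone cluster,
    // and the application jar is shipped to the executors explicitly.
    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder
      .appName("Spark SQL Demo")
      .master("spark://ubuntu.ubuntu-domain:7077")
      .config("spark.jars", "target/scala-2.11/batch-sparkSQL.jar") // assumed path to the fat jar
      .getOrCreate()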

2) I build the fat JAR with the "sbt assembly" command. Then spark-submit throws the error shown below, even if I use --master local[7] (or) --master spark://ubuntu.ubuntu-domain:7077.

Is there anything wrong with the way the fat JAR is created?

    spark/bin>spark-submit --class apache.sparkSQL.sql_batch \
                           --master spark://ubuntu.ubuntu-domain:6066 \
                           --deploy-mode cluster \
                           /home/user/level_2/spark-2.3.0-bin-hadoop2.7/bin/batch-sparkSQL.jar 10

    Exception in thread "main" java.net.ConnectException: Call From ubuntu/127.0.1.1 to ubuntu:9000 failed on connection exception: java.net.ConnectException: Connection refused;

I am stuck here. Please help me resolve this. Thanks in advance.

My code is as follows:

    package apache.sparkSQL

    import org.apache.spark.sql.SparkSession

    object sql_batch //extends App
    {

      case class Employee(id: Int, name: String, mobile: String)

      case class Car(id: Int, brand: String, model: String)

      def main(args: Array[String]): Unit = {

        println("hello ! World")

        val spark = SparkSession.builder.appName("Spark SQL Demo").master("spark://ubuntu.ubuntu-domain:7077").getOrCreate()
        var myrdd1 = spark.sparkContext.textFile("/home/user/IdeaProjects/stream_trend/src/main/resources/batch_data/records.txt")
        var myrdd2 = spark.sparkContext.textFile("/home/user/IdeaProjects/stream_trend/src/main/resources/batch_data/another.txt")

        //myrdd.foreach(println)

        import spark.implicits._

        //case class Employee(id:Int,name:String,mobile:String)
        val df1 = myrdd1.map(_.split(",")).map(attri => Employee(attri(0).toInt, attri(1), attri(2))).toDF()
        val df2 = myrdd2.map(_.split(",")).map(attri => Car(attri(0).toInt, attri(1), attri(2))).toDF()

        df1.show()
        df2.show()

        val jn_df = df1.join(df2, df1.col("id") === df2.col("id"), "right")

        jn_df.select(df2("id"), $"name", $"brand").show()
      }

    }
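As a side note, port 9000 in the ConnectException is the usual fs.defaultFS (HDFS NameNode) port, so on the cluster the bare paths passed to textFile may be resolved against HDFS instead of the local disk. A minimal sketch with an explicit file:// scheme, assuming the files exist at that path on the driver and on every worker node:

    // Sketch: force the local filesystem instead of the configured fs.defaultFS.
    val myrdd1 = spark.sparkContext
      .textFile("file:///home/user/IdeaProjects/stream_trend/src/main/resources/batch_data/records.txt")
    val myrdd2 = spark.sparkContext
      .textFile("file:///home/user/IdeaProjects/stream_trend/src/main/resources/batch_data/another.txt")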

The build.sbt file is:

    name := "stream_trend"

    version := "1.0"

    scalaVersion := "2.11.8"

    assemblyMergeStrategy in assembly := {
      case PathList("META-INF", xs @ _*) => MergeStrategy.discard
      case x => MergeStrategy.first
    }

    mainClass in assembly := Some("apache.sparkSQL.sql_batch")

    resolvers += "spray repo" at "http://repo.spray.io"

    assemblyJarName in assembly := "batch-sparkSQL.jar"

    libraryDependencies ++= Seq(
      "org.apache.spark" %% "spark-core" % "2.3.0",
      "org.apache.spark" %% "spark-sql" % "2.3.0"
    )
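For completeness, the assembly keys in this build come from the sbt-assembly plugin, which has to be enabled in project/plugins.sbt; a minimal sketch (the plugin version shown is an assumption):

    // project/plugins.sbt -- enables the `sbt assembly` task used above
    addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.6") // version is an assumption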

The input data files have 3 columns, like this:

    11,maruti,swift
    22,bmw,benz

0 Answers:

No answers yet.