When I write the code in IntelliJ with Spark 2.3.0 and run it with master("local"), it executes inside IntelliJ and gives me output. But, 1) if I start a Spark standalone cluster (single node) (./start-master.sh, which gives me the Spark master URL, and ./start-slaves.sh) and set sparkSession.master("spark master url"), I face this problem:
18/06/03 20:48:49 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, 192.168.43.2, executor 0): java.lang.ClassCastException: cannot assign instance of scala.collection.immutable.List$SerializationProxy to field org.apache.spark.rdd.RDD.org$apache$spark$rdd$RDD$$dependencies_ of type scala.collection.Seq in instance of org.apache.spark.rdd.MapPartitionsRDD at java.io.ObjectStreamClass$FieldReflector.setObjFieldValues(ObjectStreamClass.java:2133)
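From what I have read, this ClassCastException typically appears when the executors cannot load the application's own classes, so deserialization falls back to mismatched classpath entries. Below is a minimal sketch of what I understand should ship the assembled jar to the executors when running from IntelliJ against the standalone master (the jar path is my assumption of where sbt assembly writes the output):

// Sketch: ship the application jar so executors can deserialize my classes.
// The jar path is an assumption; use wherever `sbt assembly` puts the output.
val spark = SparkSession.builder
  .appName("Spark SQL Demo")
  .master("spark://ubuntu.ubuntu-domain:7077")
  .config("spark.jars", "/home/user/IdeaProjects/stream_trend/target/scala-2.11/batch-sparkSQL.jar")
  .getOrCreate()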
2)我用" sbt assembly"命令.then spark-submit抛出如下所示的错误,即使我使用--master local [7](或)--master spark://ubuntu.ubuntu-domain:7077。
胖JAR创作有什么问题吗?
spark/bin> spark-submit --class apache.sparkSQL.sql_batch \
  --master spark://ubuntu.ubuntu-domain:6066 \
  --deploy-mode cluster \
  /home/user/level_2/spark-2.3.0-bin-hadoop2.7/bin/batch-sparkSQL.jar 10
Exception in thread "main" java.net.ConnectException: Call From ubuntu/127.0.1.1 to ubuntu:9000 failed on connection exception: java.net.ConnectException: Connection refused;
This is where I am stuck. Please help me resolve it. Thanks in advance.
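My guess about the second exception is that ubuntu:9000 is an HDFS namenode address: textFile paths without a scheme are resolved against fs.defaultFS, so on the cluster my local paths may be taken as HDFS paths. A minimal sketch of what I believe forces a local-file read (assuming the inputs really are plain local files visible on every node):

// Sketch (assumption: the inputs are local files, not on HDFS):
// an explicit file:// scheme bypasses the configured default filesystem.
val myrdd1 = spark.sparkContext.textFile(
  "file:///home/user/IdeaProjects/stream_trend/src/main/resources/batch_data/records.txt")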
My code is as follows:
package apache.sparkSQL

import org.apache.spark.sql.SparkSession

object sql_batch { // extends App
  case class Employee(id: Int, name: String, mobile: String)
  case class Car(id: Int, brand: String, model: String)

  def main(args: Array[String]): Unit = {
    println("hello ! World")
    val spark = SparkSession.builder
      .appName("Spark SQL Demo")
      .master("spark://ubuntu.ubuntu-domain:7077")
      .getOrCreate()

    // Read the two input files as RDDs of lines.
    val myrdd1 = spark.sparkContext.textFile("/home/user/IdeaProjects/stream_trend/src/main/resources/batch_data/records.txt")
    val myrdd2 = spark.sparkContext.textFile("/home/user/IdeaProjects/stream_trend/src/main/resources/batch_data/another.txt")

    import spark.implicits._

    // Parse the comma-separated lines into case classes, then into DataFrames.
    val df1 = myrdd1.map(_.split(",")).map(attri => Employee(attri(0).toInt, attri(1), attri(2))).toDF()
    val df2 = myrdd2.map(_.split(",")).map(attri => Car(attri(0).toInt, attri(1), attri(2))).toDF()
    df1.show()
    df2.show()

    // Right outer join on id, then project columns from both sides.
    val jn_df = df1.join(df2, df1.col("id") === df2.col("id"), "right")
    jn_df.select(df2("id"), $"name", $"brand").show()
  }
}
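One thing I suspect: the hardcoded .master(...) in the code overrides whatever --master I pass to spark-submit. A minimal sketch of the builder without the hardcoded URL (the local[*] fallback for plain IDE runs is my own addition, not something from my current code):

// Sketch: let spark-submit decide the master; fall back to local[*] only
// when no spark.master system property is set (e.g. a plain IDE run).
val spark = SparkSession.builder
  .appName("Spark SQL Demo")
  .master(sys.props.getOrElse("spark.master", "local[*]"))
  .getOrCreate()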
The build.sbt file is:
name := "stream_trend"
version := "1.0"
scalaVersion := "2.11.8"
assemblyMergeStrategy in assembly := {
case PathList("META-INF", xs @ _*) => MergeStrategy.discard
case x => MergeStrategy.first
}
mainClass in assembly := Some("apache.sparkSQL.sql_batch")
resolvers += "spray repo" at "http://repo.spray.io"
assemblyJarName in assembly := "batch-sparkSQL.jar"
libraryDependencies ++= Seq( "org.apache.spark" %% "spark-core" % "2.3.0",
"org.apache.spark" %% "spark-sql" % "2.3.0"
)
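I also wonder whether the fat JAR should bundle Spark at all: as far as I know, spark-submit already provides Spark on the classpath, and the usual convention for assemblies is to mark those dependencies as provided. A sketch of the dependency block as I think it is meant (not yet verified on my setup):

// Assumption: Spark comes from the cluster at runtime, so keep it out of the fat JAR.
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "2.3.0" % "provided",
  "org.apache.spark" %% "spark-sql"  % "2.3.0" % "provided"
)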
The input data files each have 3 comma-separated columns, like this:
11,maruti,swift
22,bmw,benz