I am getting an error when running a Spark job with spark-submit on a Windows 10 machine. The command is:
c:\workspaces\Spark2Demo>spark-submit --class retail_db.GetRevenuePerOrder --master local .\target\scala-2.12\spark2demo_2.12-0.1.jar c:\workspaces\data\retail_db\orders\part-00000 c:\workspaces\output
The error I get is:
2019-03-12 19:09:33 ERROR SparkContext:91 - Error initializing SparkContext.
org.apache.spark.SparkException: Could not parse Master URL: 'c:\workspaces\data\retail_db\orders\part-00000'
at org.apache.spark.SparkContext$.org$apache$spark$SparkContext$$createTaskScheduler(SparkContext.scala:2784)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:493)
at retail_db.GetRevenuePerOrder$.main(GetRevenuePerOrder.scala:7)
at retail_db.GetRevenuePerOrder.main(GetRevenuePerOrder.scala)
The file exists and is accessible, and I can run the program from my IDE. Here is the program:
package retail_db

import org.apache.spark.{SparkConf, SparkContext}

object GetRevenuePerOrder {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster(args(0)).setAppName("GetRevenuePerOrder")
    val sc = new SparkContext(conf)
    sc.setLogLevel("DEBUG")
    println(args)
    val orderItems = sc.textFile(args(1))
    val revenuePerOrder = orderItems
      .map(oi => (oi.split(",")(1).toInt, oi.split(",")(4).toFloat))
      .reduceByKey(_ + _)
      .map(oi => oi._1 + "," + oi._2)
    revenuePerOrder.saveAsTextFile(args(2))
  }
}
Please help.
Answer 0 (score: 1)
You are setting the master twice: first in the spark-submit command (--master local), and again in SparkConf (new SparkConf().setMaster(args(0))). As described on the Spark configuration page, "Properties set directly on the SparkConf take highest precedence, then flags passed to spark-submit or spark-shell, then options in the spark-defaults.conf file." So the local master passed to spark-submit is overridden by the SparkConf setting, and Spark then tries to parse your first program argument (the input file path, c:\workspaces\data\retail_db\orders\part-00000) as the master URL, which produces the "Could not parse Master URL" error. Remove the setMaster(args(0)) call.
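A minimal sketch of the corrected program, under the assumption that the master is supplied only by spark-submit and that the two remaining command-line arguments are the input file and the output directory (so the argument indices shift down by one compared to your version):

package retail_db

import org.apache.spark.{SparkConf, SparkContext}

object GetRevenuePerOrder {
  def main(args: Array[String]): Unit = {
    // The master is taken from spark-submit (--master local); do not set it here.
    val conf = new SparkConf().setAppName("GetRevenuePerOrder")
    val sc = new SparkContext(conf)
    sc.setLogLevel("DEBUG")

    // args(0) = input file, args(1) = output directory
    val orderItems = sc.textFile(args(0))
    val revenuePerOrder = orderItems
      .map(oi => (oi.split(",")(1).toInt, oi.split(",")(4).toFloat))
      .reduceByKey(_ + _)
      .map { case (orderId, revenue) => s"$orderId,$revenue" }
    revenuePerOrder.saveAsTextFile(args(1))
  }
}

With that change, your original spark-submit command, which passes only the input path and the output path as program arguments, should work as intended.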