Running my first program in Spark

Time: 2017-04-30 03:23:58

Tags: scala apache-spark

I am trying to run my first program in Spark using Scala. The goal is to read a CSV file and display its contents.

Code:

import org.apache.spark.sql.SparkSession
import org.apache.spark._
import java.io._
import org.apache.spark.SparkContext._
import org.apache.log4j._

object df extends App{

 val spark=SparkSession.builder().getOrCreate()
 val drf=spark.read.csv("C:/Users/admin/Desktop/scala-datasets/Scala-and-Spark-Bootcamp-master/Spark DataFrames/CitiGroup2006_2008")
 drf.head(5)
}

Getting the following error:

    Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
    17/04/29 23:10:53 INFO SparkContext: Running Spark version 2.1.0
    17/04/29 23:10:56 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    17/04/29 23:10:57 ERROR SparkContext: Error initializing SparkContext.
    org.apache.spark.SparkException: A master URL must be set in your configuration
        at org.apache.spark.SparkContext.<init>(SparkContext.scala:379)
        at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2313)
        at org.apache.spark.sql.SparkSession$Builder$$anonfun$6.apply(SparkSession.scala:868)
        at org.apache.spark.sql.SparkSession$Builder$$anonfun$6.apply(SparkSession.scala:860)
        at scala.Option.getOrElse(Option.scala:121)
        at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:860)
        at df$.delayedEndpoint$df$1(df.scala:11)
        at df$delayedInit$body.apply(df.scala:9)
        at scala.Function0$class.apply$mcV$sp(Function0.scala:34)
        at scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:12)
        at scala.App$$anonfun$main$1.apply(App.scala:76)
        at scala.App$$anonfun$main$1.apply(App.scala:76)
        at scala.collection.immutable.List.foreach(List.scala:381)
        at scala.collection.generic.TraversableForwarder$class.foreach(TraversableForwarder.scala:35)
        at scala.App$class.main(App.scala:76)
        at df$.main(df.scala:9)
        at df.main(df.scala)

Any suggestions would be helpful.

2 Answers:

Answer 0 (score: 0):

You missed the .master() function call. For example, if you want to run in local mode, here is a fix:

object df extends App {
 // "local" runs Spark on a single thread on this machine
 val spark = SparkSession.builder().master("local").getOrCreate()
 val drf = spark.read.csv("C:/Users/admin/Desktop/scala-datasets/Scala-and-Spark-Bootcamp-master/Spark DataFrames/CitiGroup2006_2008")
 drf.head(5)
}
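
A side note on displaying the data: head(5) only returns the first five rows as an Array[Row] and does not print anything to the console. If the intent is to actually see the rows, a call like the following (a minimal sketch, using the same DataFrame) prints a formatted preview:

drf.show(5)  // prints the first 5 rows in a tabular layout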

The error log states it clearly:

    17/04/29 23:10:57 ERROR SparkContext: Error initializing SparkContext.
    org.apache.spark.SparkException: A master URL must be set in your configuration
        at org.apache.spark.SparkContext.<init>(SparkContext.scala:379)

Hope that helps.

Answer 1 (score: 0):

As the previous comment says, you should set a master for your Spark session; in your case it should be local[1] or local[*]. You should also set an appName. Alternatively, you can leave master and appName out of the code and pass them as options to spark-submit (see the sketch after the code below).

import org.apache.spark.sql.SparkSession

object df {
  def main(args: Array[String]): Unit = {
    // Set both an application name and a master URL before creating the session.
    val spark = SparkSession.builder().appName("example").master("local[*]").getOrCreate()
    val drf = spark.read.csv("C:/Users/admin/Desktop/scala-datasets/Scala-and-Spark-Bootcamp-master/Spark DataFrames/CitiGroup2006_2008")
    drf.head(5)
  }
}
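
If master and appName are not set in code, they can be supplied on the command line instead. A hedged sketch of such a spark-submit invocation follows (the jar path and name are hypothetical placeholders for your packaged application):

spark-submit \
  --class df \
  --master "local[*]" \
  --name example \
  target/scala-2.11/my-first-spark-app_2.11-1.0.jar

With this approach, SparkSession.builder().getOrCreate() in the code picks up the master URL and application name passed by spark-submit.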