scala - Cannot create SparkContext and SparkSession

Date: 2018-04-06 05:24:57

Tags: scala apache-spark scala-ide

I'm new to Scala and Spark. I'm trying to read a CSV file, so I created a SparkSession to read the CSV. I also created a SparkContext so I can use RDDs later. I'm working in Scala IDE.

The errors that come up are probably common Java errors, but I haven't been able to resolve them.

Code:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.SparkContext._
import org.apache.spark.mllib.stat.Statistics
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql._



object Solution1 {
  def main(args: Array[String]){

    println("Create contex for rdd ")
    val conf = new SparkConf().setAppName("Problem1")
    val cont = new SparkContext(conf)

    println("create SparkSession and read csv")
    val spark = SparkSession.builder().appName("Problem1").getOrCreate()
    val data = spark.read.option("header", false).csv("file.csv")


    // further processing


    cont.stop()
  }

}

Error:

Create contex for rdd 
Exception in thread "main" java.lang.NoClassDefFoundError: org/spark_project/guava/cache/CacheLoader
    at org.apache.spark.SparkConf.loadFromSystemProperties(SparkConf.scala:73)
    at org.apache.spark.SparkConf.<init>(SparkConf.scala:68)
    at org.apache.spark.SparkConf.<init>(SparkConf.scala:55)
    at Solution1$.main(Solution1.scala:13)
    at Solution1.main(Solution1.scala)
Caused by: java.lang.ClassNotFoundException: org.spark_project.guava.cache.CacheLoader
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 5 more
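
For context, org.spark_project.guava is the copy of Guava that Spark shades into its own jars (spark-network-common in Spark 2.x), so this NoClassDefFoundError typically means the Spark jars on the run classpath are incomplete or mixed across versions. A consistent dependency setup, e.g. in build.sbt, would look roughly like the sketch below (a minimal sketch; the Spark and Scala version numbers are assumptions and should match your installation):

// build.sbt -- minimal sketch; version numbers are assumptions
scalaVersion := "2.11.12"

val sparkVersion = "2.2.1"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core"  % sparkVersion,
  "org.apache.spark" %% "spark-sql"   % sparkVersion,
  "org.apache.spark" %% "spark-mllib" % sparkVersion
)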

1 Answer:

Answer 0 (score: -1)

Please create the SparkContext as follows:

def main(args: Array[String]): Unit = {
    // local[*] runs Spark locally using all available cores
    val conf = new SparkConf().setAppName("someName").setMaster("local[*]")
    val sparkContext = new SparkContext(conf)
}

To read data:

val rdd = sparkContext.textFile("path.csv")
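
From there the lines can be split into fields manually, for example (a minimal sketch that assumes comma-separated values with no quoted fields):

// naive split on commas; does not handle quoting or escapes
val fields = rdd.map(_.split(","))
fields.take(5).foreach(row => println(row.mkString(" | ")))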

And the SparkSession as follows:

def main(args: Array[String]): Unit = {
    // the builder sets the master directly, so no separate SparkConf is needed
    val spark = SparkSession
                .builder()
                .appName("Creating spark session")
                .master("local[*]")
                .getOrCreate()
}

And to read the data, call:

val df = spark.read.format("json").load("path.json")
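
Since the question reads a CSV file rather than JSON, the equivalent call would be (mirroring the header option from the question's code):

val df = spark.read.option("header", false).csv("file.csv")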

Also, if you have created a SparkSession, you don't need to create a separate SparkContext; you can get the SparkContext from the SparkSession like this:

val data = spark.sparkContext.textFile("path")
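
Putting this together for the original problem, a single SparkSession can cover both the DataFrame and the RDD needs. A sketch (not the answerer's code; the app name and file name are taken from the question):

import org.apache.spark.sql.SparkSession

object Solution1 {
  def main(args: Array[String]): Unit = {
    // one SparkSession provides both the DataFrame API and,
    // via spark.sparkContext, the SparkContext for RDD work
    val spark = SparkSession.builder()
      .appName("Problem1")
      .master("local[*]")
      .getOrCreate()

    val data = spark.read.option("header", false).csv("file.csv")
    val rdd  = spark.sparkContext.textFile("file.csv")

    // further processing

    spark.stop()
  }
}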