I am new to Scala and Spark. I am trying to read a CSV file, so I created a SparkSession to read the CSV. I also created a SparkContext so that I can use RDDs later. I am using Scala IDE.
The errors that appear are probably common Java errors, but I cannot resolve them.
Code:
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.SparkContext._
import org.apache.spark.mllib.stat.Statistics
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql._

object Solution1 {
  def main(args: Array[String]) {
    println("Create contex for rdd ")
    val conf = new SparkConf().setAppName("Problem1")
    val cont = new SparkContext(conf)

    println("create SparkSession and read csv")
    val spark = SparkSession.builder().appName("Problem1").getOrCreate()
    val data = spark.read.option("header", false).csv("file.csv")

    // further processing
    cont.stop()
  }
}
Error:
Create contex for rdd
Exception in thread "main" java.lang.NoClassDefFoundError: org/spark_project/guava/cache/CacheLoader
at org.apache.spark.SparkConf.loadFromSystemProperties(SparkConf.scala:73)
at org.apache.spark.SparkConf.<init>(SparkConf.scala:68)
at org.apache.spark.SparkConf.<init>(SparkConf.scala:55)
at Solution1$.main(Solution1.scala:13)
at Solution1.main(Solution1.scala)
Caused by: java.lang.ClassNotFoundException: org.spark_project.guava.cache.CacheLoader
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 5 more
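A NoClassDefFoundError for org.spark_project.guava.cache.CacheLoader usually means the Spark jars on the classpath come from mixed or incompatible Spark versions (for example, spark-core and spark-sql resolved to different releases in the IDE's build path). A build.sbt sketch that pins all Spark modules to a single version; the version number here is an assumption and should match the Spark distribution you actually run against:

```scala
// build.sbt -- keep every Spark module on the same version to avoid
// NoClassDefFoundError from mismatched shaded dependencies
val sparkVersion = "2.4.8" // assumed; replace with your actual Spark version

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core"  % sparkVersion,
  "org.apache.spark" %% "spark-sql"   % sparkVersion,
  "org.apache.spark" %% "spark-mllib" % sparkVersion // needed for Statistics
)
```

After changing the build definition, refresh the project in the IDE so the old jars are removed from the classpath.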
Answer 0 (score: -1)
Please create the SparkContext as follows:
def main(args: Array[String]): Unit = {
  val conf = new SparkConf().setAppName("someName").setMaster("local[*]")
  val sparkContext = new SparkContext(conf)
}
To read data:
val rdd = sparkContext.textFile("path.csv")
and the SparkSession as follows:
def main(args: Array[String]): Unit = {
  val spark = SparkSession
    .builder()
    .appName("Creating spark session")
    .master("local[*]")
    .getOrCreate()
}
To read data, call:
val df = spark.read.format("json").load("path.json")
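Since the question reads a CSV rather than JSON, the equivalent call with the same DataFrameReader API would be (the path is a placeholder):

```scala
val df = spark.read
  .format("csv")
  .option("header", "false")
  .load("path.csv")
// or equivalently: spark.read.option("header", "false").csv("path.csv")
```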
Also, if you have created a SparkSession, you do not need to create a SparkContext separately; you can access the SparkContext through the SparkSession, like this:
val data = spark.sparkContext.textFile("path")
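Putting the pieces together, the original program can be reduced to a single SparkSession that serves both the DataFrame and the RDD APIs. A sketch, assuming local execution from the IDE and the same "file.csv" path as in the question:

```scala
import org.apache.spark.sql.SparkSession

object Solution1 {
  def main(args: Array[String]): Unit = {
    // One SparkSession covers both the DataFrame and the RDD APIs
    val spark = SparkSession.builder()
      .appName("Problem1")
      .master("local[*]") // run locally when launched from the IDE
      .getOrCreate()

    // DataFrame read of the CSV
    val data = spark.read.option("header", "false").csv("file.csv")

    // RDD access through the session's context -- no separate SparkContext needed
    val rdd = spark.sparkContext.textFile("file.csv")

    spark.stop()
  }
}
```

Because SparkSession wraps a SparkContext internally, constructing both (as in the question) is redundant and can cause conflicts over the single allowed SparkContext per JVM.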