Using sbt

Time: 2018-11-14 10:41:49

Tags: scala apache-spark sbt

This is the code I have written in Scala:

package normalisation

import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf
import org.apache.spark.sql.SQLContext
import org.apache.hadoop.fs.{FileSystem, Path}


object Seasonality {
 val amplitude_list_c1: Array[Nothing] = Array()
 val amplitude_list_c2: Array[Nothing] = Array()
 def main(args: Array[String]){
   val conf = new SparkConf().setAppName("Normalization")
   val sc = new SparkContext(conf)
   val sqlContext = new org.apache.spark.sql.SQLContext(sc)
   val line = "MP"
   val ps = "Test"
   val location = "hdfs://ipaddress/user/hdfs/{0}/ps/{1}/FS/2018-10-17".format(line,ps)
   val files = FileSystem.get(sc.hadoopConfiguration ).listStatus(new Path(location))
   for (each <- files) {
     var ps_data = sqlContext.read.json(each)

   }
   println(ps_data.show())
  }
}

The errors I get when compiling with sbt package are shown here: [image of compile errors]

Here is my build.sbt file:

name := "OV"

scalaVersion := "2.11.8"

// https://mvnrepository.com/artifact/org.apache.spark/spark-core
libraryDependencies += "org.apache.spark" %% "spark-core" % "2.3.1"

// https://mvnrepository.com/artifact/org.apache.spark/spark-sql
libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.3.1"

1 Answer:

Answer 0: (score: 0)

In Spark versions > 2 you should generally use SparkSession. See https://spark.apache.org/docs/2.3.1/api/scala/#org.apache.spark.sql.SparkSession

You should then be able to do

val spark: SparkSession = ???
val location = "hdfs://ipaddress/user/hdfs/{0}/ps/{1}/FS/2018-10-17".format(line,ps)
spark.read.json(location)

to read all the JSON files in the directory.
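For reference, a minimal sketch of how the whole object could look with SparkSession (the HDFS path, app name, and values of line/ps are just carried over from the question; note that Scala's String.format uses printf-style %s placeholders rather than {0}/{1}):

import org.apache.spark.sql.{DataFrame, SparkSession}

object Seasonality {
  def main(args: Array[String]): Unit = {
    // SparkSession replaces SparkConf/SparkContext/SQLContext in Spark 2.x
    val spark: SparkSession = SparkSession.builder()
      .appName("Normalization")
      .getOrCreate()

    val line = "MP"
    val ps = "Test"
    // String.format substitutes %s, not {0}/{1}
    val location = "hdfs://ipaddress/user/hdfs/%s/ps/%s/FS/2018-10-17".format(line, ps)

    // Reads every JSON file under the directory into a single DataFrame
    val psData: DataFrame = spark.read.json(location)
    psData.show()

    spark.stop()
  }
}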

Also, I believe you would get another compile error at
for (each <- files) {
  var ps_data = sqlContext.read.json(each)
}
println(ps_data.show())

since ps_data is out of scope there.
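If you really do need the per-file loop, one way around the scope problem is to use the DataFrame inside the loop body (a sketch, assuming spark and location are defined as above; listStatus returns FileStatus objects, so you have to pass the path string to read.json):

import org.apache.hadoop.fs.{FileSystem, Path}

val fs = FileSystem.get(spark.sparkContext.hadoopConfiguration)
val files = fs.listStatus(new Path(location))

for (each <- files) {
  // read.json expects a path string, not a FileStatus
  val psData = spark.read.json(each.getPath.toString)
  psData.show() // use the DataFrame here, while it is still in scope
}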

If you do need to use SparkContext for some reason, it should indeed be in spark-core. Have you tried restarting your IDE, cleaning caches, etc.?

Edit: I just noticed that your build.sbt is probably not in the directory you are calling sbt package from, so sbt will not pick it up.
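For reference, sbt expects build.sbt at the root of the project you run sbt package from, roughly like this (directory names here are only illustrative):

OV/
├── build.sbt
└── src/main/scala/normalisation/Seasonality.scala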