This is the code I have written in Scala:
package normalisation

import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf
import org.apache.spark.sql.SQLContext
import org.apache.hadoop.fs.{FileSystem, Path}

object Seasonality {
  val amplitude_list_c1: Array[Nothing] = Array()
  val amplitude_list_c2: Array[Nothing] = Array()

  def main(args: Array[String]) {
    val conf = new SparkConf().setAppName("Normalization")
    val sc = new SparkContext(conf)
    val sqlContext = new org.apache.spark.sql.SQLContext(sc)
    val line = "MP"
    val ps = "Test"
    val location = "hdfs://ipaddress/user/hdfs/{0}/ps/{1}/FS/2018-10-17".format(line, ps)
    val files = FileSystem.get(sc.hadoopConfiguration).listStatus(new Path(location))
    for (each <- files) {
      var ps_data = sqlContext.read.json(each)
    }
    println(ps_data.show())
  }
}
This is the error I get when compiling with sbt package: [error screenshot]
This is my build.sbt file:
name := "OV"

scalaVersion := "2.11.8"

// https://mvnrepository.com/artifact/org.apache.spark/spark-core
libraryDependencies += "org.apache.spark" %% "spark-core" % "2.3.1"

// https://mvnrepository.com/artifact/org.apache.spark/spark-sql
libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.3.1"
Answer (score: 0)
In Spark versions > 2 you should generally use SparkSession instead. See https://spark.apache.org/docs/2.3.1/api/scala/#org.apache.spark.sql.SparkSession
Then you should be able to do

val spark: SparkSession = ???
val location = "hdfs://ipaddress/user/hdfs/{0}/ps/{1}/FS/2018-10-17".format(line, ps)
spark.read.json(location)

which reads all the JSON files in the directory.
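For illustration, a minimal end-to-end sketch of that approach (the object name here is mine, master/deploy settings are left to spark-submit, and the path, app name and values are taken from the question; note also that Scala's String.format expects %s placeholders rather than the {0}/{1} style used in the original snippet):

package normalisation

import org.apache.spark.sql.{DataFrame, SparkSession}

object SeasonalitySketch {
  def main(args: Array[String]): Unit = {
    // Build the session once; it replaces SparkConf/SparkContext/SQLContext
    val spark: SparkSession = SparkSession.builder()
      .appName("Normalization")
      .getOrCreate()

    val line = "MP"
    val ps = "Test"
    // Scala's String.format uses %s, not {0}/{1}
    val location = "hdfs://ipaddress/user/hdfs/%s/ps/%s/FS/2018-10-17".format(line, ps)

    // read.json on a directory loads every JSON file in it into one DataFrame
    val ps_data: DataFrame = spark.read.json(location)
    ps_data.show()

    spark.stop()
  }
}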
Also, I think you will get another compile error at

for (each <- files) {
  var ps_data = sqlContext.read.json(each)
}
println(ps_data.show())

because ps_data is out of scope there.
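If you want to keep the per-file loop for some reason, one way around the scoping problem is to do the work inside the loop body; a rough sketch (note that each is a Hadoop FileStatus, so its path has to be passed as a string):

for (each <- files) {
  // each.getPath gives the HDFS path of this file
  val ps_data = sqlContext.read.json(each.getPath.toString)
  ps_data.show()
}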
If you do need to use SparkContext for some reason, it really should be in spark-core. Have you tried restarting your IDE, cleaning caches, etc.?
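If you do end up needing it, the SparkContext is also reachable from the SparkSession, so the FileSystem listing from the question could look roughly like this (path hard-coded here for brevity):

import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.spark.sql.SparkSession

val spark: SparkSession = SparkSession.builder().appName("Normalization").getOrCreate()

// The SparkContext (and its Hadoop configuration) hangs off the SparkSession
val sc = spark.sparkContext
val fs = FileSystem.get(sc.hadoopConfiguration)
val files = fs.listStatus(new Path("hdfs://ipaddress/user/hdfs/MP/ps/Test/FS/2018-10-17"))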
Edit: I just noticed that your build.sbt is probably not in the directory you run sbt package from, so sbt won't pick it up.
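For reference, a typical layout sbt expects, so that sbt package run from the project root picks up build.sbt (directory names are illustrative):

OV/                      <- run `sbt package` from here
  build.sbt
  src/
    main/
      scala/
        normalisation/
          Seasonality.scala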