I am new to Spark development and am trying to build my first Spark 2 (Scala) application with sbt in a Red Hat Linux environment. Below are the environment details.
CDH Version: 5.11.0
Apache Spark2: 2.1.0.cloudera1
Scala Version: 2.11.11
Java Version: 1.7.0_101
Application code:
import org.apache.spark.sql
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types._
import org.apache.spark.sql

object MySample {
  def main(args: Array[String]) {
    val warehouseLocation = "file:${system:user.dir}/spark-warehouse"

    val spark = SparkSession
      .builder()
      .appName("FirstApplication")
      .config("spark.sql.warehouse.dir", warehouseLocation)
      .getOrCreate()

    val schPer = new StructType(Array(
      new StructField("Column1", IntegerType, false),
      new StructField("Column2", StringType, true),
      new StructField("Column3", StringType, true),
      new StructField("Column4", IntegerType, true)
    ))

    val dfPeriod = spark.read.format("csv").option("header", false).schema(schPer).load("/prakash/periodFiles/")

    dfPeriod.write.format("csv").save("/prakash/output/dfPeriod")
  }
}
I get the following errors when compiling with sbt:
$ sbt
[info] Loading project definition from /home/prakash/project
[info] Set current project to my sample (in build file:/home/prakash/)
> compile
[info] Compiling 2 Scala sources to /home/prakash/target/scala-2.11/classes...
[error] /home/prakash/src/main/scala/my_sample.scala:2: object SparkSession is not a member of package org.apache.spark.sql
[error] import org.apache.spark.sql.SparkSession
[error] ^
[error] /home/prakash/src/main/scala/my_sample.scala:3: object types is not a member of package org.apache.spark.sql
[error] import org.apache.spark.sql.types._
[error] ^
[error] /home/prakash/src/main/scala/my_sample.scala:10: not found: value SparkSession
[error] val spark = SparkSession
[error] ^
[error] /home/prakash/src/main/scala/my_sample.scala:16: not found: type StructType
[error] val schPer = new StructType(Array(
[error] ^
..
..
..
[error] 43 errors found
[error] (compile:compileIncremental) Compilation failed
Below is my sbt configuration for this project.
name := "my sample"
version := "1.0"
scalaVersion := "2.11.8"
libraryDependencies += "org.apache.spark" %% "spark-core" % "2.1.0"
Answer 0 (score: 3)
SparkSession is part of the spark-sql artifact, so you need to add it to your build configuration:
libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.1.0"
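For reference, a minimal complete build.sbt could look like the sketch below. It assumes the Spark 2.1.0 / Scala 2.11 versions shown in the question; the "provided" scope is an optional assumption that only matters if the job is later submitted to a cluster (e.g. with spark2-submit), where the Spark jars are already on the classpath.

name := "my sample"

version := "1.0"

scalaVersion := "2.11.8"

// spark-core alone does not contain the SQL module;
// SparkSession, DataFrame and org.apache.spark.sql.types live in spark-sql.
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "2.1.0" % "provided",
  "org.apache.spark" %% "spark-sql"  % "2.1.0" % "provided"
)

After updating build.sbt, run reload in the sbt shell (or restart sbt) so the new dependency is resolved, then run compile again.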