Packaging a Scala script with sbt

Date: 2016-08-26 07:59:31

Tags: scala apache-spark sbt

I am trying to run a Scala script with spark-submit, but first I need to build my package.

Here is my sbt file:

name := "Simple Project"    
version := "1.0"    
scalaVersion := "2.10.4"    
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.5.2"
libraryDependencies += "org.apache.spark" %% "spark-sql" % "1.0.0"

When I run sbt package, I get these errors:

/home/i329537/Scripts/PandI/SBT/src/main/scala/XML_Script_SBT.scala:3: object functions is not a member of package org.apache.spark.sql
import org.apache.spark.sql.functions._
                            ^
/home/i329537/Scripts/PandI/SBT/src/main/scala/XML_Script_SBT.scala:4: object types is not a member of package org.apache.spark.sql
import org.apache.spark.sql.types._
                            ^
/home/i329537/Scripts/PandI/SBT/src/main/scala/XML_Script_SBT.scala:25: not found: value sc
        val hconf = SparkHadoopUtil.get.newConfiguration(sc.getConf)
                                                         ^
/home/i329537/Scripts/PandI/SBT/src/main/scala/XML_Script_SBT.scala:30: not found: value sqlContext
        val df = sqlContext.read.format("xml").option("attributePrefix","").option("rowTag", "project").load(uri.toString())
                 ^
/home/i329537/Scripts/PandI/SBT/src/main/scala/XML_Script_SBT.scala:36: not found: value udf
        val sqlfunc = udf(coder)
                      ^
5 errors found
(compile:compileIncremental) Compilation failed

Has anyone encountered these errors?

Thanks for your help.

Regards, Majid

2 answers:

Answer 0 (score: 0)

You are trying to use the class org.apache.spark.sql.functions and the package org.apache.spark.sql.types. According to the class documentation, functions has been available since version 1.3.0, and the types package since version 1.3.1.

Solution: update your sbt file to:

libraryDependencies += "org.apache.spark" %% "spark-sql" % "1.3.1"
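
For reference, the complete build.sbt after this change would look roughly like the sketch below. Note that mixing Spark module versions (spark-core 1.5.2 with spark-sql 1.3.1) can itself cause problems; keeping both on the same version, e.g. 1.5.2, is the more common setup.

name := "Simple Project"
version := "1.0"
scalaVersion := "2.10.4"
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.5.2"
libraryDependencies += "org.apache.spark" %% "spark-sql" % "1.3.1"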

The other errors ("not found: value sc", "not found: value sqlContext", "not found: value udf") are caused by variables that are missing in your XML_Script_SBT.scala file; they cannot be fixed without seeing the source code.
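
For anyone hitting the same "not found" errors: in a standalone application (unlike in the spark-shell) sc and sqlContext are not predefined, and udf comes from org.apache.spark.sql.functions. A minimal sketch for Spark 1.x, with names chosen to match the error messages:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.functions._  // provides udf and col

val conf = new SparkConf().setAppName("Simple Project")
val sc = new SparkContext(conf)
val sqlContext = new SQLContext(sc)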

Answer 1 (score: 0)

Thanks Sergey, your correction fixed 3 of the errors. Here is my script:

object SimpleApp {

  def main(args: Array[String]) {

    // Build a timestamped destination path for the CSV output
    val today = Calendar.getInstance.getTime
    val curTimeFormat = new SimpleDateFormat("yyyyMMdd-HHmmss")
    val time = curTimeFormat.format(today)
    val destination = "/3.Data/3.Code_Check_Processed/2.XML/" + time + ".extensive.csv"

    val source = "/3.Data/2.Code_Check_Raw/2.XML/Extensive/"

    // Pick the first XML file from the source directory on HDFS
    val hconf = SparkHadoopUtil.get.newConfiguration(sc.getConf)
    val hdfs = FileSystem.get(hconf)
    val iter = hdfs.listLocatedStatus(new Path(source))
    val uri = iter.next.getPath.toUri

    // Parse the XML file, treating each <project> element as a row
    val df = sqlContext.read.format("xml").option("attributePrefix", "").option("rowTag", "project").load(uri.toString())

    // Explode the nested "check" entries into one row per check
    val df2 = df.selectExpr("explode(check) as e").select("e.#VALUE", "e.additionalInformation1", "e.columnNumber", "e.context", "e.errorType", "e.filePath", "e.lineNumber", "e.message", "e.severity")

    // UDF that stamps each row with the run's timestamp
    val coder: (Long => String) = (arg: Long) => { if (arg > -1) time else "nada" }
    val sqlfunc = udf(coder)
    val df3 = df2.withColumn("TimeStamp", sqlfunc(col("columnNumber")))

    // Write the result as CSV, then remove the processed input file
    df3.write.format("com.databricks.spark.csv").option("header", "false").save(destination)
    hdfs.delete(new Path(uri.toString()), true)

    sys.exit(0)
  }
}
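
As a usage note, once the remaining sc/sqlContext issues are fixed and sbt package succeeds, the job can be submitted along these lines. The --packages coordinates for the spark-xml and spark-csv data sources used above, and the jar path, are assumptions rather than details from the original post:

# build the jar, then submit it (package coordinates/versions are assumptions)
sbt package
spark-submit --class SimpleApp \
  --packages com.databricks:spark-xml_2.10:0.3.3,com.databricks:spark-csv_2.10:1.3.0 \
  target/scala-2.10/simple-project_2.10-1.0.jar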