Packaging a Scala class on Databricks (error: not found: value dbutils)

Asked: 2019-05-24 06:17:08

Tags: scala apache-spark databricks

I am trying to build a package containing this class:

package x.y.Log


import scala.collection.mutable.ListBuffer
import org.apache.spark.sql.{DataFrame}
import org.apache.spark.sql.functions.{lit, explode, collect_list, struct}
import org.apache.spark.sql.types.{StructField, StructType}
import java.util.Calendar
import java.text.SimpleDateFormat
import org.apache.spark.sql.functions._
import spark.implicits._

class Log{
...
}

Everything runs fine when the code lives in a single notebook, but as soon as I try to turn it into a package that other notebooks can use, I get these errors:

<notebook>:11: error: not found: object spark
import spark.implicits._
       ^
<notebook>:21: error: not found: value dbutils
  val notebookPath = dbutils.notebook.getContext().notebookPath.get
                     ^
<notebook>:22: error: not found: value dbutils
  val userName = dbutils.notebook.getContext.tags("user")
                 ^
<notebook>:23: error: not found: value dbutils
  val userId = dbutils.notebook.getContext.tags("userId")
               ^
<notebook>:41: error: not found: value spark
    var rawMeta =  spark.read.format("json").option("multiLine", true).load("/FileStore/tables/xxx.json")
                   ^
<notebook>:42: error: value $ is not a member of StringContext
    .filter($"Name".isin(readSources))

Does anyone know how to package this class together with these libraries?

1 Answer:

Answer 0 (score: 1):

Assuming you are running Spark 2.x, the statement import spark.implicits._ only works when there is a SparkSession object in the enclosing scope. The Implicits object is defined inside the SparkSession class, and it extends SQLImplicits from earlier Spark versions; you can verify this in the SparkSession source (Link to SparkSession code on Github). The fix is therefore to obtain a SparkSession inside the class and import the implicits from that instance:

package x.y.Log


import scala.collection.mutable.ListBuffer
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{lit, explode, collect_list, struct}
import org.apache.spark.sql.types.{StructField, StructType}
import java.util.Calendar
import java.text.SimpleDateFormat
import org.apache.spark.sql.functions._
import org.apache.spark.sql.SparkSession

class Log {

  // Get the existing SparkSession (or create one) so an instance is in scope for this class.
  val spark: SparkSession = SparkSession.builder.enableHiveSupport().getOrCreate()

  // The implicits, including the $"..." column interpolator, now resolve correctly.
  import spark.implicits._

  ...[rest of the code below]
}
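
The not found: value dbutils errors are a separate problem: outside a notebook, dbutils is not predefined. A common workaround is to add the Databricks dbutils-api library as a compile-time dependency and import its holder object. The following is only a sketch, not part of the answer above; it assumes the com.databricks:dbutils-api artifact is on the classpath and that your version of it exposes notebook.getContext:

import org.apache.spark.sql.SparkSession
// Assumption: requires the com.databricks:dbutils-api dependency at compile time.
// On a Databricks cluster the holder is populated with the real implementation at runtime.
import com.databricks.dbutils_v1.DBUtilsHolder.dbutils

class Log {

  val spark: SparkSession = SparkSession.builder.enableHiveSupport().getOrCreate()

  import spark.implicits._

  // The same calls from the question now compile against the stub API.
  val notebookPath = dbutils.notebook.getContext().notebookPath.get
  val userName = dbutils.notebook.getContext().tags("user")
}

Once packaged, another notebook can use the class directly, e.g. import x.y.Log followed by new Log(). Note that the dbutils calls only return real values when the code runs on Databricks; locally there is no notebook context to read.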