Error parsing a date in a Spark Scala program

Time: 2018-01-09 02:54:33

Tags: scala apache-spark apache-spark-sql

I am parsing a date with the statement below, and the Spark program throws an error when it runs.

import java.time.LocalDate
val filterDate = "2017-06-26"
val filter = LocalDate.parse(filterDate)

I then use the filter variable to filter records on a DataFrame column.

val df1 = spark.sql("select * from tableA")
val df2 = df1.filter(load_dt = filter)

I get the following error when running the program:

User class threw exception: org.apache.spark.sql.AnalysisException: cannot resolve '(tableA.`load_dt` = ((2017 - 6) - 26))' due to data type mismatch: differing types in '(tableA.`load_dt` = ((2017 - 6) - 26))' (date and int).; line 1 pos 0;

I can't understand why it evaluates (2017 - 6) and (- 26) as separate subtractions.

Can anyone please help? Thanks.

1 Answer:

Answer 0 (score: 0):

Spark's DataFrame API has no encoder for java.time.LocalDate, but since your load_dt column is already of type java.sql.Date, there is no need to convert filterDate to a LocalDate at all. The analyzer error you saw occurs because the unquoted date value in your condition is parsed as the integer expression (2017 - 6) - 26 rather than as a date. The other problem is that === (not =) should be used for the equality check in the filter condition, as in the example below:

import spark.implicits._  // needed for toDF and the $"..." column syntax

val filterDate = "2017-06-26"

val df1 = Seq(
  (1, java.sql.Date.valueOf("2017-06-25")),
  (2, java.sql.Date.valueOf("2017-06-26")),
  (3, java.sql.Date.valueOf("2017-06-27"))
).toDF("id", "load_dt")

val df2 = df1.filter($"load_dt" === filterDate)

df2.show
+---+----------+
| id|   load_dt|
+---+----------+
|  2|2017-06-26|
+---+----------+
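
The comparison above works because Spark implicitly casts the String on the right-hand side to match the Date column. If you would rather make the type explicit, a minimal variant of the same filter parses the string into a java.sql.Date first (typedDate and df2Typed are names of my choosing):

// java.sql.Date.valueOf parses a "yyyy-MM-dd" string into a java.sql.Date
val typedDate = java.sql.Date.valueOf(filterDate)
val df2Typed = df1.filter($"load_dt" === typedDate)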

[UPDATE]

Per your comment, here is how to proceed if filterColumn and filterValue are provided as variables.

Using Spark SQL:

df1.createOrReplaceTempView("df1Table")

val tableName = "df1Table"
val filterColumn = "load_dt"
val filterValue = "2017-06-26"

val df2 = spark.sql(
  "select * from " + tableName + " where " + filterColumn + " = '" + filterValue + "'"
)

// Equivalent, using string interpolation s"..." instead of concatenation
val df2Sql = spark.sql(
  s"select * from $tableName where $filterColumn = '$filterValue'"
)
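
Note that assembling SQL text by hand requires filterValue to be quoted correctly, and it can break (or invite injection) if the values ever come from user input; the DataFrame API form below avoids both issues.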

Using the Spark DataFrame API:

import org.apache.spark.sql.functions.col

val df2 = df1.where(col(filterColumn) === filterValue)
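
If the dynamic filter is needed in several places, a minimal sketch wraps it in a helper function (the name filterBy is hypothetical, not from the original post):

import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.col

// Hypothetical helper: equality filter on a column chosen at runtime
def filterBy(df: DataFrame, column: String, value: String): DataFrame =
  df.where(col(column) === value)

val filtered = filterBy(df1, filterColumn, filterValue)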