I am parsing a date with the following statements, and the Spark program throws an error when it runs.
import java.time.LocalDate
val filterDate = "2017-06-26"
val filter = LocalDate.parse(filterDate)
I then use the filter variable to filter records on a DataFrame column:
val df1 = spark.sql("select * from tableA")
val df2 = df1.filter(s"load_dt = $filter")
The following error is thrown when the program above executes:
User class threw exception: org.apache.spark.sql.AnalysisException: cannot resolve '(tableA.`load_dt` = ((2017 - 6) - 26))' due to data type mismatch: differing types in '(tableA.`load_dt` = ((2017 - 6) - 26))' (date and int).; line 1 pos 0;
I can't understand why it evaluates (2017 - 6) and (- 26) separately. Can anyone help? Thanks.
Answer 0 (score: 0)
Spark's DataFrame API has no encoder for java.time.LocalDate, but since your load_dt column is already of type java.sql.Date, you don't need to convert filterDate to a LocalDate at all. As for the error itself: the filter string "load_dt = 2017-06-26" is parsed as a SQL expression, so the unquoted 2017-06-26 is read as integer arithmetic ((2017 - 6) - 26), which cannot be compared with a date column. The other problem is that === should be used for the equality check in the filter condition, as in the following example:
import spark.implicits._  // required for toDF and the $"..." column syntax
val filterDate = "2017-06-26"
val df1 = Seq(
(1, java.sql.Date.valueOf("2017-06-25")),
(2, java.sql.Date.valueOf("2017-06-26")),
(3, java.sql.Date.valueOf("2017-06-27"))
).toDF("id", "load_dt")
val df2 = df1.filter($"load_dt" === filterDate)
df2.show
+---+----------+
| id| load_dt|
+---+----------+
| 2|2017-06-26|
+---+----------+
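Note that the comparison above works because Spark casts the string filterDate to the column's date type. If you prefer an explicit date value, here is a minimal equivalent sketch (reusing df1 and filterDate from the example above):
import java.sql.Date
// Compare against an explicit java.sql.Date instead of relying on the implicit string cast
val df2b = df1.filter($"load_dt" === Date.valueOf(filterDate))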
[UPDATE]
Per your comment, if filterColumn and filterValue are supplied as variables:
Using Spark SQL:
df1.createOrReplaceTempView("df1Table")
val tableName = "df1Table"
val filterColumn = "load_dt"
val filterValue = "2017-06-26"
val df2 = spark.sql(
  s"select * from $tableName where $filterColumn = '$filterValue'"
)
// Equivalent, using string concatenation instead of interpolation:
// val df2 = spark.sql(
//   "select * from " + tableName + " where " + filterColumn + " = '" + filterValue + "'"
// )
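One design note: building the SQL text by concatenation or interpolation pastes filterColumn and filterValue into the query verbatim, so these values should come from trusted code rather than user input; the DataFrame API form below avoids hand-building SQL strings.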
Using the Spark DataFrame API:
import org.apache.spark.sql.functions.col  // needed for col()
val df2 = df1.select("*").where(col(filterColumn) === filterValue)
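If this pattern recurs, it can be wrapped in a small helper. A minimal sketch (the filterBy name is my own, not from the original answer):
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.col

// Filter a DataFrame on a single column = value equality condition
def filterBy(df: DataFrame, column: String, value: String): DataFrame =
  df.where(col(column) === value)

filterBy(df1, filterColumn, filterValue).show()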