Unexpected wrong results after a unix_timestamp conversion in Spark SQL

Asked: 2018-02-18 02:57:41

Tags: scala dataframe unix-timestamp

I have a DataFrame with the following content:

scala> patDF.show
+---------+-------+-----------+-------------+
|patientID|   name|dateOtBirth|lastVisitDate|
+---------+-------+-----------+-------------+
|     1001|Ah Teck| 1991-12-31|   2012-01-20|
|     1002|  Kumar| 2011-10-29|   2012-09-20|
|     1003|    Ali| 2011-01-30|   2012-10-21|
+---------+-------+-----------+-------------+

All columns are strings.

I want to get the records whose lastVisitDate (in yyyy-mm-dd format) falls between 2012-09-15 and now. Here is the script:

patDF.registerTempTable("patients") 
val results2 = sqlContext.sql("SELECT * FROM patients WHERE from_unixtime(unix_timestamp(lastVisitDate, 'yyyy-mm-dd')) between '2012-09-15' and current_timestamp() order by lastVisitDate")
results2.show() 

It returns nothing, although presumably the records with patientID 1002 and 1003 should match.

So I modified the query to:

val results3 = sqlContext.sql("SELECT from_unixtime(unix_timestamp(lastVisitDate, 'yyyy-mm-dd')), * FROM patients")
results3.show() 

Now I get this:

+-------------------+---------+-------+-----------+-------------+
|                _c0|patientID|   name|dateOtBirth|lastVisitDate|
+-------------------+---------+-------+-----------+-------------+
|2012-01-20 00:01:00|     1001|Ah Teck| 1991-12-31|   2012-01-20|
|2012-01-20 00:09:00|     1002|  Kumar| 2011-10-29|   2012-09-20|
|2012-01-21 00:10:00|     1003|    Ali| 2011-01-30|   2012-10-21|
+-------------------+---------+-------+-----------+-------------+

If you look at the first column, you will see that all the months were somehow changed to 01.

What is wrong with the code?

1 Answer:

Answer 0 (score: 0)

The correct format for year-month-day should be yyyy-MM-dd:

import spark.implicits._  // needed for toDF

val patDF = Seq(
  (1001, "Ah Teck", "1991-12-31", "2012-01-20"),
  (1002, "Kumar", "2011-10-29", "2012-09-20"),
  (1003, "Ali", "2011-01-30", "2012-10-21")
).toDF("patientID", "name", "dateOtBirth", "lastVisitDate")

patDF.createOrReplaceTempView("patTable")

val result1 = spark.sqlContext.sql("""
  select * from patTable where to_timestamp(lastVisitDate, 'yyyy-MM-dd')
    between '2012-09-15' and current_timestamp() order by lastVisitDate
""")

result1.show
// +---------+-----+-----------+-------------+
// |patientID| name|dateOtBirth|lastVisitDate|
// +---------+-----+-----------+-------------+
// |     1002|Kumar| 2011-10-29|   2012-09-20|
// |     1003|  Ali| 2011-01-30|   2012-10-21|
// +---------+-----+-----------+-------------+
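The odd first column in the question follows directly from the pattern letters: in the Java date-pattern syntax that unix_timestamp uses, lowercase mm is the minute field, while uppercase MM is the month. A minimal plain-Scala sketch with java.text.SimpleDateFormat (which uses the same pattern letters) reproduces the mixup without needing a Spark session:

```scala
import java.text.SimpleDateFormat

// 'mm' is the MINUTE field, so "09" in "2012-09-20" is parsed as
// minute 9, and the month falls back to its default, January (01).
val wrong = new SimpleDateFormat("yyyy-mm-dd")
// 'MM' is the MONTH field, so the date parses as intended.
val right = new SimpleDateFormat("yyyy-MM-dd")

val show = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss")
println(show.format(wrong.parse("2012-09-20"))) // 2012-01-20 00:09:00
println(show.format(right.parse("2012-09-20"))) // 2012-09-20 00:00:00
```

This is exactly the pattern seen in the question's output, where 2012-09-20 became 2012-01-20 00:09:00: the month collapsed to 01 and the real month leaked into the minutes.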

You can also use the DataFrame API, if preferred:

import org.apache.spark.sql.functions.{current_timestamp, lit, to_timestamp}
import spark.implicits._  // needed for the $"..." column syntax

val result2 = patDF.where(to_timestamp($"lastVisitDate", "yyyy-MM-dd")
    .between(to_timestamp(lit("2012-09-15"), "yyyy-MM-dd"), current_timestamp()))
  .orderBy($"lastVisitDate")