I have a DataFrame with the following content:
scala> patDF.show
+---------+-------+-----------+-------------+
|patientID|   name|dateOtBirth|lastVisitDate|
+---------+-------+-----------+-------------+
|     1001|Ah Teck| 1991-12-31|   2012-01-20|
|     1002|  Kumar| 2011-10-29|   2012-09-20|
|     1003|    Ali| 2011-01-30|   2012-10-21|
+---------+-------+-----------+-------------+
All columns are strings.
I want to get the records whose lastVisitDate (in yyyy-mm-dd format) falls between 2012-09-15 and the current time. Here is the script:
patDF.registerTempTable("patients")
val results2 = sqlContext.sql("SELECT * FROM patients WHERE from_unixtime(unix_timestamp(lastVisitDate, 'yyyy-mm-dd')) between '2012-09-15' and current_timestamp() order by lastVisitDate")
results2.show()
It returns nothing, although it should presumably return the records for patients 1002 and 1003.
So I changed the query to:
val results3 = sqlContext.sql("SELECT from_unixtime(unix_timestamp(lastVisitDate, 'yyyy-mm-dd')), * FROM patients")
results3.show()
Now I get:
+-------------------+---------+-------+-----------+-------------+
|                _c0|patientID|   name|dateOtBirth|lastVisitDate|
+-------------------+---------+-------+-----------+-------------+
|2012-01-20 00:01:00|     1001|Ah Teck| 1991-12-31|   2012-01-20|
|2012-01-20 00:09:00|     1002|  Kumar| 2011-10-29|   2012-09-20|
|2012-01-21 00:10:00|     1003|    Ali| 2011-01-30|   2012-10-21|
+-------------------+---------+-------+-----------+-------------+
Looking at the first column, you can see that every month has somehow been changed to 01.
What is wrong with the code?
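Answer:
The correct pattern for year-month-day is yyyy-MM-dd. In the date pattern, lowercase mm means minute-of-hour and uppercase MM means month, so 'yyyy-mm-dd' parsed the month digits as minutes and left the month at its default of January. That is why 2012-09-20 came out as 2012-01-20 00:09:00, and why no row fell in the range starting at 2012-09-15. Using MM instead: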
import spark.implicits._ // imported automatically in spark-shell; needed for toDF elsewhere
val patDF = Seq(
(1001, "Ah Teck", "1991-12-31", "2012-01-20"),
(1002, "Kumar", "2011-10-29", "2012-09-20"),
(1003, "Ali", "2011-01-30", "2012-10-21")
).toDF("patientID", "name", "dateOtBirth", "lastVisitDate")
patDF.createOrReplaceTempView("patTable")
val result1 = spark.sqlContext.sql("""
select * from patTable where to_timestamp(lastVisitDate, 'yyyy-MM-dd')
between '2012-09-15' and current_timestamp() order by lastVisitDate
""")
result1.show
// +---------+-----+-----------+-------------+
// |patientID| name|dateOtBirth|lastVisitDate|
// +---------+-----+-----------+-------------+
// |     1002|Kumar| 2011-10-29|   2012-09-20|
// |     1003|  Ali| 2011-01-30|   2012-10-21|
// +---------+-----+-----------+-------------+
If you prefer, you can also use the DataFrame API:
import org.apache.spark.sql.functions.{current_timestamp, lit, to_timestamp}
val result2 = patDF.where(to_timestamp($"lastVisitDate", "yyyy-MM-dd").
between(to_timestamp(lit("2012-09-15"), "yyyy-MM-dd"), current_timestamp())
).orderBy($"lastVisitDate")
result2.show
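To see why every month turned into 01, the mix-up can be reproduced without Spark. This is a minimal sketch assuming plain Scala on the JVM; it uses java.text.SimpleDateFormat, whose pattern letters Spark's unix_timestamp followed in the 1.x/2.x releases (an assumption about the version used in the question). The object PatternDemo and its helper reformat are hypothetical names for illustration. Both formatters are pinned to UTC so the printed time is not shifted by the local time zone.

```scala
import java.text.SimpleDateFormat
import java.util.TimeZone

object PatternDemo {
  private val utc = TimeZone.getTimeZone("UTC")

  // Hypothetical helper: parse `s` with `pattern`, then print it back in full
  // (date and time) so every parsed field is visible.
  def reformat(pattern: String, s: String): String = {
    val in = new SimpleDateFormat(pattern)
    in.setTimeZone(utc)
    val out = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss")
    out.setTimeZone(utc)
    out.format(in.parse(s))
  }

  def main(args: Array[String]): Unit = {
    // "mm" is minute-of-hour: "09" becomes 9 minutes and the month defaults to January.
    println(reformat("yyyy-mm-dd", "2012-09-20")) // 2012-01-20 00:09:00
    // "MM" is month: the date is parsed as intended.
    println(reformat("yyyy-MM-dd", "2012-09-20")) // 2012-09-20 00:00:00
  }
}
```

The first line reproduces the 2012-01-20 00:09:00 value from the question's _c0 column for patient 1002.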