Hive日期字符串验证

时间:2017-10-06 16:22:20

标签: hive hql unix-timestamp

我正在尝试检查字符串的有效日期格式是否为“YYYYMMDD”。

我正在使用以下技术。但是对于无效的日期字符串,我得到了有效的日期结果。

我做错了什么?

SELECT'20019999',CASE WHEN unix_timestamp('20019999','YYYYMMDD')  > 0 THEN  'Good'ELSE 'Bad'END;

1 个答案:

答案 0 :(得分:3)

首先,您使用的格式错误

select  from_unixtime(unix_timestamp())                 as default_format
       ,from_unixtime(unix_timestamp(),'YYYY-MM-DD')    as wrong_format
       ,from_unixtime(unix_timestamp(),'yyyy-MM-dd')    as right_format
;
+----------------------+---------------+---------------+
|    default_format    | wrong_format  | right_format  |
+----------------------+---------------+---------------+
| 2017-10-07 04:13:26  | 2017-10-280   | 2017-10-07    |
+----------------------+---------------+---------------+

其次,日期部件范围没有验证 如果您将日期部分增加1,则会将您转发到第二天。

with t as (select stack(7,'27','28','29','30','31','32','33') as dy)
select  t.dy
       ,from_unixtime(unix_timestamp(concat('2017-02-',t.dy),'yyyy-MM-dd'),'yyyy-MM-dd') as dt

from    t
;
+-----+-------------+
| dy  |     dt      |
+-----+-------------+
| 27  | 2017-02-27  |
| 28  | 2017-02-28  |
| 29  | 2017-03-01  |
| 30  | 2017-03-02  |
| 31  | 2017-03-03  |
| 32  | 2017-03-04  |
| 33  | 2017-03-05  |
+-----+-------------+

如果您将月份部分增加1,则会将您转到下个月。

with t as (select stack(5,'10','11','12','13','14') as mn)
select  t.mn
       ,from_unixtime(unix_timestamp(concat('2017-',t.mn,'-01'),'yyyy-MM-dd'),'yyyy-MM-dd') as dt

from    t
;
+-----+-------------+
| mn  |     dt      |
+-----+-------------+
| 10  | 2017-10-01  |
| 11  | 2017-11-01  |
| 12  | 2017-12-01  |
| 13  | 2018-01-01  |
| 14  | 2018-02-01  |
+-----+-------------+

即使使用CAST,验证仅在零件范围内进行,而不是在日期本身进行。

select cast('2010-02-32' as date);
+-------+
|  _c0  |
+-------+
| NULL  |
+-------+
select cast('2010-02-29' as date);
+-------------+
|     _c0     |
+-------------+
| 2010-03-01  |
+-------------+

以下是实现目标的方法:

with t as (select '20019999' as dt)
select  dt  
       ,from_unixtime(unix_timestamp(dt,'yyyyMMdd'),'yyyyMMdd') as double_converted_dt    

       ,case 
            when from_unixtime(unix_timestamp(dt,'yyyyMMdd'),'yyyyMMdd')  = dt 
            then 'Good' 
            else 'Bad' 
        end             as dt_status

from    t
;
+-----------+----------------------+------------+
|    dt     | double_converted_dt  | dt_status  |
+-----------+----------------------+------------+
| 20019999  | 20090607             | Bad        |
+-----------+----------------------+------------+