Convert a formatted date string to TIMESTAMP in Hive while accounting for daylight saving time

Time: 2016-08-11 10:56:46

Tags: hadoop hive timestamp type-conversion

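I want to convert a date in the format '03/21/2006 10:00:00 PM' to the TIMESTAMP data type in Hive. The string is in Chicago's time zone, so daylight saving time applies. I would like the time to be converted to UTC as part of the process.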

I tried the following, but unix_timestamp() does not understand the time zone America/Chicago:

SELECT cast(
   from_unixtime(
     unix_timestamp('03/21/2006 10:00:00 PM America/Chicago', 'MM/dd/yyyy hh:mm:ss a zzzz')
   ) AS TIMESTAMP
);

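I could use CDT or CST as the time zone, depending on whether the date falls within daylight saving time, but that gets messy because the changeover dates differ from year to year.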
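
One way to avoid choosing between CDT and CST by hand might be to parse the string without any zone and let to_utc_timestamp() apply the region's rules. This is only a sketch. It assumes the time zone data available to Hive covers America/Chicago for the dates involved, and that the parsed wall-clock time also exists in the session's default time zone (unix_timestamp() and from_unixtime() both work in that default zone, so the round trip is expected to leave the wall-clock time unchanged):

```sql
-- Sketch: parse the 12-hour string as wall-clock time, then treat it as
-- America/Chicago local time and convert it to UTC.
SELECT to_utc_timestamp(
         from_unixtime(
           unix_timestamp('03/21/2006 10:00:00 PM', 'MM/dd/yyyy hh:mm:ss a')
         ),                  -- 'yyyy-MM-dd HH:mm:ss' string on a 24-hour clock
         'America/Chicago'   -- region ID, so the DST rules are looked up per date
       ) AS utc_ts;
```

On 21 March 2006 Chicago was still on standard time (UTC-6), so the expected result is 2006-03-22 04:00:00. A date in July would be shifted by five hours instead.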

Is there a better way?

I am using Hive 1.1.0.

Thanks for your time.

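Edit

I have been able to achieve my goal with the following code. The solution is neither concise nor elegant, but it gets the job done for now. A UDF could do the same job in a more modular way, but I have no experience writing Hive UDFs yet.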

,from_utc_timestamp(to_utc_timestamp(concat(
        regexp_extract(date, '(\\d{2})/(\\d{2})/(\\d{4}) (\\d{2})(:\\d{2}:\\d{2}) ([A|P]M)', 3) -- year
        ,'-'
        ,regexp_extract(date,  '(\\d{2})/(\\d{2})/(\\d{4}) (\\d{2})(:\\d{2}:\\d{2}) ([A|P]M)', 1) -- month
        ,'-'
        ,regexp_extract(date,  '(\\d{2})/(\\d{2})/(\\d{4}) (\\d{2})(:\\d{2}:\\d{2}) ([A|P]M)', 2) -- day
        ,' '
        ,CASE  -- hour
            WHEN regexp_extract(date,  '(\\d{2})/(\\d{2})/(\\d{4}) (\\d{2})(:\\d{2}:\\d{2}) ([A|P]M)', 6) = 'AM'   
                AND regexp_extract(date,  '(\\d{2})/(\\d{2})/(\\d{4}) (\\d{2})(:\\d{2}:\\d{2}) ([A|P]M)', 4) = '12' 
                THEN '00'
            WHEN regexp_extract(date,  '(\\d{2})/(\\d{2})/(\\d{4}) (\\d{2})(:\\d{2}:\\d{2}) ([A|P]M)', 6) = 'AM'   
                AND regexp_extract(date,  '(\\d{2})/(\\d{2})/(\\d{4}) (\\d{2})(:\\d{2}:\\d{2}) ([A|P]M)', 4) <> '12' 
                THEN regexp_extract(date,  '(\\d{2})/(\\d{2})/(\\d{4}) (\\d{2})(:\\d{2}:\\d{2}) ([A|P]M)', 4) 
            WHEN regexp_extract(date,  '(\\d{2})/(\\d{2})/(\\d{4}) (\\d{2})(:\\d{2}:\\d{2}) ([A|P]M)', 6) = 'PM' 
                AND regexp_extract(date,  '(\\d{2})/(\\d{2})/(\\d{4}) (\\d{2})(:\\d{2}:\\d{2}) ([A|P]M)', 4) = '12' 
                THEN '12'
            WHEN regexp_extract(date,  '(\\d{2})/(\\d{2})/(\\d{4}) (\\d{2})(:\\d{2}:\\d{2}) ([A|P]M)', 6) = 'PM' 
                AND regexp_extract(date,  '(\\d{2})/(\\d{2})/(\\d{4}) (\\d{2})(:\\d{2}:\\d{2}) ([A|P]M)', 4) <> '12'
                THEN cast(regexp_extract(date,  '(\\d{2})/(\\d{2})/(\\d{4}) (\\d{2})(:\\d{2}:\\d{2}) ([A|P]M)', 4) AS TINYINT) + 12
            ELSE NULL -- should never get here
        END
        ,regexp_extract(date,  '(\\d{2})/(\\d{2})/(\\d{4}) (\\d{2})(:\\d{2}:\\d{2}) ([A|P]M)', 5) -- rest of time
        ) 
     , 'America/Chicago')
     , 'America/Chicago') AS cast_date
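
The regexp_extract and CASE logic above rebuilds a 'yyyy-MM-dd HH:mm:ss' string by hand. As a hedged sketch, the same string might be produced by letting unix_timestamp() parse the 12-hour pattern directly. The table name raw_events is hypothetical, the column is assumed to be called date as in the code above, and the result relies on the session's default time zone being the same for both calls:

```sql
-- Sketch: raw_events is a hypothetical table; `date` holds strings such as
-- '03/21/2006 10:00:00 PM'.
SELECT from_unixtime(
         unix_timestamp(`date`, 'MM/dd/yyyy hh:mm:ss a')  -- 'hh' with 'a' handles AM/PM
       ) AS local_ts                                       -- e.g. '2006-03-21 22:00:00'
FROM raw_events;
```

The resulting expression could stand in for the concat(...) argument, leaving the surrounding time zone calls as they are.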

0 answers:

No answers