Cleaning a date column

Time: 2019-06-25 15:59:39

Tags: sql amazon-athena

I have a table in which a date field contains values in the following formats:

Date
01/05/2019 00:00
30/04/2019 00:00
29/04/2019 00:00
26/04/2019 00:00
25/04/2019 00:00
24/04/2019 00:00
17/04/2019 00:00
16/04/2019 00:00
15/04/2019 00:00
09/04/2019 00:00
09/04/2019
01/03/2019
01/02/2019
01/01/2019
01/12/2018
01/11/2018
01/10/2018
01/09/2018
01/08/2018

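How can I clean them so that they all have the format 09/04/2019 00:00?

1 answer:

Answer 0: (score: 0)

You just have to keep converting until you get the desired format. Take the sample table that Amazon Athena provides, which has this structure: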

    elb_logs
        request_timestamp (string)
        elb_name (string)
        request_ip (string)
        request_port (int)
        backend_ip (string)
        backend_port (int)
        request_processing_time (double)
        backend_processing_time (double)
        client_response_time (double)
        elb_response_code (string)
        backend_response_code (string)
        received_bytes (bigint)
        sent_bytes (bigint)
        request_verb (string)
        url (string)
        protocol (string)
        user_agent (string)
        ssl_cipher (string)
        ssl_protocol (string)

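You will need to parse the timestamp, extract its parts, apply conditional expressions to the extracted values to zero-pad them, and concatenate the results. Note that I had to use date_parse() because my sample table does not have a date column like yours: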

    -- Rebuild each timestamp as DD/MM/YYYY HH:MM, prefixing single-digit parts with '0'
    SELECT
        concat(
          CASE WHEN cast(day    as integer) between 1 and 9 THEN '0' || day    ELSE day END, '/',
          CASE WHEN cast(month  as integer) between 1 and 9 THEN '0' || month  ELSE month END, '/',
          year, ' ',
          CASE WHEN cast(hour   as integer) between 0 and 9 THEN '0' || hour   ELSE hour END, ':',
          CASE WHEN cast(minute as integer) between 0 and 9 THEN '0' || minute ELSE minute END
          ) as "Date"
    FROM (
      -- Parse the string timestamp and extract each part as varchar
      SELECT
        cast(extract(YEAR   FROM date_parse(request_timestamp,'%Y-%m-%dT%H:%i:%s.%fZ')) as varchar) as year,
        cast(extract(MONTH  FROM date_parse(request_timestamp,'%Y-%m-%dT%H:%i:%s.%fZ')) as varchar) as month,
        cast(extract(DAY    FROM date_parse(request_timestamp,'%Y-%m-%dT%H:%i:%s.%fZ')) as varchar) as day,
        cast(extract(HOUR   FROM date_parse(request_timestamp,'%Y-%m-%dT%H:%i:%s.%fZ')) as varchar) as hour,
        cast(extract(MINUTE FROM date_parse(request_timestamp,'%Y-%m-%dT%H:%i:%s.%fZ')) as varchar) as minute
      FROM "sampledb"."elb_logs"
      )

This should produce output in the format you posted.
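
As a complement, here is a minimal sketch that applies the same parse-then-format idea directly to the values from the question. It assumes the column is stored as a string in exactly the two layouts shown (DD/MM/YYYY HH:MM and DD/MM/YYYY). The inline sample rows and the names raw_date and cleaned are hypothetical stand-ins for your own table and column. It relies on the Presto functions try(), coalesce(), date_parse() and date_format(), which Athena exposes:

    -- Sketch only: "sample", "raw_date" and "cleaned" are made-up names
    SELECT
      raw_date,
      date_format(
        coalesce(
          -- Try the full layout first; try() turns a parse failure into NULL
          try(date_parse(raw_date, '%d/%m/%Y %H:%i')),
          -- Fall back to the date-only layout, which parses as midnight
          try(date_parse(raw_date, '%d/%m/%Y'))
        ),
        '%d/%m/%Y %H:%i'
      ) AS cleaned
    FROM (
      VALUES ('01/05/2019 00:00'), ('09/04/2019'), ('01/12/2018')
    ) AS sample (raw_date)

Under those assumptions, a date-only value such as 09/04/2019 should come back as 09/04/2019 00:00, and a value that matches neither layout comes back as NULL instead of failing the query. To run it against real data, replace the VALUES clause with your table and raw_date with your column.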

Hope it helps (: