How do I create a dates table in Redshift?

Time: 2017-11-10 20:21:29

Tags: sql date amazon-redshift

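tl;dr: I want to generate a dates table in Redshift to make reports easier to build. Preferably without relying on a large table that already exists in Redshift or having to upload a CSV file.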

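Long version: I'm working on a report where I have to average the number of new items created for each day of the week. The date range could span months or more, so it may contain 5 Mondays but only 4 Sundays, which makes the math a little tricky. Also, I'm not guaranteed to have an item on every day, especially once a user starts slicing the data. All of this trips up the BI tool.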

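The best way to solve this is most likely a dates table. However, most tutorials for dates tables use SQL commands that are unavailable or not fully supported in Redshift (I'm looking at you, generate_series).

Is there an easy way to generate a dates table in Redshift?

The code I was attempting to use (based on this suggestion, which did not work for me: http://elliot.land/post/building-a-date-dimension-table-in-redshift):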

CREATE TABLE facts.dates (
  "date_id"              INTEGER                     NOT NULL PRIMARY KEY,

  -- DATE
  "full_date"            DATE                        NOT NULL,

  -- YEAR
  "year_number"          SMALLINT                    NOT NULL,
  "year_week_number"     SMALLINT                    NOT NULL,
  "year_day_number"      SMALLINT                    NOT NULL,

  -- QUARTER
  "qtr_number"           SMALLINT                    NOT NULL,

  -- MONTH
  "month_number"         SMALLINT                    NOT NULL,
  "month_name"           CHAR(9)                     NOT NULL,
  "month_day_number"     SMALLINT                    NOT NULL,

  -- WEEK
  "week_day_number"      SMALLINT                    NOT NULL,

  -- DAY
  "day_name"             CHAR(9)                     NOT NULL,
  "day_is_weekday"       SMALLINT                    NOT NULL,
  "day_is_last_of_month" SMALLINT                    NOT NULL
) DISTSTYLE ALL SORTKEY (date_id)
;


INSERT INTO facts.dates
(
   "date_id"
  ,"full_date"
  ,"year_number"
  ,"year_week_number"
  ,"year_day_number"

  -- QUARTER
  ,"qtr_number"

  -- MONTH
  ,"month_number"
  ,"month_name"
  ,"month_day_number"

  -- WEEK
  ,"week_day_number"

  -- DAY
  ,"day_name"
  ,"day_is_weekday"
  ,"day_is_last_of_month"
)
  SELECT
    cast(seq + 1 AS INTEGER)                                      AS date_id,

    -- DATE
    datum                                                         AS full_date,

    -- YEAR
    cast(extract(YEAR FROM datum) AS SMALLINT)                    AS year_number,
    cast(extract(WEEK FROM datum) AS SMALLINT)                    AS year_week_number,
    cast(extract(DOY FROM datum) AS SMALLINT)                     AS year_day_number,

    -- QUARTER
    cast(to_char(datum, 'Q') AS SMALLINT)                         AS qtr_number,

    -- MONTH
    cast(extract(MONTH FROM datum) AS SMALLINT)                   AS month_number,
    to_char(datum, 'Month')                                       AS month_name,
    cast(extract(DAY FROM datum) AS SMALLINT)                     AS month_day_number,

    -- WEEK
    cast(to_char(datum, 'D') AS SMALLINT)                         AS week_day_number,

    -- DAY
    to_char(datum, 'Day')                                         AS day_name,
    CASE WHEN to_char(datum, 'D') IN ('1', '7')
      THEN 0
    ELSE 1 END                                                    AS day_is_weekday,
    CASE WHEN
      extract(DAY FROM (datum + (1 - extract(DAY FROM datum)) :: INTEGER +
                        INTERVAL '1' MONTH) :: DATE -
                       INTERVAL '1' DAY) = extract(DAY FROM datum)
      THEN 1
    ELSE 0 END                                                    AS day_is_last_of_month
  FROM
    -- Generate days for 81 years starting from 2000.
    (
      SELECT
        '2000-01-01' :: DATE + generate_series AS datum,
        generate_series                        AS seq
      FROM generate_series(0,81 * 365 + 20,1)
    ) DQ
  ORDER BY 1;

This throws the following error:

[Amazon](500310) Invalid operation: Specified types or functions (one per INFO message) not supported on Redshift tables.;
1 statement failed.

...because, I think, INSERT and generate_series are not allowed in the same command in Redshift.
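
A minimal sketch of the behaviour described above, assuming (as is commonly reported) that generate_series in Redshift runs only on the leader node: it returns rows when used on its own, but not in a statement that writes to a user table. The failing statement is left commented out, and the table name facts.numbers in it is only a placeholder.

```sql
-- Works: no user table is involved, so the leader node can answer alone.
SELECT generate_series(0, 5, 1);

-- Fails with "Specified types or functions ... not supported on Redshift
-- tables", because the result would have to be written to a user table:
-- INSERT INTO facts.numbers SELECT generate_series(0, 5, 1);
```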

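3 Answers:

Answer 0: (score: 1)

As a workaround, you can spin up a Postgres instance on your local machine, run the code there, export the result to CSV, then run only the CREATE TABLE portion in Redshift and load the data from the CSV. Since this is a one-time operation, doing it this way is fine; it is what I actually do for new Redshift deployments.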
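
The load step might look like the sketch below. It assumes the CSV exported from Postgres has been uploaded to S3 without a header row and with columns in the same order as facts.dates. The bucket, file name, and IAM role ARN are hypothetical placeholders, not values from this thread.

```sql
-- Run after the CREATE TABLE facts.dates statement from the question.
-- Hypothetical bucket, key, and role: replace with your own.
COPY facts.dates
FROM 's3://my-example-bucket/dates.csv'
IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftCopyRole'
FORMAT AS CSV
DATEFORMAT 'auto';

-- Quick sanity check on what was loaded.
SELECT COUNT(*), MIN(full_date), MAX(full_date) FROM facts.dates;
```

If the export includes a header row, adding IGNOREHEADER 1 to the COPY options should skip it.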

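Answer 1: (score: 1)

Here is another suggestion for building facts.numbers, one that needs no manual intervention:

  1. Take a system table of known or stable size (one that is guaranteed to exist)
  2. Cross join that table with itself enough times to get the number of rows you need
  3. Select row_number() over (order by 1) to turn the resulting records into a set of increasing numbers

An example using the Redshift system table pg_catalog.pg_operator (659 records as of October 2020):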

-- Prep, so that you can copy/paste the code sample
create schema if not exists facts;   -- Make sure the schema exists
drop table if exists facts.numbers;  -- Avoid an error if that table already exists;
create table facts.numbers           -- Create the table definition
(
  number int primary key
);

-- The bit you care about
insert into facts.numbers
    select     row_number() over (order by 1) -- return 1..n in place of the original record
    from       pg_catalog.pg_operator a       -- 659 records
    cross join pg_catalog.pg_operator b       -- to get 659^2=434k records 
    cross join pg_catalog.pg_operator c       -- to get 659^3=286M records
    limit      2000000                        -- to limit the result to a reasonable size
;
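
One detail to watch: row_number() starts at 1, whereas the date query in the question expects a sequence that starts at 0. A small sketch of how the offset could be handled when previewing dates from this table (the start date 2000-01-01 is taken from the question; everything else is illustrative):

```sql
-- Preview the first week of dates derived from facts.numbers.
-- Subtract 1 so that number = 1 maps to 2000-01-01 rather than 2000-01-02.
SELECT
  number,
  '2000-01-01'::DATE + (number - 1) AS full_date
FROM facts.numbers
WHERE number BETWEEN 1 AND 7
ORDER BY number;
```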

Answer 2: (score: 0)

In asking the question, I figured it out. Oops.

I started with a "facts" schema.

CREATE SCHEMA facts;

Run the following to start a numbers table:

create table facts.numbers
(
  number int PRIMARY KEY
)
;

Use this to generate your list of numbers. I used a million to get started.

SELECT ',(' || generate_series(0,1000000,1) || ')'
;

Then copy and paste the numbers from the results into the query below, after VALUES:

INSERT INTO facts.numbers
VALUES
 (0)
,(1)
,(2)
,(3)
,(4)
,(5)
,(6)
,(7)
,(8)
,(9)
-- etc

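^ Make sure to remove the leading comma from the copy-pasted list of numbers

Once you have a numbers table, you can generate a dates table (again, stealing code from elliot land: http://elliot.land/post/building-a-date-dimension-table-in-redshift):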

CREATE TABLE facts.dates (
  "date_id"              INTEGER                     NOT NULL PRIMARY KEY,

  -- DATE
  "full_date"            DATE                        NOT NULL,

  -- YEAR
  "year_number"          SMALLINT                    NOT NULL,
  "year_week_number"     SMALLINT                    NOT NULL,
  "year_day_number"      SMALLINT                    NOT NULL,

  -- QUARTER
  "qtr_number"           SMALLINT                    NOT NULL,

  -- MONTH
  "month_number"         SMALLINT                    NOT NULL,
  "month_name"           CHAR(9)                     NOT NULL,
  "month_day_number"     SMALLINT                    NOT NULL,

  -- WEEK
  "week_day_number"      SMALLINT                    NOT NULL,

  -- DAY
  "day_name"             CHAR(9)                     NOT NULL,
  "day_is_weekday"       SMALLINT                    NOT NULL,
  "day_is_last_of_month" SMALLINT                    NOT NULL
) DISTSTYLE ALL SORTKEY (date_id)
;


INSERT INTO facts.dates
(
   "date_id"
  ,"full_date"
  ,"year_number"
  ,"year_week_number"
  ,"year_day_number"

  -- QUARTER
  ,"qtr_number"

  -- MONTH
  ,"month_number"
  ,"month_name"
  ,"month_day_number"

  -- WEEK
  ,"week_day_number"

  -- DAY
  ,"day_name"
  ,"day_is_weekday"
  ,"day_is_last_of_month"
)
  SELECT
    cast(seq + 1 AS INTEGER)                                      AS date_id,

    -- DATE
    datum                                                         AS full_date,

    -- YEAR
    cast(extract(YEAR FROM datum) AS SMALLINT)                    AS year_number,
    cast(extract(WEEK FROM datum) AS SMALLINT)                    AS year_week_number,
    cast(extract(DOY FROM datum) AS SMALLINT)                     AS year_day_number,

    -- QUARTER
    cast(to_char(datum, 'Q') AS SMALLINT)                         AS qtr_number,

    -- MONTH
    cast(extract(MONTH FROM datum) AS SMALLINT)                   AS month_number,
    to_char(datum, 'Month')                                       AS month_name,
    cast(extract(DAY FROM datum) AS SMALLINT)                     AS month_day_number,

    -- WEEK
    cast(to_char(datum, 'D') AS SMALLINT)                         AS week_day_number,

    -- DAY
    to_char(datum, 'Day')                                         AS day_name,
    CASE WHEN to_char(datum, 'D') IN ('1', '7')
      THEN 0
    ELSE 1 END                                                    AS day_is_weekday,
    CASE WHEN
      extract(DAY FROM (datum + (1 - extract(DAY FROM datum)) :: INTEGER +
                        INTERVAL '1' MONTH) :: DATE -
                       INTERVAL '1' DAY) = extract(DAY FROM datum)
      THEN 1
    ELSE 0 END                                                    AS day_is_last_of_month
  FROM
    -- Generate days for 81 years starting from 2000.
    (
      SELECT
        '2000-01-01' :: DATE + number AS datum,
        number                        AS seq
      FROM facts.numbers
      WHERE number between 0 and 81 * 365 + 20
    ) DQ
  ORDER BY 1;

^ Be sure to set the numbers at the end to cover the date range you need
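
With the dates table in place, the per-weekday average from the question could be computed roughly as follows. This is only a sketch: the table public.items and its created_at column are hypothetical stand-ins for the real fact table, and the date range is arbitrary.

```sql
-- Hypothetical fact table: public.items(created_at TIMESTAMP).
-- The LEFT JOIN keeps days with no items, so they count as zero, and
-- COUNT(DISTINCT d.full_date) handles ranges with 5 Mondays but 4 Sundays.
SELECT
  d.week_day_number,                      -- 1 = Sunday ... 7 = Saturday
  d.day_name,
  COUNT(DISTINCT d.full_date)             AS days_in_range,
  COUNT(i.created_at)                     AS items_created,
  COUNT(i.created_at)::FLOAT
    / COUNT(DISTINCT d.full_date)         AS avg_items_per_day
FROM facts.dates d
LEFT JOIN public.items i
  ON CAST(i.created_at AS DATE) = d.full_date
WHERE d.full_date BETWEEN '2017-09-01' AND '2017-11-30'
GROUP BY d.week_day_number, d.day_name
ORDER BY d.week_day_number;
```

Driving the query from facts.dates rather than from the items table is what guarantees one row per weekday even when a user's filter leaves some days empty.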