tl; dr:我想在Redshift中生成一个日期表,以便更容易生成报告。最好不需要已经在redshift中的大表,需要上传一个csv文件。
长版: 我正在编写一份报告,我必须平均每周创建的新项目。日期范围可能会持续数月或更长时间,因此可能会有5个星期一但只有4个星期日,这可能会使数学变得有点棘手。此外,我不保证每天有一个项目的实例,特别是一旦用户开始切片数据。其中,这会使BI工具瘫痪。
解决此问题的最佳方法很可能是日期表。但是,日期表的大多数教程都使用Redshift不可用或不完全支持的SQL命令(我在看你, generate_series )。
有一种简单的方法可以在Redshift中生成日期表吗?
我尝试使用的代码:(基于此无效工作建议:http://elliot.land/post/building-a-date-dimension-table-in-redshift)
CREATE TABLE facts.dates (
"date_id" INTEGER NOT NULL PRIMARY KEY,
-- DATE
"full_date" DATE NOT NULL,
-- YEAR
"year_number" SMALLINT NOT NULL,
"year_week_number" SMALLINT NOT NULL,
"year_day_number" SMALLINT NOT NULL,
-- QUARTER
"qtr_number" SMALLINT NOT NULL,
-- MONTH
"month_number" SMALLINT NOT NULL,
"month_name" CHAR(9) NOT NULL,
"month_day_number" SMALLINT NOT NULL,
-- WEEK
"week_day_number" SMALLINT NOT NULL,
-- DAY
"day_name" CHAR(9) NOT NULL,
"day_is_weekday" SMALLINT NOT NULL,
"day_is_last_of_month" SMALLINT NOT NULL
) DISTSTYLE ALL SORTKEY (date_id)
;
INSERT INTO facts.dates
(
"date_id"
,"full_date"
,"year_number"
,"year_week_number"
,"year_day_number"
-- QUARTER
,"qtr_number"
-- MONTH
,"month_number"
,"month_name"
,"month_day_number"
-- WEEK
,"week_day_number"
-- DAY
,"day_name"
,"day_is_weekday"
,"day_is_last_of_month"
)
SELECT
cast(seq + 1 AS INTEGER) AS date_id,
-- DATE
datum AS full_date,
-- YEAR
cast(extract(YEAR FROM datum) AS SMALLINT) AS year_number,
cast(extract(WEEK FROM datum) AS SMALLINT) AS year_week_number,
cast(extract(DOY FROM datum) AS SMALLINT) AS year_day_number,
-- QUARTER
cast(to_char(datum, 'Q') AS SMALLINT) AS qtr_number,
-- MONTH
cast(extract(MONTH FROM datum) AS SMALLINT) AS month_number,
to_char(datum, 'Month') AS month_name,
cast(extract(DAY FROM datum) AS SMALLINT) AS month_day_number,
-- WEEK
cast(to_char(datum, 'D') AS SMALLINT) AS week_day_number,
-- DAY
to_char(datum, 'Day') AS day_name,
CASE WHEN to_char(datum, 'D') IN ('1', '7')
THEN 0
ELSE 1 END AS day_is_weekday,
CASE WHEN
extract(DAY FROM (datum + (1 - extract(DAY FROM datum)) :: INTEGER +
INTERVAL '1' MONTH) :: DATE -
INTERVAL '1' DAY) = extract(DAY FROM datum)
THEN 1
ELSE 0 END AS day_is_last_of_month
FROM
-- Generate days for 81 years starting from 2000.
(
SELECT
'2000-01-01' :: DATE + generate_series AS datum,
generate_series AS seq
FROM generate_series(0,81 * 365 + 20,1)
) DQ
ORDER BY 1;
会抛出此错误
[Amazon](500310) Invalid operation: Specified types or functions (one per INFO message) not supported on Redshift tables.;
1 statement failed.
...因为,我认为,Redshift中的同一命令中不允许使用INSERT和 generate_series
答案 0 :(得分:1)
作为解决方法,您可以在本地计算机上旋转Postgres实例,在那里运行代码,导出到CSV,然后仅在Redshift中运行CREATE TABLE
部分并从CSV加载数据。由于这是一次性操作,所以可以这样做,这就是我实际为新的Redshift部署所做的事情。
答案 1 :(得分:1)
以下是构建facts.numbers
的另一种建议,不需要手动干预:
Cross join
对该表进行足够多次以获取所需的行数row_number() over (order by 1)
将创建的记录转换为一组递增的数字使用Redshift系统表pg_catalog.pg_operator
(截至2020年10月有659条记录)的示例:
-- Prep, so that you can copy/paste the code sample
create schema if not exists facts; -- Make sure the schema exists
drop table if exists facts.numbers; -- Avoid an error if that table already exists;
create table facts.numbers -- Create the table definition
(
number int primary key
);
-- The bit you care about
insert into facts.numbers
select row_number() over (order by 1) -- return 1..n in place of the original record
from pg_catalog.pg_operator a -- 659 records
cross join pg_catalog.pg_operator b -- to get 659^2=434k records
cross join pg_catalog.pg_operator c -- to get 659^3=286M records
limit 2000000 -- to limit the result to a reasonable size
;
答案 2 :(得分:0)
在提问时,我想出来了。糟糕。
我从一个“事实”架构开始。
CREATE SCHEMA facts;
运行以下命令以启动数字表:
create table facts.numbers
(
number int PRIMARY KEY
)
;
使用此选项生成您的号码列表。我用了一百万来开始
SELECT ',(' || generate_series(0,1000000,1) || ')'
;
然后在VALUES:
之后,将结果中的数字复制粘贴到下面的查询中INSERT INTO facts.numbers
VALUES
(0)
,(1)
,(2)
,(3)
,(4)
,(5)
,(6)
,(7)
,(8)
,(9)
-- etc
^确保从复制粘贴的数字列表中删除前导逗号
一旦你有一个数字表,那么你可以生成一个日期表(再次,从elliot land http://elliot.land/post/building-a-date-dimension-table-in-redshift窃取代码):
CREATE TABLE facts.dates (
"date_id" INTEGER NOT NULL PRIMARY KEY,
-- DATE
"full_date" DATE NOT NULL,
-- YEAR
"year_number" SMALLINT NOT NULL,
"year_week_number" SMALLINT NOT NULL,
"year_day_number" SMALLINT NOT NULL,
-- QUARTER
"qtr_number" SMALLINT NOT NULL,
-- MONTH
"month_number" SMALLINT NOT NULL,
"month_name" CHAR(9) NOT NULL,
"month_day_number" SMALLINT NOT NULL,
-- WEEK
"week_day_number" SMALLINT NOT NULL,
-- DAY
"day_name" CHAR(9) NOT NULL,
"day_is_weekday" SMALLINT NOT NULL,
"day_is_last_of_month" SMALLINT NOT NULL
) DISTSTYLE ALL SORTKEY (date_id)
;
INSERT INTO facts.dates
(
"date_id"
,"full_date"
,"year_number"
,"year_week_number"
,"year_day_number"
-- QUARTER
,"qtr_number"
-- MONTH
,"month_number"
,"month_name"
,"month_day_number"
-- WEEK
,"week_day_number"
-- DAY
,"day_name"
,"day_is_weekday"
,"day_is_last_of_month"
)
SELECT
cast(seq + 1 AS INTEGER) AS date_id,
-- DATE
datum AS full_date,
-- YEAR
cast(extract(YEAR FROM datum) AS SMALLINT) AS year_number,
cast(extract(WEEK FROM datum) AS SMALLINT) AS year_week_number,
cast(extract(DOY FROM datum) AS SMALLINT) AS year_day_number,
-- QUARTER
cast(to_char(datum, 'Q') AS SMALLINT) AS qtr_number,
-- MONTH
cast(extract(MONTH FROM datum) AS SMALLINT) AS month_number,
to_char(datum, 'Month') AS month_name,
cast(extract(DAY FROM datum) AS SMALLINT) AS month_day_number,
-- WEEK
cast(to_char(datum, 'D') AS SMALLINT) AS week_day_number,
-- DAY
to_char(datum, 'Day') AS day_name,
CASE WHEN to_char(datum, 'D') IN ('1', '7')
THEN 0
ELSE 1 END AS day_is_weekday,
CASE WHEN
extract(DAY FROM (datum + (1 - extract(DAY FROM datum)) :: INTEGER +
INTERVAL '1' MONTH) :: DATE -
INTERVAL '1' DAY) = extract(DAY FROM datum)
THEN 1
ELSE 0 END AS day_is_last_of_month
FROM
-- Generate days for 81 years starting from 2000.
(
SELECT
'2000-01-01' :: DATE + number AS datum,
number AS seq
FROM facts.numbers
WHERE number between 0 and 81 * 365 + 20
) DQ
ORDER BY 1;
^务必在最后设置所需日期范围的数字