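When the date column of a base table is a date or timestamp, many queries report by week, month, or quarter.
In general, does it matter whether group by queries use
- functions on the date
- a days table with pre-calculated extractions
Note: this is similar to the question DATE lookup table (1990/01/01:2041/12/31).
For example, in postgresql: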
create table sale(
tran_id serial primary key,
tran_dt date not null default current_date,
sale_amt decimal(8,2) not null,
...
);
create table days(
day date primary key,
week date not null,
month date not null,
quarter date not null
);
-- week query 1: group using funcs
select
date_trunc('week',tran_dt)::date - 1 as week,
count(1) as sale_ct,
sum(sale_amt) as sale_amt
from sale
where date_trunc('week',tran_dt)::date - 1 between '2012-1-1' and '2011-12-31'
group by date_trunc('week',tran_dt)::date - 1
order by 1;
-- query 2: group using days
select
days.week,
count(1) as sale_ct,
sum(sale_amt) as sale_amt
from sale
join days on( days.day = sale.tran_dt )
where week between '2011-1-1'::date and '2011-12-31'::date
group by week
order by week;
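To me, the date_trunc() function seems more organic, while the days table is easier to use.
Is there anything here that is more than a matter of taste?
Answer 0 (score: 2)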
-- query 3: group using instant "immediate" calendar table
WITH calendar AS (
SELECT ser::date AS dd
, date_trunc('week', ser)::date AS wk
-- , date_trunc('month', ser)::date AS mon
-- , date_trunc('quarter', ser)::date AS qq
FROM generate_series( '2012-1-1' , '2012-12-31', '1 day'::interval) ser
)
SELECT
cal.wk
, count(1) as sale_ct
, sum(sa.sale_amt) as sale_amt
FROM sale sa
JOIN calendar cal ON cal.dd = sa.tran_dt
-- WHERE week between '2012-1-1' and '2011-12-31'
GROUP BY cal.wk
ORDER BY cal.wk
;
Note: I fixed an obvious typo in the BETWEEN range.
Update: I used Erwin's recursive CTE to squeeze out the duplicated date_trunc(). Nested CTEs galore:
WITH calendar AS (
WITH RECURSIVE montag AS (
SELECT '2011-01-01'::date AS dd
UNION ALL
SELECT dd + 1 AS dd
FROM montag
WHERE dd < '2012-1-1'::date
)
SELECT mo.dd, date_trunc('week', mo.dd + 1)::date AS wk
FROM montag mo
)
SELECT
cal.wk
, count(1) as sale_ct
, sum(sa.sale_amt) as sale_amt
FROM sale sa
JOIN calendar cal ON cal.dd = sa.tran_dt
-- WHERE week between '2012-1-1' and '2011-12-31'
GROUP BY cal.wk
ORDER BY cal.wk
;
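Answer 1 (score: 1)
Yes, it is more than a matter of taste. The performance of the query depends on the method.
As a first approximation, the functions should be faster. They require no join and do all their reading in a single table scan.
However, a good optimizer can make effective use of a lookup table, because it knows the distribution of the target values. And an in-memory join can be quite fast.
As a matter of database design, I think having a calendar table is very useful. Some information, such as holidays, simply cannot be expressed as a function. For most ad-hoc queries, though, the date functions are fine.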
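As a rough sketch of that design point, the days table from the question could be filled with generate_series() and then extended with data no function can derive. The is_holiday column and the sample holiday are hypothetical additions for illustration, not part of the question's schema. Weeks here start on Monday, which is what date_trunc('week', ...) returns; the question's first query shifts that by one day.

```sql
-- Fill the question's days table for 2011 (Monday-based weeks).
INSERT INTO days (day, week, month, quarter)
SELECT d::date
     , date_trunc('week',    d)::date
     , date_trunc('month',   d)::date
     , date_trunc('quarter', d)::date
FROM   generate_series(timestamp '2011-01-01'
                     , timestamp '2011-12-31'
                     , interval  '1 day') AS d;

-- Hypothetical extra attribute that cannot be computed from the date alone.
ALTER TABLE days ADD COLUMN is_holiday boolean NOT NULL DEFAULT false;
UPDATE days SET is_holiday = true WHERE day = date '2011-12-25';
```

Once such a column exists, a report can filter or group on it through the same join that query 2 already uses.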
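Answer 2 (score: 1)
1. Your expression:
... between '2012-1-1' and '2011-12-31'
doesn't work: it never matches any row. Basic BETWEEN requires the left argument to be less than or equal to the right argument. It would have to be:
... BETWEEN SYMMETRIC '2012-1-1' and '2011-12-31'
Or it's just a typo and you mean something like:
... BETWEEN '2011-1-1' and '2011-12-31'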
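A quick way to see the difference, as a minimal sketch with an arbitrarily chosen test date:

```sql
-- Plain BETWEEN with reversed bounds describes an empty range;
-- BETWEEN SYMMETRIC sorts the two bounds before comparing.
SELECT date '2012-01-01' BETWEEN           date '2012-01-01' AND date '2011-12-31' AS plain      -- false
     , date '2012-01-01' BETWEEN SYMMETRIC date '2012-01-01' AND date '2011-12-31' AS symmetric; -- true
```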
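It is unclear to me what your query is supposed to retrieve. I'll assume you want all weeks (Monday to Sunday) that start in 2011*. This expression generates exactly those days in less than a microsecond on modern hardware (works for any year):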
SELECT generate_series(
date_trunc('week','2010-12-31'::date) + interval '7d'
,date_trunc('week','2011-12-31'::date) + interval '6d'
, '1d')::date
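* Note that the ISO 8601 definition of the first week of a year is slightly different.
2. Your second query doesn't work at all. No GROUP BY?
3. The question you linked to did not deal with PostgreSQL, which has outstanding date / timestamp support. It has generate_series(), which removes the need for a separate "days" table in most cases - as demonstrated above. Your query would look like this:
Meanwhile, @wildplasser provided an example query that was supposed to go here.
By popular* demand, a recursive CTE version - which is not actually a serious alternative! * And by "popular" I mean @wildplasser's very serious request.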
WITH RECURSIVE days AS (
SELECT '2011-01-01'::date AS dd
,date_trunc('week', '2011-01-01'::date )::date AS wk
UNION ALL
SELECT dd + 1
,date_trunc('week', dd + 1)::date AS wk
FROM days
WHERE dd < '2011-12-31'::date
)
SELECT d.wk
,count(*) AS sale_ct
,sum(s.sale_amt) AS sale_amt
FROM days d
JOIN sale s ON s.tran_dt = d.dd
-- WHERE d.wk between '2011-01-01' and '2011-12-31'
GROUP BY 1
ORDER BY 1;
It could also be written as (compare to @wildplasser's version):
WITH RECURSIVE d AS (
SELECT '2011-01-01'::date AS dd
UNION ALL
SELECT dd + 1 FROM d WHERE dd < '2011-12-31'::date
), days AS (
SELECT dd, date_trunc('week', dd + 1)::date AS wk
FROM d
)
SELECT ...
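4. If performance is crucial, make sure you don't apply functions or calculations to the values of your table. This prevents the use of plain indexes and is generally very slow, because every row has to be processed. That's why your first query is going to suck with big tables. If possible, apply the calculations to the values you filter with instead.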
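A sketch of what that could look like for the first query, assuming the intended range is the weeks that start in 2011. The grouping expression stays in the SELECT list, but the WHERE clause compares the bare tran_dt column against bounds computed once from constants. The index sale_tran_dt_idx is a hypothetical addition; the question does not define one.

```sql
-- Hypothetical plain index on the raw column.
CREATE INDEX sale_tran_dt_idx ON sale (tran_dt);

SELECT date_trunc('week', tran_dt)::date - 1 AS week
     , count(*)      AS sale_ct
     , sum(sale_amt) AS sale_amt
FROM   sale
WHERE  tran_dt >= (date_trunc('week', timestamp '2010-12-31') + interval '7 days')::date  -- 2011-01-03
AND    tran_dt <  (date_trunc('week', timestamp '2011-12-31') + interval '7 days')::date  -- 2012-01-02
GROUP  BY 1
ORDER  BY 1;
```

The bounds reuse the date_trunc() arithmetic from point 1, so they cover the same days as the generate_series() expression there. Whether the planner actually picks the index still depends on table size and statistics.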
Indexes on expressions are one way around this. If you have an index like
CREATE INDEX sale_tran_dt_week_idx ON sale (date_trunc('week', tran_dt)::date);
.. your first query could be very fast again - at some cost to write operations for index maintenance.
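Two details seem worth checking when trying this, as far as I know. An index expression that is not a plain function call needs its own pair of parentheses, and it must be immutable. Passing a date straight to date_trunc() typically resolves to the timestamptz variant, which depends on the time zone setting, so the sketch below casts to timestamp first. The query also has to filter on the very same expression for the planner to consider the index. The index name is hypothetical.

```sql
-- Extra parentheses around the expression; ::timestamp keeps it immutable.
CREATE INDEX sale_tran_dt_week_ts_idx
    ON sale ((date_trunc('week', tran_dt::timestamp)::date));

-- Filter and group on exactly the indexed expression (Monday-based weeks).
SELECT date_trunc('week', tran_dt::timestamp)::date AS week
     , count(*)      AS sale_ct
     , sum(sale_amt) AS sale_amt
FROM   sale
WHERE  date_trunc('week', tran_dt::timestamp)::date
       BETWEEN date '2011-01-03' AND date '2011-12-26'
GROUP  BY 1
ORDER  BY 1;
```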