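I have a Postgres 9.1 database. I am trying to generate the number of records per week (for a given date range) and compare it to the previous year.
I have the following code to generate the series: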
select generate_series('2013-01-01', '2013-01-31', '7 day'::interval) as series
However, I don't know how to join the counted records to the generated dates.
So, using the following records as an example:
Pt_ID exam_date
====== =========
1 2012-01-02
2 2012-01-02
3 2012-01-08
4 2012-01-08
1 2013-01-02
2 2013-01-02
3 2013-01-03
4 2013-01-04
1 2013-01-08
2 2013-01-10
3 2013-01-15
4 2013-01-24
I would like the records returned as:
series thisyr lastyr
=========== ===== =====
2013-01-01 4 2
2013-01-08 3 2
2013-01-15 1 0
2013-01-22 1 0
2013-01-29 0 0
I am not sure how to reference the date range in the subquery. Thanks for any help.
Answer 0 (score: 3)
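The simple approach would be to solve this with a CROSS JOIN, as demonstrated by @jpw. However, there are some hidden problems:
1. The performance of an unconditional CROSS JOIN deteriorates quickly as the number of rows grows. The total number of rows is multiplied by the number of weeks you are testing for before this huge derived table can be processed in the aggregation. Indexes cannot help.
2. Starting weeks on January 1st leads to inconsistencies. ISO weeks might be an alternative. See below.
All of the following queries make heavy use of an index on exam_date. Be sure to have one.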
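If no such index exists yet, a minimal sketch for creating one could look like this. It assumes the table is named tbl, as in the queries in this answer; the index name is made up for illustration.

```sql
-- Assumption: the table is called tbl; the index name is arbitrary.
CREATE INDEX tbl_exam_date_idx ON tbl (exam_date);
```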
Joining only to the relevant rows should be faster:
SELECT d.day, d.thisyr
     , count(t.exam_date) AS lastyr
FROM (
   SELECT d.day::date, (d.day - '1 year'::interval)::date AS day0 -- for 2nd join
        , count(t.exam_date) AS thisyr
   FROM generate_series('2013-01-01'::date
                      , '2013-01-31'::date -- last week overlaps with Feb.
                      , '7 days'::interval) d(day) -- returns timestamp
   LEFT JOIN tbl t ON t.exam_date >= d.day::date
                  AND t.exam_date < d.day::date + 7
   GROUP BY d.day
   ) d
LEFT JOIN tbl t ON t.exam_date >= d.day0 -- repeat with last year
               AND t.exam_date < d.day0 + 7
GROUP BY d.day, d.thisyr
ORDER BY d.day;
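This uses weeks starting on Jan. 1st, as in your original. As noted above, this produces a couple of inconsistencies: weeks start on a different weekday each year, and since we cut off at the end of the year, the last week of the year consists of just 1 or 2 days (2 in a leap year).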
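To see these inconsistencies directly, here is a small sketch that uses only built-in functions. The column aliases are made up for illustration and are not part of the queries in this answer.

```sql
-- Illustration only: weekday of Jan. 1st, and the length of the truncated
-- last "week" when weeks are counted in 7-day steps from Jan. 1st.
SELECT y AS yr
     , to_char((y || '-01-01')::date, 'Dy') AS jan1_weekday
     , ((y || '-12-31')::date - (y || '-01-01')::date) % 7 + 1 AS days_in_last_week
FROM generate_series(2012, 2014) y;
```

For 2012, a leap year that started on a Sunday, the last week has 2 days. For 2013 and 2014, which started on a Tuesday and a Wednesday, it has 1 day.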
Depending on requirements, consider ISO weeks instead, which start on Mondays and always span 7 days. But they cross the boundary between years. Per the documentation on EXTRACT():
week
The number of the week of the year that the day is in. By definition (ISO 8601), weeks start on Mondays and the first week of a year contains January 4 of that year. In other words, the first Thursday of a year is in week 1 of that year.
In the ISO definition, it is possible for early-January dates to be part of the 52nd or 53rd week of the previous year, and for late-December dates to be part of the first week of the next year. For example, 2005-01-01 is part of the 53rd week of year 2004, and 2006-01-01 is part of the 52nd week of year 2005, while 2012-12-31 is part of the first week of 2013. It's recommended to use the isoyear field together with week to get consistent results.
The above query rewritten with ISO weeks:
SELECT w AS isoweek
     , day::text AS thisyr_monday, thisyr_ct
     , day0::text AS lastyr_monday, count(t.exam_date) AS lastyr_ct
FROM (
   SELECT w, day
        , date_trunc('week', '2012-01-04'::date)::date + 7 * w AS day0
        , count(t.exam_date) AS thisyr_ct
   FROM (
      SELECT w
           , date_trunc('week', '2013-01-04'::date)::date + 7 * w AS day
      FROM generate_series(0, 4) w
      ) d
   LEFT JOIN tbl t ON t.exam_date >= d.day
                  AND t.exam_date < d.day + 7
   GROUP BY d.w, d.day
   ) d
LEFT JOIN tbl t ON t.exam_date >= d.day0 -- repeat with last year
               AND t.exam_date < d.day0 + 7
GROUP BY d.w, d.day, d.day0, d.thisyr_ct
ORDER BY d.w, d.day;
January 4th is always in the first ISO week of the year. So this expression gets the date of the Monday of the first ISO week of the given year:
date_trunc('week', '2012-01-04'::date)::date
Since ISO weeks coincide with the week numbers returned by EXTRACT(), we can simplify the query. First, a short and simple form:
SELECT w AS isoweek
     , COALESCE(thisyr_ct, 0) AS thisyr_ct
     , COALESCE(lastyr_ct, 0) AS lastyr_ct
FROM generate_series(1, 5) w
LEFT JOIN (
   SELECT EXTRACT(week FROM exam_date)::int AS w, count(*) AS thisyr_ct
   FROM tbl
   WHERE EXTRACT(isoyear FROM exam_date)::int = 2013
   GROUP BY 1
   ) t13 USING (w)
LEFT JOIN (
   SELECT EXTRACT(week FROM exam_date)::int AS w, count(*) AS lastyr_ct
   FROM tbl
   WHERE EXTRACT(isoyear FROM exam_date)::int = 2012
   GROUP BY 1
   ) t12 USING (w);
The same with more details and optimized for performance:
WITH params AS ( -- enter parameters here, once
   SELECT date_trunc('week', '2012-01-04'::date)::date AS last_start
        , date_trunc('week', '2013-01-04'::date)::date AS this_start
        , date_trunc('week', '2014-01-04'::date)::date AS next_start
        , 1 AS week_1
        , 5 AS week_n -- show weeks 1 - 5
   )
SELECT w.w AS isoweek
     , p.this_start + 7 * (w - 1) AS thisyr_monday
     , COALESCE(t13.ct, 0) AS thisyr_ct
     , p.last_start + 7 * (w - 1) AS lastyr_monday
     , COALESCE(t12.ct, 0) AS lastyr_ct
FROM params p
   , generate_series(p.week_1, p.week_n) w(w)
LEFT JOIN (
   SELECT EXTRACT(week FROM t.exam_date)::int AS w, count(*) AS ct
   FROM tbl t, params p
   WHERE t.exam_date >= p.this_start -- only relevant dates
   AND t.exam_date < p.this_start + 7 * (p.week_n - p.week_1 + 1)::int
   -- AND t.exam_date < p.next_start -- don't cross over into next year
   GROUP BY 1
   ) t13 USING (w)
LEFT JOIN ( -- same for last year
   SELECT EXTRACT(week FROM t.exam_date)::int AS w, count(*) AS ct
   FROM tbl t, params p
   WHERE t.exam_date >= p.last_start
   AND t.exam_date < p.last_start + 7 * (p.week_n - p.week_1 + 1)::int
   -- AND t.exam_date < p.this_start
   GROUP BY 1
   ) t12 USING (w);
With index support, this should be very fast, and it can easily be adapted to intervals of your choice.
The implicit JOIN LATERAL for generate_series() in the last query requires Postgres 9.3.
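As a sanity check of the ISO week rules quoted from the documentation in this answer, the following sketch evaluates the boundary dates mentioned there. It needs no table, and the aliases v, d and monday are made up for illustration.

```sql
-- Illustration only: the three boundary dates from the documentation quote.
SELECT d AS day
     , EXTRACT(isoyear FROM d)::int AS isoyear
     , EXTRACT(week FROM d)::int AS isoweek
     , date_trunc('week', d)::date AS monday -- Monday of the ISO week containing d
FROM (VALUES ('2005-01-01'::date), ('2006-01-01'), ('2012-12-31')) v(d);
```

Per the documentation, the (isoyear, isoweek) pairs should come out as (2004, 53), (2005, 52) and (2013, 1). Filtering on isoyear rather than the calendar year is what keeps such dates in the right bucket.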
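Answer 1 (score: 1)
Using a cross join should work. I'm just going to paste the markdown output from SQL Fiddle below. Your sample output seems to be incorrect for the 2013-01-08 series: thisyr should be 2, not 3. This might not be the best way to do it, though; my PostgreSQL knowledge leaves a lot to be desired.
PostgreSQL 9.2.4 Schema Setup: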
CREATE TABLE Table1
    ("Pt_ID" varchar(6), "exam_date" date);
INSERT INTO Table1
    ("Pt_ID", "exam_date")
VALUES
    ('1', '2012-01-02'),('2', '2012-01-02'),
    ('3', '2012-01-08'),('4', '2012-01-08'),
    ('1', '2013-01-02'),('2', '2013-01-02'),
    ('3', '2013-01-03'),('4', '2013-01-04'),
    ('1', '2013-01-08'),('2', '2013-01-10'),
    ('3', '2013-01-15'),('4', '2013-01-24');
Query 1:
select
    series,
    sum (
        case
            when exam_date
                between series and series + '6 day'::interval
            then 1
            else 0
        end
    ) as thisyr,
    sum (
        case
            when exam_date + '1 year'::interval
                between series and series + '6 day'::interval
            then 1 else 0
        end
    ) as lastyr
from table1
cross join generate_series('2013-01-01', '2013-01-31', '7 day'::interval) as series
group by series
order by series
Results:
| SERIES | THISYR | LASTYR |
|--------------------------------|--------|--------|
| January, 01 2013 00:00:00+0000 | 4 | 2 |
| January, 08 2013 00:00:00+0000 | 2 | 2 |
| January, 15 2013 00:00:00+0000 | 1 | 0 |
| January, 22 2013 00:00:00+0000 | 1 | 0 |
| January, 29 2013 00:00:00+0000 | 0 | 0 |