我在Postgres中的updates
表是9.4.5,如下所示:
goal_id | created_at | status
1 | 2016-01-01 | green
1 | 2016-01-02 | red
2 | 2016-01-02 | amber
这样的goals
表:
id | company_id
1 | 1
2 | 2
我想为每家公司创建一个图表,每周显示所有目标的状态。
我认为这需要生成一系列过去8周,找到该周之前每个目标的最新更新,然后计算找到的更新的不同状态。
到目前为止我所拥有的:
SELECT EXTRACT(year from generate_series) AS year,
EXTRACT(week from generate_series) AS week,
u.company_id,
COUNT(*) FILTER (WHERE u.status = 'green') AS green_count,
COUNT(*) FILTER (WHERE u.status = 'amber') AS amber_count,
COUNT(*) FILTER (WHERE u.status = 'red') AS red_count
FROM generate_series(NOW() - INTERVAL '2 MONTHS', NOW(), '1 week')
LEFT OUTER JOIN (
SELECT DISTINCT ON(year, week)
goals.company_id,
updates.status,
EXTRACT(week from updates.created_at) week,
EXTRACT(year from updates.created_at) AS year,
updates.created_at
FROM updates
JOIN goals ON goals.id = updates.goal_id
ORDER BY year, week, updates.created_at DESC
) u ON u.week = week AND u.year = year
GROUP BY 1,2,3
但这有两个问题。似乎u
上的联接没有像我想象的那样起作用。它似乎是从内部查询返回的每一行(?)加入,并且这只选择从该周发生的最新更新。如果需要,它应从该周之前获取最新更新。
这是一些相当复杂的SQL,我喜欢关于如何将它拉下来的一些输入。
目标表约有1000个ATM目标,并且每周增加约100个:
Table "goals"
Column | Type | Modifiers
-----------------+-----------------------------+-----------------------------------------------------------
id | integer | not null default nextval('goals_id_seq'::regclass)
company_id | integer | not null
name | text | not null
created_at | timestamp without time zone | not null default timezone('utc'::text, now())
updated_at | timestamp without time zone | not null default timezone('utc'::text, now())
Indexes:
"goals_pkey" PRIMARY KEY, btree (id)
"entity_goals_company_id_fkey" btree (company_id)
Foreign-key constraints:
"goals_company_id_fkey" FOREIGN KEY (company_id) REFERENCES companies(id) ON DELETE RESTRICT
updates
表约有1000左右,每周约有100个增长:
Table "updates"
Column | Type | Modifiers
------------+-----------------------------+------------------------------------------------------------------
id | integer | not null default nextval('updates_id_seq'::regclass)
status | entity.goalstatus | not null
goal_id | integer | not null
created_at | timestamp without time zone | not null default timezone('utc'::text, now())
updated_at | timestamp without time zone | not null default timezone('utc'::text, now())
Indexes:
"goal_updates_pkey" PRIMARY KEY, btree (id)
"entity_goal_updates_goal_id_fkey" btree (goal_id)
Foreign-key constraints:
"updates_goal_id_fkey" FOREIGN KEY (goal_id) REFERENCES goals(id) ON DELETE CASCADE
Schema | Name | Internal name | Size | Elements | Access privileges | Description
--------+-------------------+---------------+------+----------+-------------------+-------------
entity | entity.goalstatus | goalstatus | 4 | green +| |
| | | | amber +| |
| | | | red | |
答案 0 :(得分:6)
您每周需要一个数据项目和目标(在汇总每家公司的计数之前)。这是CROSS JOIN
和generate_series()
之间的简单goals
。 (可能)昂贵的部分是从每个state
获取当前updates
。与@Paul already suggested一样,LATERAL
加入似乎是最好的工具。但是,仅对updates
执行此操作,并使用LIMIT 1
更快的技术。
使用date_trunc()
简化日期处理。
SELECT w_start
, g.company_id
, count(*) FILTER (WHERE u.status = 'green') AS green_count
, count(*) FILTER (WHERE u.status = 'amber') AS amber_count
, count(*) FILTER (WHERE u.status = 'red') AS red_count
FROM generate_series(date_trunc('week', NOW() - interval '2 months')
, date_trunc('week', NOW())
, interval '1 week') w_start
CROSS JOIN goals g
LEFT JOIN LATERAL (
SELECT status
FROM updates
WHERE goal_id = g.id
AND created_at < w_start
ORDER BY created_at DESC
LIMIT 1
) u ON true
GROUP BY w_start, g.company_id
ORDER BY w_start, g.company_id;
要使 快速 ,您需要多列索引:
CREATE INDEX updates_special_idx ON updates (goal_id, created_at DESC, status);
created_at
的降序最好,但并非绝对必要。 Postgres几乎可以快速地向后扫描索引。 (Not applicable for inverted sort order of multiple columns, though.)
订单中的索引列。为什么呢?
仅添加第三列status
以允许updates
上的快速index-only scans。相关案例:
w_start
代表每周的开始。因此,计数是在一周的开始。如果你坚持,你可以仍然提取年份和周(或任何其他细节代表你的一周):
EXTRACT(isoyear from w_start) AS year
, EXTRACT(week from w_start) AS week
最好用ISOYEAR
,就像@Paul解释的那样。
相关:
答案 1 :(得分:3)
这似乎是LATERAL
联接的好用:
SELECT EXTRACT(ISOYEAR FROM s) AS year,
EXTRACT(WEEK FROM s) AS week,
u.company_id,
COUNT(u.goal_id) FILTER (WHERE u.status = 'green') AS green_count,
COUNT(u.goal_id) FILTER (WHERE u.status = 'amber') AS amber_count,
COUNT(u.goal_id) FILTER (WHERE u.status = 'red') AS red_count
FROM generate_series(NOW() - INTERVAL '2 months', NOW(), '1 week') s(w)
LEFT OUTER JOIN LATERAL (
SELECT DISTINCT ON (g.company_id, u2.goal_id) g.company_id, u2.goal_id, u2.status
FROM updates u2
INNER JOIN goals g
ON g.id = u2.goal_id
WHERE u2.created_at <= s.w
ORDER BY g.company_id, u2.goal_id, u2.created_at DESC
) u
ON true
WHERE u.company_id IS NOT NULL
GROUP BY year, week, u.company_id
ORDER BY u.company_id, year, week
;
顺便说一句,我正在提取ISOYEAR
而非YEAR
,以确保我在1月初获得明智的结果。例如,EXTRACT(YEAR FROM '2016-01-01 08:49:56.734556-08')
为2016
,EXTRACT(WEEK FROM '2016-01-01 08:49:56.734556-08')
为53
!
编辑:您应该测试一下您的实际数据,但我觉得这应该更快:
SELECT year,
week,
company_id,
COUNT(goal_id) FILTER (WHERE last_status = 'green') AS green_count,
COUNT(goal_id) FILTER (WHERE last_status = 'amber') AS amber_count,
COUNT(goal_id) FILTER (WHERE last_status = 'red') AS red_count
FROM (
SELECT EXTRACT(ISOYEAR FROM s) AS year,
EXTRACT(WEEK FROM s) AS week,
u.company_id,
u.goal_id,
(array_agg(u.status ORDER BY u.created_at DESC))[1] AS last_status
FROM generate_series(NOW() - INTERVAL '2 months', NOW(), '1 week') s(t)
LEFT OUTER JOIN (
SELECT g.company_id, u2.goal_id, u2.created_at, u2.status
FROM updates u2
INNER JOIN goals g
ON g.id = u2.goal_id
) u
ON s.t >= u.created_at
WHERE u.company_id IS NOT NULL
GROUP BY year, week, u.company_id, u.goal_id
) x
GROUP BY year, week, company_id
ORDER BY company_id, year, week
;
但仍然没有窗口功能。 :-)另外,通过用(array_agg(...))[1]
函数替换first
,你可以加快速度。您必须自己定义,但Postgres维基上的实现很容易谷歌。
答案 2 :(得分:0)
我使用PostgreSQL 9.3。我对你的问题很感兴趣。我检查了你的数据结构。比我创建下表。
我插入以下记录;
公司
目标
更新
之后我编写了以下查询,进行更正
SELECT c.id company_id, c.name company_name, u.status goal_status,
EXTRACT(week from u.created_at) goal_status_week,
EXTRACT(year from u.created_at) AS goal_status_year
FROM company c
INNER JOIN goals g ON g.company_id = c.id
INNER JOIN updates u ON u.goal_id = g.id
ORDER BY goal_status_year DESC, goal_status_week DESC;
最后,我将此查询与周系列
合并SELECT
gs.company_id,
gs.company_name,
gs.goal_status,
EXTRACT(year from w) AS year,
EXTRACT(week from w) AS week,
COUNT(gs.*) cnt
FROM generate_series(NOW() - INTERVAL '3 MONTHS', NOW(), '1 week') w
LEFT JOIN(
SELECT c.id company_id, c.name company_name, u.status goal_status,
EXTRACT(week from u.created_at) goal_status_week,
EXTRACT(year from u.created_at) AS goal_status_year
FROM company c
INNER JOIN goals g ON g.company_id = c.id
INNER JOIN updates u ON u.goal_id = g.id ) gs
ON gs.goal_status_week = EXTRACT(week from w) AND gs.goal_status_year = EXTRACT(year from w)
GROUP BY company_id, company_name, goal_status, year, week
ORDER BY year DESC, week DESC;
我得到了这个结果
祝你有个美好的一天。