Querying a metrics schema on PostgreSQL

Time: 2016-02-24 08:41:41

Tags: sql postgresql metrics postgresql-9.4

I have the following schema, which represents a simple metrics store:

CREATE TABLE targets (
    target varchar
);

CREATE TABLE reads (
    at timestamp without time zone,
    target varchar
);

CREATE TABLE updates (
    at timestamp without time zone,
    target varchar
);

The relations reads and updates store events that happened on a specific target at a specific time.

Here is some sample data:

COPY targets (target) FROM stdin;
A
B
C
\.

COPY reads (at, target) FROM stdin;
1970-01-01 03:40:00 A
1970-01-01 06:00:00 B
1970-01-01 05:00:00 A
1970-01-03 05:00:00 A
1970-01-04 01:00:00 B
\.

COPY updates (at, target) FROM stdin;
1970-01-01 01:00:00 A
1970-01-01 01:00:00 B
1970-01-01 02:00:00 A
1970-01-01 04:00:00 A
1970-01-02 01:00:00 A
1970-01-02 01:00:00 B
1970-01-04 01:00:00 B
\.

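I would like to get a report with all the metrics computed per target and per day, similar to the following query (ultimately also without the "zero" rows), but in a more efficient way: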

select t.target, day::date,
    coalesce((select count(*) from updates where target = t.target and at::date = day), 0) updates,
    coalesce((select count(*) from reads   where target = t.target and at::date = day), 0) reads
from 
    generate_series('1970-01-01'::date, '1970-01-04'::date, '1 day'::interval) day,
    targets t
order by target, day;

 target |    day     | updates | reads 
--------+------------+---------+-------
 A      | 1970-01-01 |       3 |     2
 A      | 1970-01-02 |       1 |     0
 A      | 1970-01-03 |       0 |     1
 A      | 1970-01-04 |       0 |     0
 B      | 1970-01-01 |       1 |     1
 B      | 1970-01-02 |       1 |     0
 B      | 1970-01-03 |       0 |     0
 B      | 1970-01-04 |       1 |     1
 C      | 1970-01-01 |       0 |     0
 C      | 1970-01-02 |       0 |     0
 C      | 1970-01-03 |       0 |     0
 C      | 1970-01-04 |       0 |     0

Any suggestions?

1 answer:

Answer 0: (score: 1)

You can solve this with a FULL JOIN over two subqueries that do the counting:

SELECT target, day, updates, reads
FROM (
    SELECT target, at::date AS day, count(*) AS updates FROM updates GROUP BY 1, 2
  ) num_updates
FULL JOIN (
    SELECT target, at::date AS day, count(*) AS reads FROM reads GROUP BY 1, 2
  ) num_reads USING (target, day)
WHERE day BETWEEN '1970-01-01'::date AND '1970-01-04'::date
ORDER BY 1, 2;

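This does not produce any rows where both updates and reads would be 0, and it returns NULL instead of 0 when only one of the two counts is missing: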

 target |    day     | updates | reads 
--------+------------+---------+-------
 A      | 1970-01-01 |       3 |     2
 A      | 1970-01-02 |       1 |     
 A      | 1970-01-03 |         |     1
 B      | 1970-01-01 |       1 |     1
 B      | 1970-01-02 |       1 |      
 B      | 1970-01-04 |       1 |     1
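
As a possible alternative to the FULL JOIN (a different technique, not the one used in this answer), both event tables can be stacked with UNION ALL and counted in a single pass with the aggregate FILTER clause, which is available from PostgreSQL 9.4 on. The kind column and the events alias are names invented for this sketch. Because count(*) never returns NULL, this variant yields 0 rather than NULL for a missing count, and it still omits days with no events at all.

```sql
-- Sketch: one pass over both event tables, tagging each row with its origin.
-- "kind" and "events" are illustrative names, not part of the original schema.
SELECT target,
       at::date AS day,
       count(*) FILTER (WHERE kind = 'update') AS updates,
       count(*) FILTER (WHERE kind = 'read')   AS reads
FROM (
    SELECT target, at, 'update' AS kind FROM updates
    UNION ALL
    SELECT target, at, 'read'   AS kind FROM reads
  ) events
-- Half-open range covering 1970-01-01 through 1970-01-04 inclusive
WHERE at >= '1970-01-01' AND at < '1970-01-05'
GROUP BY 1, 2
ORDER BY 1, 2;
```

Whether this is faster than the FULL JOIN depends on the data and the available indexes, so it is worth comparing both with EXPLAIN ANALYZE.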

If you do want 0 but still don't want the rows where updates = 0 AND reads = 0, simply apply coalesce() to both columns in the select list:

SELECT target, day, coalesce(updates, 0) AS updates, coalesce(reads, 0) AS reads
...
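
For reference, here is a sketch of what the complete statement might look like when the coalesce() select list is combined with the FULL JOIN query shown earlier. It assumes the rest of the query is unchanged.

```sql
-- FULL JOIN of the two per-day counts, with missing counts shown as 0.
-- Rows where both counts would be 0 still do not appear, because
-- neither subquery produces a row for such a (target, day) pair.
SELECT target, day,
       coalesce(updates, 0) AS updates,
       coalesce(reads, 0)   AS reads
FROM (
    SELECT target, at::date AS day, count(*) AS updates FROM updates GROUP BY 1, 2
  ) num_updates
FULL JOIN (
    SELECT target, at::date AS day, count(*) AS reads FROM reads GROUP BY 1, 2
  ) num_reads USING (target, day)
WHERE day BETWEEN '1970-01-01'::date AND '1970-01-04'::date
ORDER BY 1, 2;
```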

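If you also want the rows where both counts are NULL (or 0), generate the date range with generate_series(), join it with targets without any join condition (a CROSS JOIN, giving the full Cartesian product), and then LEFT JOIN the counting subqueries: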

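SELECT target, day, updates, reads
FROM (
    -- generate_series() with an interval step returns timestamps, so cast back to date
    SELECT ts::date AS day
    FROM generate_series('1970-01-01'::date, '1970-01-04'::date, interval '1 day') AS g(ts)
  ) d
CROSS JOIN targets  -- every (day, target) pair; a bare JOIN without ON or USING is a syntax error
LEFT JOIN (
    SELECT target, at::date AS day, count(*) AS updates FROM updates GROUP BY 1, 2
  ) num_updates USING (target, day)
LEFT JOIN (
    SELECT target, at::date AS day, count(*) AS reads FROM reads GROUP BY 1, 2
  ) num_reads USING (target, day)
ORDER BY 1, 2;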
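
To reproduce the zero-filled report from the question, the same Cartesian-product approach can be combined with coalesce(). The following is a sketch rather than part of the original answer. It writes the join between the dates and targets as an explicit CROSS JOIN and casts the generated series to date so that day is displayed as a date. The aliases g, ts and d are illustrative.

```sql
-- One row per (target, day) in the range, with missing counts shown as 0.
SELECT target, day,
       coalesce(updates, 0) AS updates,
       coalesce(reads, 0)   AS reads
FROM (
    SELECT ts::date AS day
    FROM generate_series('1970-01-01'::date, '1970-01-04'::date, interval '1 day') AS g(ts)
  ) d
CROSS JOIN targets
LEFT JOIN (
    SELECT target, at::date AS day, count(*) AS updates FROM updates GROUP BY 1, 2
  ) num_updates USING (target, day)
LEFT JOIN (
    SELECT target, at::date AS day, count(*) AS reads FROM reads GROUP BY 1, 2
  ) num_reads USING (target, day)
ORDER BY 1, 2;
```

Each event table is aggregated once here, instead of once per (target, day) pair as with the correlated subqueries in the question.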