How do I combine two queries into one query with separate columns?

Time: 2015-10-05 18:29:51

Tags: sql postgresql join null window-functions

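I am running two different queries on PostgreSQL 9.3.9. They produce different results, but both are grouped by "month-year". How can I write a single query that gives me the same data in one table?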

Query 1:

SELECT CONCAT(EXTRACT(MONTH FROM startedPayingDate), '-', 
            EXTRACT(YEAR FROM startedPayingDate)) AS "Month", 
 COUNT(*) AS "Total AB Paying Customers"
FROM (       
 SELECT cm.customer_id, MIN(cm.created_at) AS startedPayingDate 
 FROM customerusermap AS cm, users as u
 WHERE cm.customer_id = u.customer_id AND cm.user_id<>u.id 
 GROUP BY cm.customer_id ) AS t
GROUP BY 1, EXTRACT(MONTH FROM startedPayingDate), EXTRACT(YEAR FROM startedPayingDate)
ORDER BY EXTRACT(YEAR FROM startedPayingDate), EXTRACT(MONTH FROM startedPayingDate);

This gives the following result:

Month  | Total AB Paying Customers
---------------------------------
3-2014 | 2
4-2014 | 4

Query 2:

SELECT concat(extract(MONTH from u.created_at),'-',extract(year from u.created_at)) as "Month", 
count(u.email) as "Total SMB Paying Customers"
FROM customerusermap AS cm, users AS u 
WHERE cm.customer_id = u.customer_id AND cm.user_id = u.id AND u.paid_status = 'paying' 
GROUP by 1,extract(month from u.created_at),extract(year from u.created_at)
order by extract(year from u.created_at),extract(month from u.created_at);

This gives the following result:

Month  | Total SMB Paying Customers
-----------------------------------
2-2014 | 3
3-2014 | 8
4-2014 | 5

Desired result

I would like to combine these two queries into the result shown below, sorted by year and month (that is, oldest to newest):

Month  | Total AB Paying Customers | Total SMB Paying Customers | Total | Cumulative
-------------------------------------------------------------------------------------
2-2014 |           0               |            3               |    3  |   3
3-2014 |           2               |            8               |    10 |   13
4-2014 |           4               |            5               |    9  |   22

Table definitions

CREATE TABLE users (
id serial NOT NULL,
firstname character varying(255) NOT NULL,
lastname character varying(255) NOT NULL,
email character varying(255) NOT NULL,
created_at timestamp without time zone NOT NULL DEFAULT now(),
customer_id character varying(255) DEFAULT NULL::character varying,
companyname character varying(255),
primary_user_id integer,
paid_status character varying(255),  -- updated from comment
CONSTRAINT users_pkey PRIMARY KEY (id),
CONSTRAINT primary_user_id_fk FOREIGN KEY (primary_user_id) REFERENCES users (id),
CONSTRAINT users_uuid_key UNIQUE (uuid)
)

And the customerusermap table looks like this:

CREATE TABLE customerusermap (
id serial NOT NULL,
user_id integer NOT NULL,
customer_id character varying(255) NOT NULL,
created_at timestamp without time zone NOT NULL DEFAULT now(),
CONSTRAINT customerusermap_pkey PRIMARY KEY (id),
CONSTRAINT customerusermap_user_id_fkey FOREIGN KEY (user_id) REFERENCES users (id),
CONSTRAINT customerusermap_user_id_key UNIQUE (user_id)
);

1 Answer:

Answer 0: (score: 1)

General solution

The key feature is a FULL OUTER JOIN, with NULL values handled properly:

SELECT *
     , "Total AB Paying Customers" + "Total SMB Paying Customers" AS "Total"
     , sum("Total AB Paying Customers" + "Total SMB Paying Customers")
         OVER (ORDER BY "Month") AS "Cumulative"
FROM  (
   SELECT "Month"
        , COALESCE(q1."Total AB Paying Customers", 0)  AS "Total AB Paying Customers"
        , COALESCE(q2."Total SMB Paying Customers", 0) AS "Total SMB Paying Customers"
   FROM      (<query1>) q1
   FULL JOIN (<query2>) q2 USING ("Month")
   ) sub;

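sum() is used as a window function to compute the cumulative total. The additional subquery layer is only for convenience, so that COALESCE() does not have to be repeated in every expression. The query can be simplified further, for example by formatting the month only in the outer SELECT.

Optimized query

Based on the setup you added: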
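To make the pattern concrete, here is a minimal self-contained sketch in which the CTEs q1 and q2 are hypothetical stand-ins for <query1> and <query2>, filled with the sample results from the question. It assumes the month is kept as a date until the final SELECT, because a window ordered by the formatted text would compare strings (so '10-2014' would sort before '2-2014').

```sql
-- q1 and q2 are hypothetical stand-ins for <query1> and <query2>,
-- populated with the sample results shown in the question.
WITH q1(mon, ct_ab) AS (
   VALUES (date '2014-03-01', 2)
        , (date '2014-04-01', 4)
   )
, q2(mon, ct_smb) AS (
   VALUES (date '2014-02-01', 3)
        , (date '2014-03-01', 8)
        , (date '2014-04-01', 5)
   )
SELECT to_char(mon, 'FMMM-YYYY')               AS "Month"
     , ct_ab                                   AS "Total AB Paying Customers"
     , ct_smb                                  AS "Total SMB Paying Customers"
     , ct_ab + ct_smb                          AS "Total"
     , sum(ct_ab + ct_smb) OVER (ORDER BY mon) AS "Cumulative"
FROM  (
   SELECT mon
        , COALESCE(q1.ct_ab, 0)  AS ct_ab   -- month missing from q1 counts as 0
        , COALESCE(q2.ct_smb, 0) AS ct_smb  -- month missing from q2 counts as 0
   FROM   q1
   FULL   JOIN q2 USING (mon)               -- keep months present on either side
   ) sub
ORDER  BY mon;
```

With this sample data the output should match the desired result table in the question, including the 0 for AB customers in 2-2014.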

SELECT to_char(mon, 'FMMM-YYYY') AS "Month"
     , ct_ab                     AS "Total AB Paying Customers"
     , ct_smb                    AS "Total SMB Paying Customers"
     , ct_ab + ct_smb            AS "Total"
     , sum(ct_ab + ct_smb) OVER (ORDER BY mon)::int AS "Cumulative"
FROM  (
   SELECT mon, COALESCE(q1.ct_ab, 0) AS ct_ab, COALESCE(q2.ct_smb, 0) AS ct_smb
   FROM  (
      SELECT date_trunc('month', start_date) AS mon, count(*)::int AS ct_ab
      FROM  (       
         SELECT cm.customer_id, min(cm.created_at) AS start_date 
         FROM   customerusermap cm
         JOIN   users u USING (customer_id)
         WHERE  cm.user_id <> u.id 
         GROUP  BY 1
         ) t
      GROUP  BY 1
      ) q1
   FULL JOIN (
      SELECT date_trunc('month', u.created_at) AS mon, count(*)::int AS ct_smb
      FROM   customerusermap cm
      JOIN   users u USING (customer_id)
      WHERE  cm.user_id = u.id AND u.paid_status = 'paying' 
      GROUP  BY 1
      ) q2 USING (mon)
   ) sub
ORDER  BY mon;

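Key points

  • Use to_char() to format your month any way you like, and do it only once, at the end. The template pattern FMMM produces month numbers without leading zeros, like in your original.

  • Use date_trunc() to truncate your timestamp without time zone to month resolution (the result is the first timestamp of the month, but that makes no difference here).

  • I added ORDER BY mon to get the sort order you asked for in your comment. This works as expected because the column mon is still a timestamp at that point (not yet converted to a string (text)).

  • Since u.email is defined NOT NULL, count(*) is equivalent to count(u.email) in this context, but a bit cheaper.

  • Use explicit JOIN syntax. Same performance, but clearer.

  • I cast the aggregated counts to integer. This is entirely optional (assuming you have no integer overflow). That way all numbers in the result are integer instead of bigint and numeric.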
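A small sketch of the formatting and type points above. The timestamp literal is an arbitrary value chosen for illustration, and the column aliases are hypothetical.

```sql
-- date_trunc() keeps the timestamp type; to_char() converts to text at the end.
SELECT date_trunc('month', ts)                       AS mon           -- 2014-03-01 00:00:00
     , to_char(date_trunc('month', ts), 'FMMM-YYYY') AS "Month"       -- 3-2014
     , to_char(date_trunc('month', ts), 'MM-YYYY')   AS padded_month  -- 03-2014
FROM  (VALUES (timestamp '2014-03-15 10:30:00')) v(ts);

-- Result types of the aggregates, with and without the optional cast.
SELECT pg_typeof(count(*))         AS count_type    -- bigint
     , pg_typeof(count(*)::int)    AS cast_type     -- integer
     , pg_typeof(sum(1::bigint))   AS sum_of_bigint -- numeric
     , pg_typeof(sum(1::int))      AS sum_of_int;   -- bigint
```

The second statement shows why the casts matter for the output types: summing bigint counts yields numeric, while summing integer counts yields bigint, which the optimized query casts back to integer.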

Compare it to your original and you will find this version is shorter and faster.

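If performance is important, make sure you have indexes on the relevant columns. If there are many entries in users for each entry in customerusermap, there are more sophisticated options with JOIN LATERAL that can make your query faster.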
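As a rough sketch of what "indexes on the relevant columns" could mean here: the index names below are hypothetical, and the column choice is an assumption based on the join and aggregate columns in the queries above. Note that customerusermap.user_id already has an index through its UNIQUE constraint. Since both queries aggregate over all rows, the planner may still prefer sequential scans, so it is worth checking with EXPLAIN ANALYZE before keeping any of these.

```sql
-- Hypothetical index names; columns chosen from the join and min() expressions above.
CREATE INDEX customerusermap_customer_id_created_at_idx
    ON customerusermap (customer_id, created_at);

CREATE INDEX users_customer_id_idx
    ON users (customer_id);
```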

If you want to include months without any activity, LEFT JOIN to a complete list of months.
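One possible way to build that list of months is generate_series(). The sketch below is not taken from the answer: the CTE counts is a hypothetical stand-in for the joined monthly counts (with March deliberately missing), and the reporting range is hard-coded for illustration.

```sql
-- counts is a hypothetical stand-in for the per-month result; March has no row.
WITH counts(mon, ct_ab, ct_smb) AS (
   VALUES (timestamp '2014-02-01', 0, 3)
        , (timestamp '2014-04-01', 4, 5)
   )
SELECT to_char(m.mon, 'FMMM-YYYY') AS "Month"
     , COALESCE(c.ct_ab, 0)        AS ct_ab    -- 0 for months without activity
     , COALESCE(c.ct_smb, 0)       AS ct_smb
FROM   generate_series(timestamp '2014-02-01'  -- assumed start of the range
                     , timestamp '2014-04-01'  -- assumed end of the range
                     , interval '1 month') AS m(mon)
LEFT   JOIN counts c USING (mon)
ORDER  BY m.mon;
```

Here 3-2014 appears in the output with zeros in both count columns, even though counts has no row for it.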