Multiple self-joins based on GROUP BY results

Time: 2017-12-11 17:46:32

Tags: sql postgresql group-by subquery self-join

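I'm trying to gather details about backup activity from a PostgreSQL database table on a backup appliance (Avamar). The table has several columns, including client_name, dataset, plugin_name, type, completed_ts, status_code, bytes_modified, and others. Simplified example: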

| session_id | client_name | dataset |         plugin_name |             type |         completed_ts | status_code | bytes_modified |
|------------|-------------|---------|---------------------|------------------|----------------------|-------------|----------------|
|          1 |    server01 | Windows | Windows File System | Scheduled Backup | 2017-12-05T01:00:00Z |       30900 |       11111111 |
|          2 |    server01 | Windows | Windows File System | Scheduled Backup | 2017-12-04T01:00:00Z |       30000 |       22222222 |
|          3 |    server01 | Windows | Windows File System | Scheduled Backup | 2017-12-03T01:00:00Z |       30000 |       22222222 |
|          4 |    server01 | Windows | Windows File System | Scheduled Backup | 2017-12-02T01:00:00Z |       30000 |       22222222 |
|          5 |    server01 | Windows |         Windows VSS | Scheduled Backup | 2017-12-01T01:00:00Z |       30000 |       33333333 |
|          6 |    server02 | Windows | Windows File System | Scheduled Backup | 2017-12-05T02:00:00Z |       30000 |       44444444 |
|          7 |    server02 | Windows | Windows File System | Scheduled Backup | 2017-12-04T02:00:00Z |       30900 |       55555555 |
|          8 |    server03 | Windows | Windows File System | On-Demand Backup | 2017-12-05T03:00:00Z |       30000 |       66666666 |
|          9 |    server04 | Windows | Windows File System |         Validate | 2017-12-05T03:00:00Z |       30000 |       66666666 |

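Each client_name (server) can have multiple datasets, and each dataset can have multiple plugin_names. So I wrote an SQL statement that does a GROUP BY on these three columns to get a list of "job" activity over a period of time (http://sqlfiddle.com/#!15/f15556/1):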

select
  client_name,
  dataset,
  plugin_name
from v_activities_2
where
  type like '%Backup%'
group by
  client_name, dataset, plugin_name

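Each of these jobs can succeed or fail, depending on the status_code column. Using a self-join with a subquery, I was able to get the Last Good backup together with its completed_ts (completion time), bytes_modified, and so on (http://sqlfiddle.com/#!15/f15556/16):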

select
  a2.client_name,
  a2.dataset,
  a2.plugin_name,
  a2.LastGood,
  a3.status_code,
  a3.bytes_modified as LastGood_bytes
from v_activities_2 a3

join (
  select
    client_name,
    dataset,
    plugin_name,
    max(completed_ts) as LastGood
  from v_activities_2 a2
  where
    type like '%Backup%'
    and status_code in (30000,30005)   -- Successful (Good) Status codes
  group by
    client_name, dataset, plugin_name
) as a2
on a3.client_name  = a2.client_name and
   a3.dataset      = a2.dataset and
   a3.plugin_name  = a2.plugin_name and
   a3.completed_ts = a2.LastGood

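Separately, I can do the same thing to get the Last Attempt details by removing the status_code line from the WHERE clause: http://sqlfiddle.com/#!15/f15556/3. Note that LastGood and LastAttempt are the same row most of the time, but sometimes they are not, depending on whether the last backup succeeded.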

The problem I have is combining these two statements (if that's possible), so that I get this result:

| client_name | dataset |         plugin_name |             lastgood |  lastgood_bytes |          lastattempt | lastattempt_bytes |
|-------------|---------|---------------------|----------------------|-----------------|----------------------|-------------------|
|    server01 | Windows | Windows File System | 2017-12-04T01:00:00Z |        22222222 | 2017-12-05T01:00:00Z |          11111111 |
|    server01 | Windows |         Windows VSS | 2017-12-01T01:00:00Z |        33333333 | 2017-12-01T01:00:00Z |          33333333 |
|    server02 | Windows | Windows File System | 2017-12-05T02:00:00Z |        44444444 | 2017-12-05T02:00:00Z |          44444444 |
|    server03 | Windows | Windows File System | 2017-12-05T03:00:00Z |        66666666 | 2017-12-05T03:00:00Z |          66666666 |

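I tried adding another RIGHT JOIN at the end (http://sqlfiddle.com/#!15/f15556/4) and got NULL rows. After some reading, I understood that the first two tables are joined into an intermediate (temporary) table before the next join is applied, but by then the data I need is gone, so I get NULL rows.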

I'm using PostgreSQL 8 through a Groovy script, and I only have read-only access to the DB.

2 answers:

Answer 0 (score: 1)

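You clearly have two intermediate inner join output tables, and you want columns from each of them about things identified by a common key. So inner join them on that key: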

select
  g.client_name,
  g.dataset,
  g.plugin_name,
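  g.LastGood,
  g.status_code as LastGood_status,   -- aliased: both sides have a status_code
  g.LastGood_bytes,                   -- comma was missing here
  l.LastAttempt,
  l.status_code as LastAttempt_status,
  l.LastAttempt_bytes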
from
( -- cut & pasted Last Good http://sqlfiddle.com/#!15/f15556/16
    select
      a2.client_name,
      a2.dataset,
      a2.plugin_name,
      a2.LastGood,
      a3.status_code,
      a3.bytes_modified as LastGood_bytes
    from v_activities_2 a3
    join (
      select
        client_name,
        dataset,
        plugin_name,
        max(completed_ts) as LastGood
      from v_activities_2 a2
      where
        type like '%Backup%'
        and status_code in (30000,30005)   -- Successful (Good) Status codes
      group by
        client_name, dataset, plugin_name
    ) as a2
    on a3.client_name  = a2.client_name and
       a3.dataset      = a2.dataset and
       a3.plugin_name  = a2.plugin_name and
       a3.completed_ts = a2.LastGood
) as g
join 
( -- cut & pasted Last Attempt http://sqlfiddle.com/#!15/f15556/3
    select
      a1.client_name,
      a1.dataset,
      a1.plugin_name,
      a1.LastAttempt,
      a3.status_code,
      a3.bytes_modified as LastAttempt_bytes
    from v_activities_2 a3
    join (
      select
        client_name,
        dataset,
        plugin_name,
        max(completed_ts) as LastAttempt
      from v_activities_2 a2
      where
        type like '%Backup%'
      group by
        client_name, dataset, plugin_name
    ) as a1
    on a3.client_name  = a1.client_name and
       a3.dataset      = a1.dataset and
       a3.plugin_name  = a1.plugin_name and
       a3.completed_ts = a1.LastAttempt
) as l
on l.client_name  = g.client_name and
   l.dataset      = g.dataset and
   l.plugin_name  = g.plugin_name
order by client_name, dataset, plugin_name

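This uses one of the applicable approaches from Strange duplicate behavior from GROUP_CONCAT of two LEFT JOINs of GROUP_BYs. However, how the pieces of code correspond may not be very clear: its intermediate tables use left join where you use inner join, and group_concat where you use max. (It also lists more approaches, because of the particulars of group_concat and that question's query.)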

  

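Correct symmetrical INNER JOIN approach: LEFT JOIN q1 & q2 (1:many), then GROUP BY & GROUP_CONCAT (which is what your first query did); then, separately and similarly, LEFT JOIN q1 & q3 (1:many), then GROUP BY & GROUP_CONCAT; then INNER JOIN the two results ON user_id (1:1).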

  

Correct cumulative LEFT JOIN approach: JOIN q1 & q2 (1:many), then GROUP BY & GROUP_CONCAT; then LEFT JOIN that with q3 (1:many), then GROUP BY & GROUP_CONCAT.

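Whether this actually suits your purpose depends on your actual specification and constraints. Even if the two joins you linked are what you want, you need to explain exactly what you mean by "combining" them. You don't say what you want when the two joins have different grouping-column values. Force yourself to say in English which rows go in the result, in terms of the rows in the inputs.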
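One constraint all of the queries here rely on: matching a row back with `completed_ts = max(completed_ts)` assumes that no two rows of the same client_name, dataset and plugin_name share a completed_ts. If two did, that job could appear twice in the result. With read-only access the constraint cannot be declared, but it can at least be checked. A minimal sketch, assuming the intended key is exactly those four columns; the temp table is only a scratch-database copy of the question's sample rows, reduced to the columns the check reads.

```sql
-- Scratch-database setup: the question's sample rows, reduced to the columns
-- this check reads. Skip it to run the check against the real view.
create temp table v_activities_2 (
  client_name  varchar(64),
  dataset      varchar(64),
  plugin_name  varchar(64),
  completed_ts timestamp
);
insert into v_activities_2 values ('server01', 'Windows', 'Windows File System', '2017-12-05 01:00:00');
insert into v_activities_2 values ('server01', 'Windows', 'Windows File System', '2017-12-04 01:00:00');
insert into v_activities_2 values ('server01', 'Windows', 'Windows File System', '2017-12-03 01:00:00');
insert into v_activities_2 values ('server01', 'Windows', 'Windows File System', '2017-12-02 01:00:00');
insert into v_activities_2 values ('server01', 'Windows', 'Windows VSS',         '2017-12-01 01:00:00');
insert into v_activities_2 values ('server02', 'Windows', 'Windows File System', '2017-12-05 02:00:00');
insert into v_activities_2 values ('server02', 'Windows', 'Windows File System', '2017-12-04 02:00:00');
insert into v_activities_2 values ('server03', 'Windows', 'Windows File System', '2017-12-05 03:00:00');
insert into v_activities_2 values ('server04', 'Windows', 'Windows File System', '2017-12-05 03:00:00');

-- Each row returned is a (job, completed_ts) pair that occurs more than once;
-- the "completed_ts = max(completed_ts)" joins could emit such a job twice.
-- The outer side of those joins is not filtered by type, so neither is this.
-- No rows means the assumed key holds for the data as it is right now.
select client_name, dataset, plugin_name, completed_ts, count(*) as n
from v_activities_2
group by client_name, dataset, plugin_name, completed_ts
having count(*) > 1;
```

On the sample rows this returns no rows. Because nothing enforces the key, the check only describes the current data, so it would be worth rerunning before trusting the joined results.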

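PS 1 You have undocumented/undeclared/unenforced constraints. Declare them where possible; otherwise enforce them via triggers. If they can't be in code, document them in the question text. Constraints are fundamental to how many times a subrow value can appear in a join result, and to group by.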

PS 2 Learn the syntax and semantics of select. Learn what left/right outer join on returns: the rows inner join on returns, plus the unmatched left/right table rows extended by nulls.

PS 3 Is there any rule of thumb to construct SQL query from a human-readable description?
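PS 2 applied to this question: both answers inner join the last-good rows to the last-attempt rows, so a job whose backups have all failed would drop out of the result. A minimal sketch, assuming such jobs should still be listed with NULL LastGood columns, keeps the two derived tables from Answer 0 (without the status columns) and only changes the join between them to a LEFT JOIN. The temp table is a scratch-database copy of the question's sample rows; the server05 row is hypothetical, added only so that one job has no successful backup.

```sql
-- Scratch-database setup: the question's sample rows (session_id omitted),
-- plus one HYPOTHETICAL job (server05) whose only backup failed.
create temp table v_activities_2 (
  client_name    varchar(64),
  dataset        varchar(64),
  plugin_name    varchar(64),
  type           varchar(64),
  completed_ts   timestamp,
  status_code    integer,
  bytes_modified bigint
);
insert into v_activities_2 values ('server01', 'Windows', 'Windows File System', 'Scheduled Backup', '2017-12-05 01:00:00', 30900, 11111111);
insert into v_activities_2 values ('server01', 'Windows', 'Windows File System', 'Scheduled Backup', '2017-12-04 01:00:00', 30000, 22222222);
insert into v_activities_2 values ('server01', 'Windows', 'Windows File System', 'Scheduled Backup', '2017-12-03 01:00:00', 30000, 22222222);
insert into v_activities_2 values ('server01', 'Windows', 'Windows File System', 'Scheduled Backup', '2017-12-02 01:00:00', 30000, 22222222);
insert into v_activities_2 values ('server01', 'Windows', 'Windows VSS',         'Scheduled Backup', '2017-12-01 01:00:00', 30000, 33333333);
insert into v_activities_2 values ('server02', 'Windows', 'Windows File System', 'Scheduled Backup', '2017-12-05 02:00:00', 30000, 44444444);
insert into v_activities_2 values ('server02', 'Windows', 'Windows File System', 'Scheduled Backup', '2017-12-04 02:00:00', 30900, 55555555);
insert into v_activities_2 values ('server03', 'Windows', 'Windows File System', 'On-Demand Backup', '2017-12-05 03:00:00', 30000, 66666666);
insert into v_activities_2 values ('server04', 'Windows', 'Windows File System', 'Validate',         '2017-12-05 03:00:00', 30000, 66666666);
insert into v_activities_2 values ('server05', 'Windows', 'Windows File System', 'Scheduled Backup', '2017-12-05 04:00:00', 30900, 77777777);  -- hypothetical

select
  l.client_name,
  l.dataset,
  l.plugin_name,
  g.LastGood,
  g.bytes_modified as LastGood_bytes,
  l.LastAttempt,
  l.bytes_modified as LastAttempt_bytes
-- Last attempt per job: every job that has a backup row has one.
from (
  select a.client_name, a.dataset, a.plugin_name, m.LastAttempt, a.bytes_modified
  from v_activities_2 a
  join (
    select client_name, dataset, plugin_name, max(completed_ts) as LastAttempt
    from v_activities_2
    where type like '%Backup%'
    group by client_name, dataset, plugin_name
  ) m
    on  a.client_name  = m.client_name
    and a.dataset      = m.dataset
    and a.plugin_name  = m.plugin_name
    and a.completed_ts = m.LastAttempt
) l
-- Last good run per job: LEFT JOIN keeps jobs that never succeeded,
-- null-extending the LastGood columns instead of dropping the row.
left join (
  select a.client_name, a.dataset, a.plugin_name, m.LastGood, a.bytes_modified
  from v_activities_2 a
  join (
    select client_name, dataset, plugin_name, max(completed_ts) as LastGood
    from v_activities_2
    where type like '%Backup%'
      and status_code in (30000, 30005)   -- successful (good) status codes
    group by client_name, dataset, plugin_name
  ) m
    on  a.client_name  = m.client_name
    and a.dataset      = m.dataset
    and a.plugin_name  = m.plugin_name
    and a.completed_ts = m.LastGood
) g
  on  g.client_name = l.client_name
  and g.dataset     = l.dataset
  and g.plugin_name = l.plugin_name
order by l.client_name, l.dataset, l.plugin_name;
```

On these rows the query returns the four rows of the desired result table, plus a server05 row whose LastGood and LastGood_bytes are NULL. With an inner join, as in both answers, that row would be missing. Drop the setup to run the query against the real view.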

Answer 1 (score: 0)

Here is an alternative approach that also works, but is harder to follow and possibly more specific to my dataset: http://sqlfiddle.com/#!15/f15556/114

select
  Actvty.client_name,
  Actvty.dataset,
  Actvty.plugin_name,
  ActvtyGood.LastGood,
  ActvtyGood.status_code as LastGood_status,
  ActvtyGood.bytes_modified as LastGood_bytes,
  ActvtyOnly.LastAttempt,
  Actvty.status_code as LastAttempt_status,
  Actvty.bytes_modified as LastAttempt_bytes
from v_activities_2 Actvty

-- 1. Get last attempt of each job (which may or may not match last good)
join (
  select
    client_name,
    dataset,
    plugin_name,
    max(completed_ts) as LastAttempt
  from v_activities_2
  where
    type like '%Backup%'
  group by
    client_name, dataset, plugin_name
) as ActvtyOnly
on Actvty.client_name  = ActvtyOnly.client_name and
   Actvty.dataset      = ActvtyOnly.dataset and
   Actvty.plugin_name  = ActvtyOnly.plugin_name and
   Actvty.completed_ts = ActvtyOnly.LastAttempt

-- 4. Join the list of good runs to the table of last attempts; a job can never have a last good run without also having a last attempt.
join (

  -- 3. join last good runs with the full table to get the additional details of each
  select
    ActvtyGoodSub.client_name,
    ActvtyGoodSub.dataset,
    ActvtyGoodSub.plugin_name,
    ActvtyGoodSub.LastGood,
    ActvtyAll.status_code,
    ActvtyAll.bytes_modified
  from v_activities_2 ActvtyAll

  -- 2. Get last Good run of each job
  join (
    select
      client_name,
      dataset,
      plugin_name,
      max(completed_ts) as LastGood
    from v_activities_2
    where
      type like '%Backup%'
      and status_code in (30000,30005)   -- Successful (Good) Status codes
    group by
      client_name, dataset, plugin_name
  ) as ActvtyGoodSub
  on ActvtyAll.client_name  = ActvtyGoodSub.client_name and
     ActvtyAll.dataset      = ActvtyGoodSub.dataset and
     ActvtyAll.plugin_name  = ActvtyGoodSub.plugin_name and
     ActvtyAll.completed_ts = ActvtyGoodSub.LastGood

) as ActvtyGood
on Actvty.client_name  = ActvtyGood.client_name and
   Actvty.dataset      = ActvtyGood.dataset and
   Actvty.plugin_name  = ActvtyGood.plugin_name
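A further variant, sketched under the same assumptions as both answers (30000 and 30005 are the successful status codes, and a job's rows have distinct completed_ts values). It is a swapped-in technique, conditional aggregation, not the method either answer used: one GROUP BY computes both timestamps with max(case when ... end), then the query joins back to v_activities_2 once for the last attempt's row and once, via LEFT JOIN, for the last good row. Unlike the answers, the join-back rows are also filtered by type and, for the good row, by status_code. It uses only CASE, max and LEFT JOIN, which PostgreSQL 8.x already supports. The temp table is a scratch-database copy of the question's sample rows.

```sql
-- Scratch-database setup: the question's sample rows (session_id omitted).
-- Skip it to run the query against the real v_activities_2 view.
create temp table v_activities_2 (
  client_name    varchar(64),
  dataset        varchar(64),
  plugin_name    varchar(64),
  type           varchar(64),
  completed_ts   timestamp,
  status_code    integer,
  bytes_modified bigint
);
insert into v_activities_2 values ('server01', 'Windows', 'Windows File System', 'Scheduled Backup', '2017-12-05 01:00:00', 30900, 11111111);
insert into v_activities_2 values ('server01', 'Windows', 'Windows File System', 'Scheduled Backup', '2017-12-04 01:00:00', 30000, 22222222);
insert into v_activities_2 values ('server01', 'Windows', 'Windows File System', 'Scheduled Backup', '2017-12-03 01:00:00', 30000, 22222222);
insert into v_activities_2 values ('server01', 'Windows', 'Windows File System', 'Scheduled Backup', '2017-12-02 01:00:00', 30000, 22222222);
insert into v_activities_2 values ('server01', 'Windows', 'Windows VSS',         'Scheduled Backup', '2017-12-01 01:00:00', 30000, 33333333);
insert into v_activities_2 values ('server02', 'Windows', 'Windows File System', 'Scheduled Backup', '2017-12-05 02:00:00', 30000, 44444444);
insert into v_activities_2 values ('server02', 'Windows', 'Windows File System', 'Scheduled Backup', '2017-12-04 02:00:00', 30900, 55555555);
insert into v_activities_2 values ('server03', 'Windows', 'Windows File System', 'On-Demand Backup', '2017-12-05 03:00:00', 30000, 66666666);
insert into v_activities_2 values ('server04', 'Windows', 'Windows File System', 'Validate',         '2017-12-05 03:00:00', 30000, 66666666);

select
  j.client_name,
  j.dataset,
  j.plugin_name,
  j.LastGood,
  g.status_code    as LastGood_status,
  g.bytes_modified as LastGood_bytes,
  j.LastAttempt,
  l.status_code    as LastAttempt_status,
  l.bytes_modified as LastAttempt_bytes
from (
  -- One GROUP BY yields both timestamps: the CASE is NULL for failed runs,
  -- and max() ignores NULLs, so LastGood is the latest successful run.
  select
    client_name,
    dataset,
    plugin_name,
    max(case when status_code in (30000, 30005)   -- successful (good) codes
             then completed_ts end) as LastGood,
    max(completed_ts)                as LastAttempt
  from v_activities_2
  where type like '%Backup%'
  group by client_name, dataset, plugin_name
) j
-- The last attempt's row: every grouped job has one.
join v_activities_2 l
  on  l.client_name  = j.client_name
  and l.dataset      = j.dataset
  and l.plugin_name  = j.plugin_name
  and l.completed_ts = j.LastAttempt
  and l.type like '%Backup%'
-- The last good run's row: LEFT JOIN keeps jobs that never succeeded (NULLs).
left join v_activities_2 g
  on  g.client_name  = j.client_name
  and g.dataset      = j.dataset
  and g.plugin_name  = j.plugin_name
  and g.completed_ts = j.LastGood
  and g.type like '%Backup%'
  and g.status_code in (30000, 30005)
order by j.client_name, j.dataset, j.plugin_name;
```

On the sample rows this gives the same four rows as the desired result table, with the two status codes alongside. The design choice is that it needs a single GROUP BY subquery instead of two; whether that is easier to follow than Answer 0 is a matter of taste.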