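I'm trying to collect details about backup activity from a table in the PostgreSQL database on a backup appliance (Avamar). The table has several columns, including client_name, dataset, plugin_name, type, completed_ts, status_code, and bytes_modified. Simplified example: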
| session_id | client_name | dataset | plugin_name | type | completed_ts | status_code | bytes_modified |
|------------|-------------|---------|---------------------|------------------|----------------------|-------------|----------------|
| 1 | server01 | Windows | Windows File System | Scheduled Backup | 2017-12-05T01:00:00Z | 30900 | 11111111 |
| 2 | server01 | Windows | Windows File System | Scheduled Backup | 2017-12-04T01:00:00Z | 30000 | 22222222 |
| 3 | server01 | Windows | Windows File System | Scheduled Backup | 2017-12-03T01:00:00Z | 30000 | 22222222 |
| 4 | server01 | Windows | Windows File System | Scheduled Backup | 2017-12-02T01:00:00Z | 30000 | 22222222 |
| 5 | server01 | Windows | Windows VSS | Scheduled Backup | 2017-12-01T01:00:00Z | 30000 | 33333333 |
| 6 | server02 | Windows | Windows File System | Scheduled Backup | 2017-12-05T02:00:00Z | 30000 | 44444444 |
| 7 | server02 | Windows | Windows File System | Scheduled Backup | 2017-12-04T02:00:00Z | 30900 | 55555555 |
| 8 | server03 | Windows | Windows File System | On-Demand Backup | 2017-12-05T03:00:00Z | 30000 | 66666666 |
| 9 | server04 | Windows | Windows File System | Validate | 2017-12-05T03:00:00Z | 30000 | 66666666 |
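To try the queries in this thread away from the appliance (where access is read-only), the sample rows above can be loaded into a scratch database. This is only a minimal sketch: the column types are assumptions inferred from the sample values, and v_activities_2 is created here as a plain table regardless of what it is on the appliance. The multi-row insert needs PostgreSQL 8.2 or later.

```sql
-- Scratch copy of the sample data. Column types are assumed, not taken from Avamar.
create table v_activities_2 (
    session_id     integer,
    client_name    text,
    dataset        text,
    plugin_name    text,
    type           text,
    completed_ts   timestamp,
    status_code    integer,
    bytes_modified bigint
);

insert into v_activities_2 values
    (1, 'server01', 'Windows', 'Windows File System', 'Scheduled Backup', '2017-12-05 01:00:00', 30900, 11111111),
    (2, 'server01', 'Windows', 'Windows File System', 'Scheduled Backup', '2017-12-04 01:00:00', 30000, 22222222),
    (3, 'server01', 'Windows', 'Windows File System', 'Scheduled Backup', '2017-12-03 01:00:00', 30000, 22222222),
    (4, 'server01', 'Windows', 'Windows File System', 'Scheduled Backup', '2017-12-02 01:00:00', 30000, 22222222),
    (5, 'server01', 'Windows', 'Windows VSS',         'Scheduled Backup', '2017-12-01 01:00:00', 30000, 33333333),
    (6, 'server02', 'Windows', 'Windows File System', 'Scheduled Backup', '2017-12-05 02:00:00', 30000, 44444444),
    (7, 'server02', 'Windows', 'Windows File System', 'Scheduled Backup', '2017-12-04 02:00:00', 30900, 55555555),
    (8, 'server03', 'Windows', 'Windows File System', 'On-Demand Backup', '2017-12-05 03:00:00', 30000, 66666666),
    (9, 'server04', 'Windows', 'Windows File System', 'Validate',         '2017-12-05 03:00:00', 30000, 66666666);
```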
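Each client_name (server) can have multiple datasets, and each dataset can have multiple plugin_names. So I wrote a SQL statement that uses GROUP BY on these three columns to get a list of "job" activity over a period of time (http://sqlfiddle.com/#!15/f15556/1):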
select
client_name,
dataset,
plugin_name
from v_activities_2
where
type like '%Backup%'
group by
client_name, dataset, plugin_name
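Each of these jobs can succeed or fail, as indicated by the status_code column. Using a self-join against a subquery, I was able to get the Last Good backup together with its completed_ts (completion time), bytes_modified, and so on (http://sqlfiddle.com/#!15/f15556/16):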
select
a2.client_name,
a2.dataset,
a2.plugin_name,
a2.LastGood,
a3.status_code,
a3.bytes_modified as LastGood_bytes
from v_activities_2 a3
join (
select
client_name,
dataset,
plugin_name,
max(completed_ts) as LastGood
from v_activities_2 a2
where
type like '%Backup%'
and status_code in (30000,30005) -- Successful (Good) Status codes
group by
client_name, dataset, plugin_name
) as a2
on a3.client_name = a2.client_name and
a3.dataset = a2.dataset and
a3.plugin_name = a2.plugin_name and
a3.completed_ts = a2.LastGood
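I can separately do the same thing to get the Last Attempt details by removing the status_code line from the WHERE clause: http://sqlfiddle.com/#!15/f15556/3. Note that Last Good and Last Attempt are the same row most of the time, but sometimes they are not, depending on whether the most recent backup succeeded.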
The problem I'm having is merging these two statements into one (if that's possible), so that I get this result:
| client_name | dataset | plugin_name | lastgood | lastgood_bytes | lastattempt | lastattempt_bytes |
|-------------|---------|---------------------|----------------------|-----------------|----------------------|-------------------|
| server01 | Windows | Windows File System | 2017-12-04T01:00:00Z | 22222222 | 2017-12-05T01:00:00Z | 11111111 |
| server01 | Windows | Windows VSS | 2017-12-01T01:00:00Z | 33333333 | 2017-12-01T01:00:00Z | 33333333 |
| server02 | Windows | Windows File System | 2017-12-05T02:00:00Z | 44444444 | 2017-12-05T02:00:00Z | 44444444 |
| server03 | Windows | Windows File System | 2017-12-05T03:00:00Z | 66666666 | 2017-12-05T03:00:00Z | 66666666 |
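I tried adding another RIGHT JOIN at the end (http://sqlfiddle.com/#!15/f15556/4) and got rows of NULLs. After some reading, I understand that the first join produces an intermediate (temporary) table before the second join is applied, but by that point the data I need has already been lost, so I get NULL rows.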
I'm using PostgreSQL 8 through a Groovy script, and I only have read-only access to the DB.
Answer 0 (score: 1)
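You evidently have two intermediate inner join output tables, and you want columns from each of them about things identified by a common key. So inner join them on that key.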
select
g.client_name,
g.dataset,
g.plugin_name,
LastGood,
g.status_code as LastGood_status, -- aliased so the two status columns stay distinct
LastGood_bytes, -- comma required; without it LastAttempt is parsed as an alias
LastAttempt,
l.status_code as LastAttempt_status,
LastAttempt_bytes
from
( -- cut & pasted Last Good http://sqlfiddle.com/#!15/f15556/16
select
a2.client_name,
a2.dataset,
a2.plugin_name,
a2.LastGood,
a3.status_code,
a3.bytes_modified as LastGood_bytes
from v_activities_2 a3
join (
select
client_name,
dataset,
plugin_name,
max(completed_ts) as LastGood
from v_activities_2 a2
where
type like '%Backup%'
and status_code in (30000,30005) -- Successful (Good) Status codes
group by
client_name, dataset, plugin_name
) as a2
on a3.client_name = a2.client_name and
a3.dataset = a2.dataset and
a3.plugin_name = a2.plugin_name and
a3.completed_ts = a2.LastGood
) as g
join
( -- cut & pasted Last Attempt http://sqlfiddle.com/#!15/f15556/3
select
a1.client_name,
a1.dataset,
a1.plugin_name,
a1.LastAttempt,
a3.status_code,
a3.bytes_modified as LastAttempt_bytes
from v_activities_2 a3
join (
select
client_name,
dataset,
plugin_name,
max(completed_ts) as LastAttempt
from v_activities_2 a2
where
type like '%Backup%'
group by
client_name, dataset, plugin_name
) as a1
on a3.client_name = a1.client_name and
a3.dataset = a1.dataset and
a3.plugin_name = a1.plugin_name and
a3.completed_ts = a1.LastAttempt
) as l
on l.client_name = g.client_name and
l.dataset = g.dataset and
l.plugin_name = g.plugin_name
order by client_name, dataset, plugin_name
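This uses one of the approaches from "Strange duplicate behavior from GROUP_CONCAT of two LEFT JOINs of GROUP_BYs" that applies here. However, the correspondence between the chunks of code may not be obvious: its intermediate joins are left where yours are inner, and its group_concat plays the role of your max. (It also lists more approaches, because of the particulars of group_concat and of its query.)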
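A correct symmetrical INNER JOIN approach: LEFT JOIN q1 & q2 (1:many), then GROUP BY & GROUP_CONCAT (which is what your first query did); then separately and similarly LEFT JOIN q1 & q3 (1:many), then GROUP BY & GROUP_CONCAT; then INNER JOIN the two results ON user_id (1:1).
A correct cumulative LEFT JOIN approach: JOIN q1 & q2 (1:many), then GROUP BY & GROUP_CONCAT; then LEFT JOIN that result & q3 (1:many), then GROUP BY & GROUP_CONCAT.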
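Whether this actually serves your purpose depends on your actual specification and constraints. Even if the two joins you link to are what you want, you need to explain exactly what you mean by "merge". You don't say what you want when the two joins have different sets of values for the grouped columns. Force yourself to state in English which rows go in the result in terms of which rows are in the input.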
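To make the point about differing group keys concrete: a job that has been attempted but has never succeeded appears in the Last Attempt result and not in the Last Good result, so an inner join of the two drops it. The following is a sketch of one possible specification, not the asker's stated requirement: keep every attempted job and leave the Last Good columns NULL when there is no good run. It swaps in a different technique (a conditional aggregate for both timestamps, then joining back for the details), and the alias names k, a, and g are made up for the example.

```sql
select
    k.client_name,
    k.dataset,
    k.plugin_name,
    k.LastGood,
    g.bytes_modified as LastGood_bytes,
    k.LastAttempt,
    a.bytes_modified as LastAttempt_bytes
from (
    -- One row per job with both timestamps; LastGood is NULL if no run succeeded.
    select
        client_name,
        dataset,
        plugin_name,
        max(completed_ts) as LastAttempt,
        max(case when status_code in (30000, 30005) then completed_ts end) as LastGood
    from v_activities_2
    where type like '%Backup%'
    group by client_name, dataset, plugin_name
) as k
-- Details of the last attempt (always present for a job in k).
join v_activities_2 a
    on  a.client_name  = k.client_name
    and a.dataset      = k.dataset
    and a.plugin_name  = k.plugin_name
    and a.completed_ts = k.LastAttempt
    and a.type like '%Backup%'
-- Details of the last good run, if any; the left join keeps jobs that have none.
left join v_activities_2 g
    on  g.client_name  = k.client_name
    and g.dataset      = k.dataset
    and g.plugin_name  = k.plugin_name
    and g.completed_ts = k.LastGood
    and g.type like '%Backup%'
    and g.status_code in (30000, 30005)
order by k.client_name, k.dataset, k.plugin_name;
```

If two runs of the same job can share a completed_ts, this returns one row per tied run, as the join-back queries in the question do. Whether that can happen is exactly the kind of constraint that needs to be stated.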
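PS 1 You have undocumented/undeclared/unenforced constraints. Please declare them when possible. Otherwise, enforce them via triggers. If they are not in code, document them in the question text. Constraints are fundamental to whether a join yields multiple instances of a subrow value, and to group by.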
PS 2 Learn the syntax/semantics of select. Learn what left/right outer join on returns: the rows that inner join on returns, plus the unmatched left/right table rows extended by nulls.
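A small sketch of that null extension, using inline VALUES lists (PostgreSQL 8.2 or later) so that it runs with read-only access. The two lists and their values are invented for the illustration and merely stand in for a Last Good result and a Last Attempt result.

```sql
select
    a.client_name,
    g.lastgood,
    a.lastattempt
from (values ('server01', '2017-12-04'),
             ('server02', '2017-12-05')) as g (client_name, lastgood)
right join
     (values ('server01', '2017-12-05'),
             ('server03', '2017-12-05')) as a (client_name, lastattempt)
    on a.client_name = g.client_name;
-- server01 matches on both sides, so it comes back exactly as inner join would return it.
-- server03 has no match in g, so it is kept with g.lastgood set to NULL.
-- server02 exists only in g (the left side), so a right join does not return it.
```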
PS 3 Is there any rule of thumb to construct SQL query from a human-readable description?
Answer 1 (score: 0)
Here is an alternative approach that also works, but it is harder to follow and probably more specific to my dataset: http://sqlfiddle.com/#!15/f15556/114
select
Actvty.client_name,
Actvty.dataset,
Actvty.plugin_name,
ActvtyGood.LastGood,
ActvtyGood.status_code as LastGood_status,
ActvtyGood.bytes_modified as LastGood_bytes,
ActvtyOnly.LastAttempt,
Actvty.status_code as LastAttempt_status,
Actvty.bytes_modified as LastAttempt_bytes
from v_activities_2 Actvty
-- 1. Get last attempt of each job (which may or may not match last good)
join (
select
client_name,
dataset,
plugin_name,
max(completed_ts) as LastAttempt
from v_activities_2
where
type like '%Backup%'
group by
client_name, dataset, plugin_name
) as ActvtyOnly
on Actvty.client_name = ActvtyOnly.client_name and
Actvty.dataset = ActvtyOnly.dataset and
Actvty.plugin_name = ActvtyOnly.plugin_name and
Actvty.completed_ts = ActvtyOnly.LastAttempt
-- 4. join the list of good runs with the table of last attempts, there would never be a job that has a last good without also a last attempt.
join (
-- 3. join last good runs with the full table to get the additional details of each
select
ActvtyGoodSub.client_name,
ActvtyGoodSub.dataset,
ActvtyGoodSub.plugin_name,
ActvtyGoodSub.LastGood,
ActvtyAll.status_code,
ActvtyAll.bytes_modified
from v_activities_2 ActvtyAll
-- 2. Get last Good run of each job
join (
select
client_name,
dataset,
plugin_name,
max(completed_ts) as LastGood
from v_activities_2
where
type like '%Backup%'
and status_code in (30000,30005) -- Successful (Good) Status codes
group by
client_name, dataset, plugin_name
) as ActvtyGoodSub
on ActvtyAll.client_name = ActvtyGoodSub.client_name and
ActvtyAll.dataset = ActvtyGoodSub.dataset and
ActvtyAll.plugin_name = ActvtyGoodSub.plugin_name and
ActvtyAll.completed_ts = ActvtyGoodSub.LastGood
) as ActvtyGood
on Actvty.client_name = ActvtyGood.client_name and
Actvty.dataset = ActvtyGood.dataset and
Actvty.plugin_name = ActvtyGood.plugin_name
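For comparison, steps 2 and 3 above (the last good run of each job together with its details) can also be sketched without a self-join by using DISTINCT ON, which is specific to PostgreSQL. This is a different technique from the one in the query above, offered only as a sketch; the output column names simply mirror the ones already used there.

```sql
-- Keep the first row of each (client_name, dataset, plugin_name) group,
-- where "first" is decided by the ORDER BY: newest completed_ts first.
select distinct on (client_name, dataset, plugin_name)
    client_name,
    dataset,
    plugin_name,
    completed_ts   as LastGood,
    status_code    as LastGood_status,
    bytes_modified as LastGood_bytes
from v_activities_2
where
    type like '%Backup%'
    and status_code in (30000, 30005) -- Successful (Good) Status codes
order by
    client_name, dataset, plugin_name, completed_ts desc;
```

Unlike the join-back version, this returns exactly one row per job even when two good runs share the same completed_ts; which of the tied rows is kept is then arbitrary unless another column is added to the ORDER BY.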