Select the top 5 scoring results for each group in psql

Asked: 2015-07-24 09:54:58

Tags: sql postgresql greatest-n-per-group

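I need some help selecting the top 5 "most critical servers per customer".

At the moment I have a select statement that returns the Hostname, SystemID, Customer and Critical value for all machines. The next step I want to take with this query is to select only the top 5 most critical servers (highest Critical score) for each customer.

My current select statement looks like this: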

SELECT COALESCE(rhss_py_results.Hostname, sid_py_results.Name) AS Hostname, COALESCE(rhss_py_results.SystemID, security_scores.SystemID) AS SystemID, Customer, Critical
FROM rhss_py_results
INNER JOIN sid_py_results
ON rhss_py_results.hostname = sid_py_results.Name
INNER JOIN Customers
ON sid_py_results.SecurityDomain = Customers.SecurityDomain
INNER JOIN security_scores
ON rhss_py_results.SystemID=security_scores.SystemID
ORDER BY Customer;

It returns the following (data changed for privacy):

     hostname      |  systemid  |         customer         | critical
-------------------+------------+--------------------------+----------
 aaa-aaaa_aaaa     | 1000000024 | Anna                     |       48
 aaa-aaa3-aaa1     | 1000000038 | Anna                     |        5
 aaaaaa001         | 1000000013 | Kalle                    |       10
 aaaaaa002         | 1000000043 | Kalle                    |        1
 aaaaaa005         | 1000000087 | Pelle                    |        5
 bbbbbb0010        | 1000000003 | Pelle                    |        0
 cccccc0001        | 1000000029 | Sara                     |        0
 ddd-dddd-c001     | 1000000063 | Anna                     |       26
 ddd-dddd-c002     | 1000000064 | Anna                     |       24
 ddd-dddd-c003     | 1000000012 | Anna                     |        5
 fff-ffff-f001     | 1000000095 | Anna                     |       13
 gggggg0001        | 1000000077 | Sara                     |        0
 gggggg0002        | 1000000040 | Pelle                    |        0
 gggggg0003        | 1000000039 | Pelle                    |        1
 mmmmmm033         | 1000000047 | Kalle                    |       31
 mmmmmm034         | 1000000045 | Kalle                    |       37
 mmmmmm036         | 1000000046 | Pelle                    |        3
 mmmmmm037         | 1000000082 | Pelle                    |        3
 mmmmmm045         | 1000000091 | Håkan                    |        0

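Some customers have only 1 server and some have 15. If a customer has only 1 server, listing just that server is enough. If several of a customer's servers tie on the Critical value at the top-5 cutoff, returning the 5 most relevant ones based on hostname is fine.

I have more than 32 different customers, and that number will change in the future.

The following result should be the final product (ish):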

     hostname      |  systemid  |         customer         | critical
-------------------+------------+--------------------------+----------
 aaa-aaaa_aaaa     | 1000000024 | Anna                     |       48
 ddd-dddd-c001     | 1000000063 | Anna                     |       26
 ddd-dddd-c002     | 1000000064 | Anna                     |       24
 fff-ffff-f001     | 1000000095 | Anna                     |       13
 aaa-aaa3-aaa1     | 1000000038 | Anna                     |        5
 mmmmmm045         | 1000000091 | Håkan                    |        0
 mmmmmm034         | 1000000045 | Kalle                    |       37
 mmmmmm033         | 1000000047 | Kalle                    |       31
 aaaaaa001         | 1000000013 | Kalle                    |       10
 aaaaaa002         | 1000000043 | Kalle                    |        1
 aaaaaa005         | 1000000087 | Pelle                    |        5
 mmmmmm036         | 1000000046 | Pelle                    |        3
 mmmmmm037         | 1000000082 | Pelle                    |        3
 gggggg0003        | 1000000039 | Pelle                    |        1
 bbbbbb0010        | 1000000003 | Pelle                    |        0
 cccccc0001        | 1000000029 | Sara                     |        0
 gggggg0001        | 1000000077 | Sara                     |        0

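I have read the following article but do not understand how to apply it to my own query, since I am quite new to databases. http://www.xaprb.com/blog/2006/12/07/how-to-select-the-firstleastmax-row-per-group-in-sql/

Can anyone help me with this?

//Ambrose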

2 answers:

Answer 0: (score: 3)

Use the row_number function

SELECT *
FROM
  (SELECT row_number() OVER (PARTITION BY Customer
                             -- rank by Critical, highest first; break ties by hostname
                             ORDER BY Critical DESC,
                                      COALESCE(rhss_py_results.Hostname, sid_py_results.Name)) AS sno,
          COALESCE(rhss_py_results.Hostname, sid_py_results.Name) AS Hostname,
          COALESCE(rhss_py_results.SystemID, security_scores.SystemID) AS SystemID,
          Customer,
          Critical
   FROM rhss_py_results
   INNER JOIN sid_py_results ON rhss_py_results.hostname = sid_py_results.Name
   INNER JOIN Customers ON sid_py_results.SecurityDomain = Customers.SecurityDomain
   INNER JOIN security_scores ON rhss_py_results.SystemID=security_scores.SystemID) AS t
WHERE sno<=5
ORDER BY Customer, sno;
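
To try the row_number approach without the real tables, here is a minimal sketch. It assumes a CTE named servers (a hypothetical stand-in for the joined result) filled with a few rows copied from the question's sample output. It ranks by Critical, highest first, and breaks ties by hostname, which is what the question asks for.

```sql
-- "servers" is a hypothetical stand-in for the joined result;
-- the rows are copied from the sample output in the question.
WITH servers (hostname, systemid, customer, critical) AS (
    VALUES
        ('aaa-aaaa_aaaa', 1000000024, 'Anna',  48),
        ('aaa-aaa3-aaa1', 1000000038, 'Anna',   5),
        ('ddd-dddd-c001', 1000000063, 'Anna',  26),
        ('ddd-dddd-c002', 1000000064, 'Anna',  24),
        ('ddd-dddd-c003', 1000000012, 'Anna',   5),
        ('fff-ffff-f001', 1000000095, 'Anna',  13),
        ('mmmmmm045',     1000000091, 'Håkan',  0)
),
ranked AS (
    SELECT s.*,
           -- numbering restarts at 1 for every customer
           row_number() OVER (PARTITION BY customer
                              ORDER BY critical DESC, hostname) AS rn
    FROM servers AS s
)
SELECT hostname, systemid, customer, critical
FROM ranked
WHERE rn <= 5          -- keep at most 5 rows per customer
ORDER BY customer, rn;
```

With this sample, Anna has six servers and the one dropped should be ddd-dddd-c003, because it ties with aaa-aaa3-aaa1 on Critical = 5 but sorts later by hostname. Håkan's single server is kept as-is. If tied servers should all be returned instead, rank() in place of row_number() is the usual alternative.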

Answer 1: (score: 0)

Use ORDER BY on Critical

SELECT COALESCE(rhss_py_results.Hostname, sid_py_results.Name) AS Hostname, COALESCE(rhss_py_results.SystemID, security_scores.SystemID) AS SystemID, Customer, Critical
FROM rhss_py_results
INNER JOIN sid_py_results
ON rhss_py_results.hostname = sid_py_results.Name
INNER JOIN Customers
ON sid_py_results.SecurityDomain = Customers.SecurityDomain
INNER JOIN security_scores
ON rhss_py_results.SystemID=security_scores.SystemID
ORDER BY Customer,Critical desc;
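
Note that sorting alone returns every server for each customer; it does not cut the list at 5. One way to add the per-customer limit while keeping the ORDER BY idea is a LATERAL subquery with LIMIT (available from PostgreSQL 9.3). This is a different technique from the query above, sketched under the assumption of a hypothetical servers CTE holding rows copied from the question's sample output.

```sql
-- "servers" is a hypothetical stand-in for the joined result;
-- the rows are copied from the sample output in the question.
WITH servers (hostname, systemid, customer, critical) AS (
    VALUES
        ('aaaaaa005',  1000000087, 'Pelle', 5),
        ('bbbbbb0010', 1000000003, 'Pelle', 0),
        ('gggggg0002', 1000000040, 'Pelle', 0),
        ('gggggg0003', 1000000039, 'Pelle', 1),
        ('mmmmmm036',  1000000046, 'Pelle', 3),
        ('mmmmmm037',  1000000082, 'Pelle', 3),
        ('cccccc0001', 1000000029, 'Sara',  0),
        ('gggggg0001', 1000000077, 'Sara',  0)
)
SELECT top5.hostname, top5.systemid, c.customer, top5.critical
FROM (SELECT DISTINCT customer FROM servers) AS c
CROSS JOIN LATERAL (
    -- runs once per customer: sort that customer's servers and keep 5
    SELECT s.hostname, s.systemid, s.critical
    FROM servers AS s
    WHERE s.customer = c.customer
    ORDER BY s.critical DESC, s.hostname
    LIMIT 5
) AS top5
ORDER BY c.customer, top5.critical DESC, top5.hostname;
```

With this sample, Pelle has six servers and gggggg0002 should be the one left out, since it ties with bbbbbb0010 on Critical = 0 and sorts later by hostname. Sara's two servers are both returned.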