How can I preserve other keys when using "SELECT DISTINCT"?

Time: 2014-11-15 21:59:25

Tags: postgresql distinct greatest-n-per-group postgresql-8.4

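I would like to preserve the sid, cid pair that links my tables when using SELECT DISTINCT in my query. The columns signature, ip_src, ip_dst are what make a row distinct. I just want the output to also include the corresponding sid, cid pair.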

QUERY:

SELECT DISTINCT signature, ip_src, ip_dst FROM
     (SELECT *
          FROM event
          INNER  JOIN sensor ON (sensor.sid = event.sid)
          INNER  JOIN iphdr ON (iphdr.cid = event.cid) AND (iphdr.sid = event.sid)
          WHERE timestamp >= NOW() - '1 day'::INTERVAL
          ORDER BY timestamp DESC)
as d_dup;

Output:

 signature |   ip_src   |   ip_dst   
-----------+------------+------------
     29177 | 3244829114 | 2887777034
     29177 | 2960340989 | 2887777034
     29179 | 2887777893 | 2887777556
     29178 | 1208608738 | 2887777034
     29178 | 1211607091 | 2887777034
     29177 |  776526845 | 2887777034
     29177 | 1332731268 | 2887777034
(7 rows)

SUB QUERY:

SELECT *
          FROM event
          INNER  JOIN sensor ON (sensor.sid = event.sid)
          INNER  JOIN iphdr ON (iphdr.cid = event.cid) AND (iphdr.sid = event.sid)
          WHERE timestamp >= NOW() - '1 day'::INTERVAL
          ORDER BY timestamp DESC;

Output:

 sid |  cid  | signature |        timestamp        | sid |      hostname       | interface | filter | detail | encoding | last_cid | sid |  cid  |   ip_src   |   ip_dst   | ip_ver | ip_hlen | ip_tos | ip_len | ip_id | ip_flags | ip_off | ip_ttl | ip_proto | ip_csum 
-----+-------+-----------+-------------------------+-----+---------------------+-----------+--------+--------+----------+----------+-----+-------+------------+------------+--------+---------+--------+--------+-------+----------+--------+--------+----------+---------
   3 | 13123 |     29177 | 2014-11-15 20:53:14.656 |   3 | VS-101-Z0:dna0:dna1 | dna0:dna1 |        |      1 |        0 |    12888 |   3 | 13123 | 3244829114 | 2887777034 |      4 |       5 |      0 |    344 | 19301 |        0 |      0 |    122 |        6 |    8686
   3 | 13122 |     29177 | 2014-11-15 20:53:14.43  |   3 | VS-101-Z0:dna0:dna1 | dna0:dna1 |        |      1 |        0 |    12888 |   3 | 13122 | 3244829114 | 2887777034 |      4 |       5 |      0 |     69 | 19071 |        0 |      0 |    122 |        6 |    9191
   3 | 13121 |     29177 | 2014-11-15 18:45:13.461 |   3 | VS-101-Z0:dna0:dna1 | dna0:dna1 |        |      1 |        0 |    12888 |   3 | 13121 | 3244829114 | 2887777034 |      4 |       5 |      0 |    366 | 25850 |        0 |      0 |    122 |        6 |    2115
   3 | 13120 |     29177 | 2014-11-15 18:45:13.23  |   3 | VS-101-Z0:dna0:dna1 | dna0:dna1 |        |      1 |        0 |    12888 |   3 | 13120 | 3244829114 | 2887777034 |      4 |       5 |      0 |     69 | 25612 |        0 |      0 |    122 |        6 |    2650
   3 | 13119 |     29177 | 2014-11-15 18:45:01.887 |   3 | VS-101-Z0:dna0:dna1 | dna0:dna1 |        |      1 |        0 |    12888 |   3 | 13119 | 3244829114 | 2887777034 |      4 |       5 |      0 |    352 | 13697 |        0 |      0 |    122 |        6 |   14282
   3 | 13118 |     29177 | 2014-11-15 18:45:01.681 |   3 | VS-101-Z0:dna0:dna1 | dna0:dna1 |        |      1 |        0 |    12888 |   3 | 13118 | 3244829114 | 2887777034 |      4 |       5 |      0 |     69 | 13464 |        0 |      0 |    122 |        6 |   14798
   4 |    51 |     29179 | 2014-11-15 18:44:02.06  |   4 | VS-101-Z1:dna2:dna3 | dna2:dna3 |        |      1 |        0 |       51 |   4 |    51 | 2887777893 | 2887777556 |      4 |       5 |      0 |     80 | 18830 |        0 |      0 |     63 |       17 |   40533
   3 | 13117 |     29177 | 2014-11-15 18:41:46.418 |   3 | VS-101-Z0:dna0:dna1 | dna0:dna1 |        |      1 |        0 |    12888 |   3 | 13117 | 1332731268 | 2887777034 |      4 |       5 |      0 |    261 | 15393 |        0 |      0 |    119 |        6 |   62131
...
(30 rows)

How can I preserve the sid, cid pair when using SELECT DISTINCT?

3 answers:

Answer 0 (score: 2)

This is shorter and probably faster:

SELECT DISTINCT ON (signature, ip_src, ip_dst)
       signature, ip_src, ip_dst, sid, cid
FROM   event  e
JOIN   sensor s USING (sid)
JOIN   iphdr  i USING (cid, sid)
WHERE  timestamp >= NOW() - '1 day'::interval
ORDER  BY signature, ip_src, ip_dst, timestamp DESC;

This assumes you want the latest row (greatest timestamp) from each set of duplicates.
Detailed explanation:
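
As a rough sketch of what DISTINCT ON does together with that ORDER BY, the same selection can be emulated in plain Python. Python is used here only so the example runs without a database. The rows are adapted from the sample output in the question, and the helper names group_key and distinct_on_latest are made up for illustration.

```python
from itertools import groupby

# (sid, cid, signature, timestamp, ip_src, ip_dst), adapted from the sample output
rows = [
    (3, 13122, 29177, "2014-11-15 20:53:14.430", 3244829114, 2887777034),
    (3, 13123, 29177, "2014-11-15 20:53:14.656", 3244829114, 2887777034),
    (4, 51, 29179, "2014-11-15 18:44:02.060", 2887777893, 2887777556),
    (3, 13117, 29177, "2014-11-15 18:41:46.418", 1332731268, 2887777034),
]


def group_key(row):
    """The DISTINCT ON expressions: (signature, ip_src, ip_dst)."""
    sid, cid, signature, ts, ip_src, ip_dst = row
    return (signature, ip_src, ip_dst)


def distinct_on_latest(rows):
    # Emulates ORDER BY signature, ip_src, ip_dst, timestamp DESC:
    # sort by timestamp descending, then stable-sort by the group key.
    ordered = sorted(rows, key=lambda r: r[3], reverse=True)
    ordered.sort(key=group_key)
    # DISTINCT ON keeps only the first row of each group in that order.
    return [next(group) for _, group in groupby(ordered, key=group_key)]


latest = distinct_on_latest(rows)
for sid, cid, signature, ts, ip_src, ip_dst in latest:
    print(signature, ip_src, ip_dst, sid, cid)
```

The leading ORDER BY columns only bring each group together. The trailing timestamp DESC decides which sid, cid pair survives, so changing it to ascending would keep the oldest row of each group instead.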

Answer 1 (score: 1)

It sounds like you are looking for a window function:

SELECT *
FROM (
  SELECT *,
         row_number() over (partition by signature, ip_src, ip_dst order by timestamp desc) as rn
  FROM event
     JOIN sensor ON sensor.sid = event.sid
     JOIN iphdr ON iphdr.cid = event.cid AND iphdr.sid = event.sid
  WHERE timestamp >= NOW() - interval '1' day
) as d_dup
where rn = 1
order by timestamp desc;
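
To try the row_number() pattern without a PostgreSQL server, here is a minimal sketch using Python's built-in sqlite3 module. It rests on several assumptions: SQLite stands in for PostgreSQL (window functions need SQLite 3.25 or newer), the three-way join is flattened into a single hypothetical table named ev, the timestamp column is renamed ts, and the rows are adapted from the sample output in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical flattened table standing in for event JOIN sensor JOIN iphdr.
conn.execute(
    "CREATE TABLE ev (sid INT, cid INT, signature INT, ts TEXT,"
    " ip_src INT, ip_dst INT)"
)
conn.executemany(
    "INSERT INTO ev VALUES (?, ?, ?, ?, ?, ?)",
    [
        (3, 13122, 29177, "2014-11-15 20:53:14.430", 3244829114, 2887777034),
        (3, 13123, 29177, "2014-11-15 20:53:14.656", 3244829114, 2887777034),
        (4, 51, 29179, "2014-11-15 18:44:02.060", 2887777893, 2887777556),
        (3, 13117, 29177, "2014-11-15 18:41:46.418", 1332731268, 2887777034),
    ],
)

query = """
SELECT signature, ip_src, ip_dst, sid, cid
FROM (
  SELECT *,
         row_number() OVER (PARTITION BY signature, ip_src, ip_dst
                            ORDER BY ts DESC) AS rn
  FROM ev
) AS d_dup
WHERE rn = 1          -- keep only the newest row of each partition
ORDER BY ts DESC
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

Each (signature, ip_src, ip_dst) combination appears once, and the sid, cid values on that row come from the newest event in the group.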

Answer 2 (score: 0)

Maybe something like this?

SELECT DISTINCT e.sid, e.cid, ip_src, ip_dst
FROM event e
INNER  JOIN sensor s ON (s.sid = e.sid)
INNER  JOIN iphdr i ON (i.cid = e.cid) AND (i.sid = e.sid)
WHERE timestamp >= NOW() - '1 day'::INTERVAL;

If you want the combination (signature, ip_src, ip_dst) to be unique in the result (one row per combination), you could try something like this:

SELECT max(e.cid), max(e.sid), signature, ip_src, ip_dst
FROM event e
INNER  JOIN sensor s ON (s.sid = e.sid)
INNER  JOIN iphdr i ON (i.cid = e.cid) AND (i.sid = e.sid)
WHERE timestamp >= NOW() - '1 day'::INTERVAL
GROUP BY signature, ip_src, ip_dst;

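But this gives you the highest cid and the highest sid for each combination. The two maxima are computed independently, so the resulting pair does not necessarily come from a single row.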
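
The following sketch shows how max(cid) and max(sid) can combine into a pair that matches no actual event. It uses Python's built-in sqlite3 module as a stand-in for PostgreSQL and a hypothetical flattened table named ev. The second row is invented for illustration: it assumes the same signature and addresses were also seen by sensor 4, which the sample output in the question does not show.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical flattened table standing in for the joined tables.
conn.execute(
    "CREATE TABLE ev (sid INT, cid INT, signature INT, ip_src INT, ip_dst INT)"
)
conn.executemany(
    "INSERT INTO ev VALUES (?, ?, ?, ?, ?)",
    [
        (3, 13123, 29177, 3244829114, 2887777034),
        # Invented row: same combination, reported by a different sensor.
        (4, 51, 29177, 3244829114, 2887777034),
    ],
)

# Each aggregate is evaluated on its own over the whole group.
agg = conn.execute(
    "SELECT max(cid), max(sid) FROM ev GROUP BY signature, ip_src, ip_dst"
).fetchone()
real_pairs = set(conn.execute("SELECT cid, sid FROM ev").fetchall())

print("aggregated (cid, sid):", agg)
print("exists as a real row:", agg in real_pairs)
```

Here the aggregate returns cid 13123 with sid 4, although cid 13123 belongs to sid 3. When the pair is needed as a join key, a query that picks one whole row per group avoids this mismatch.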