SQL: max occurrence(s) per value

Time: 2016-09-13 10:36:56

Tags: sql sqlite join limit-per-group

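I have a very simple table (LOG) with the attributes MAC_ADDR, IP_SRC, IP_DST, URL, PROTOCOL. For each IP_SRC in the table, I want the first n rows of (IP_SRC, URL, #OfOccurrences) where PROTOCOL = 'DNS', ordered by decreasing #OfOccurrences.

To put it more clearly, I want to list the n most visited pages for each IP_SRC in my table.

I can get the most visited URL for each IP_SRC: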

select ip_src,url,cnt
from (
    select ip_src,url,count(*) as cnt,protocol
    from log as b group by ip_src,url order by ip_src,cnt desc
) as c
where cnt>=(select MAX(cpt)
            from (select count(*) as cpt from log as b
            where c.ip_src==b.ip_src group by ip_src,url)
           )
      and protocol='DNS';

However, this solution is clearly not optimized.

Here is a more practical query (the most visited URL for each IP_SRC):

select ip_src,url,cnt
from (select ip_src,url,count(*) as cnt
      from log where protocol='DNS'
      group by ip_src,url
      order by ip_src,cnt asc)
group by ip_src;

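The second option is much faster! However, I want the n most visited pages for each IP_SRC, and I can't figure out how to do that.

Thanks for your help.

3 answers:

Answer 0: (score: 1)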

Use a common table expression:

WITH Temp1 AS (
  SELECT ip_src, url, count(*) AS cnt
  FROM Log
  WHERE protocol = 'DNS'
  GROUP BY ip_src, url
)
SELECT ip_src, url, cnt
FROM Temp1 AS T1
WHERE url IN (
  SELECT url
  FROM Temp1 AS T2
  WHERE T2.ip_src = T1.ip_src
    AND T2.cnt >= T1.cnt
  ORDER BY cnt DESC
  LIMIT 3  -- or whatever you want it to be
)
ORDER BY ip_src ASC, cnt DESC;
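
To make it easier to check what this query returns, here is a minimal sketch that runs it against an in-memory SQLite database from Python. The sample rows, the helper names `make_log` and `top_n_per_ip`, and the bound `LIMIT ?` parameter are illustrative assumptions, not part of the original answer.

```python
import sqlite3

# Hypothetical sample data (ip_src, url, protocol); not from the original post.
ROWS = (
    [("10.0.0.1", "a.com", "DNS")] * 3
    + [("10.0.0.1", "b.com", "DNS")] * 2
    + [("10.0.0.1", "c.com", "DNS")]
    + [("10.0.0.1", "c.com", "HTTP")]  # non-DNS row, must not be counted
    + [("10.0.0.2", "y.org", "DNS")] * 2
    + [("10.0.0.2", "x.org", "DNS")]
)

# Same query as in the answer, with the per-group limit passed as a parameter.
TOP_N_SQL = """
WITH Temp1 AS (
  SELECT ip_src, url, COUNT(*) AS cnt
  FROM log
  WHERE protocol = 'DNS'
  GROUP BY ip_src, url
)
SELECT ip_src, url, cnt
FROM Temp1 AS T1
WHERE url IN (
  SELECT T2.url
  FROM Temp1 AS T2
  WHERE T2.ip_src = T1.ip_src
    AND T2.cnt >= T1.cnt
  ORDER BY T2.cnt DESC
  LIMIT ?
)
ORDER BY ip_src ASC, cnt DESC
"""


def make_log(rows):
    """Build an in-memory LOG table holding the given rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE log (mac_addr, ip_src, ip_dst, url, protocol)")
    conn.executemany(
        "INSERT INTO log (ip_src, url, protocol) VALUES (?, ?, ?)", rows
    )
    return conn


def top_n_per_ip(conn, n):
    """Return (ip_src, url, cnt) rows for the n most requested URLs per ip_src."""
    return conn.execute(TOP_N_SQL, (n,)).fetchall()


conn = make_log(ROWS)
for row in top_n_per_ip(conn, 2):
    print(row)
```

Note that the inner `ORDER BY` has no tie-breaker, so when several URLs share the same count at the cutoff, which of them is kept is arbitrary.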

Answer 1: (score: 0)

select x.ip_src, x.url, x.cnt
from (select ip_src,url,count(*) as cnt
      from log where protocol='DNS'
      group by ip_src,url
      order by ip_src, count(*) desc) AS x
group by x.ip_src;

Can you try this?
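
The query above groups by ip_src alone while selecting url and cnt as bare columns, so it depends on which row SQLite happens to pick for each group, and the inner ORDER BY does not guarantee that choice. As a hedged alternative for the top-1 case only, the sketch below uses SQLite's documented behavior that, when `MAX()` appears in an aggregate query, bare columns take their values from the row holding the maximum. The sample data and the name `TOP_URL_SQL` are assumptions made for illustration.

```python
import sqlite3

# Hypothetical sample data (ip_src, url, protocol); not from the original post.
ROWS = (
    [("10.0.0.1", "a.com", "DNS")] * 3
    + [("10.0.0.1", "b.com", "DNS")] * 2
    + [("10.0.0.1", "c.com", "HTTP")]  # non-DNS row, must not be counted
    + [("10.0.0.2", "y.org", "DNS")] * 2
    + [("10.0.0.2", "x.org", "DNS")]
)

# MAX(cnt) makes SQLite take url from the row that holds the maximum count
# (SQLite-specific behavior; other engines reject or handle this differently).
TOP_URL_SQL = """
SELECT ip_src, url, MAX(cnt) AS cnt
FROM (SELECT ip_src, url, COUNT(*) AS cnt
      FROM log
      WHERE protocol = 'DNS'
      GROUP BY ip_src, url)
GROUP BY ip_src
ORDER BY ip_src
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE log (mac_addr, ip_src, ip_dst, url, protocol)")
conn.executemany(
    "INSERT INTO log (ip_src, url, protocol) VALUES (?, ?, ?)", ROWS
)

top_urls = conn.execute(TOP_URL_SQL).fetchall()
for row in top_urls:
    print(row)
```

This still returns only one URL per ip_src, so it does not solve the top-n part of the question.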

Answer 2: (score: 0)

In the end, I managed to get what I wanted by using a temporary table.

--First create a temp table of occurrences
CREATE TEMPORARY TABLE TEMP1 AS
SELECT ip_src,url,count(*) AS cnt
FROM LOG
WHERE protocol='DNS'
GROUP BY ip_src,url
ORDER BY ip_src,cnt,url DESC;
--Then use a classic limit per group query
SELECT T1.ip_src,T1.url,T1.cnt
FROM TEMP1 AS T1
WHERE T1.url in (
      SELECT T2.url
      FROM TEMP1 AS T2
      WHERE T2.ip_src=T1.ip_src and T2.cnt>=T1.cnt
      ORDER BY T2.cnt DESC
      LIMIT 3 --Or whatever you want it to be
)
ORDER BY T1.ip_src ASC,T1.cnt DESC;

If anyone knows how to do this without a temporary table (or can explain why a temporary table is a good solution), please speak up.
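
One possible way to avoid the temporary table, assuming a SQLite build of version 3.25 or later (the first release with window functions, which postdates this question), is to rank the per-URL counts with `ROW_NUMBER()` inside a CTE. The sketch below is an illustration under that assumption; the sample data, the CTE names `counts` and `ranked`, and the helper `top_n_per_ip` are hypothetical.

```python
import sqlite3

# Hypothetical sample data (ip_src, url, protocol); not from the original post.
ROWS = (
    [("10.0.0.1", "a.com", "DNS")] * 3
    + [("10.0.0.1", "b.com", "DNS")] * 2
    + [("10.0.0.1", "c.com", "DNS")]
    + [("10.0.0.2", "y.org", "DNS")] * 2
    + [("10.0.0.2", "x.org", "DNS")]
)

# Requires SQLite >= 3.25 for window functions (assumption about the runtime).
RANKED_SQL = """
WITH counts AS (
  SELECT ip_src, url, COUNT(*) AS cnt
  FROM log
  WHERE protocol = 'DNS'
  GROUP BY ip_src, url
),
ranked AS (
  SELECT ip_src, url, cnt,
         ROW_NUMBER() OVER (
           PARTITION BY ip_src
           ORDER BY cnt DESC, url   -- url breaks ties deterministically
         ) AS rn
  FROM counts
)
SELECT ip_src, url, cnt
FROM ranked
WHERE rn <= ?
ORDER BY ip_src, cnt DESC, url
"""


def top_n_per_ip(conn, n):
    """Return the n most requested DNS URLs per ip_src as (ip_src, url, cnt)."""
    return conn.execute(RANKED_SQL, (n,)).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE log (mac_addr, ip_src, ip_dst, url, protocol)")
conn.executemany(
    "INSERT INTO log (ip_src, url, protocol) VALUES (?, ?, ?)", ROWS
)

for row in top_n_per_ip(conn, 2):
    print(row)
```

Here each (ip_src, url) count is computed once and ranked within its ip_src partition, so the outer query only has to filter on the rank.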