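SQL UNION an "All Others" row

Asked: 2015-03-27 17:29:48

Tags: sql sqlite union

I have a SQLite database with nearly 500,000 rows of access-log information. I am using it to get aggregate information such as "how many times each IP hit the site" or "what percentage of hits were POSTs".

I wrote a SQL query that collects the number of times each IP address hit the site, keeping only addresses whose hit count is greater than 1% of the total number of rows.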

select ip_address, count(ip_address)
from records
group by ip_address
having count(ip_address) > (select count(ip_address) from records) * .01

This returns about 7 significant IP addresses. How can I union an "All Others" row into the result set?

I tried UNIONing the logical opposite:

select "All Others", count(ip_address)
from records
group by ip_address
having count(ip_address) < (select count(ip_address) from records) * .01

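But this returns multiple "All Others" rows, and the counts are sequential.

3 Answers:

Answer 0: (score: 1)

Definitely use union all .. but that doesn't answer the "question".

The problem is that the second query "returns multiple" rows (just as the first query does) because the group by is on the IP, of which there are many. That is, there is one result tuple per group, regardless of what the select output clause does.

The desired result can be obtained by summing the per-group counts in an outer select.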

-- append this to the first query with: union all
select 'All Others', sum(t.ct)
from (
   select count(ip_address) as ct
   from records
   group by ip_address
   -- note: <=, and not <, is inverse of >
   having count(ip_address) <= (select count(ip_address) from records) * .01
   ) t
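To see the two halves working together, here is a minimal sketch using Python's built-in sqlite3 module. The in-memory table and the sample IP addresses are made up for illustration (200 rows, so the 1% threshold is 2 hits); only the SQL comes from the queries above.

```python
import sqlite3

# Hypothetical sample data: two busy IPs plus ten IPs with one hit each.
conn = sqlite3.connect(":memory:")
conn.execute("create table records (ip_address text)")
rows = [("10.0.0.1",)] * 100 + [("10.0.0.2",)] * 90
rows += [(f"192.168.0.{i}",) for i in range(10)]
conn.executemany("insert into records values (?)", rows)

query = """
select ip_address, count(ip_address) as hits
from records
group by ip_address
having count(ip_address) > (select count(ip_address) from records) * .01
union all
select 'All Others', sum(t.ct)
from (
   select count(ip_address) as ct
   from records
   group by ip_address
   having count(ip_address) <= (select count(ip_address) from records) * .01
   ) t
"""
# Collect (label, hits) pairs into a dict so row order does not matter.
result = dict(conn.execute(query).fetchall())
print(result)
```

Because the outer select has no group by, the whole inner result collapses into a single "All Others" row.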

Of course, if the 'total' and the 'found' counts are already known, then 'others' is simply 'total' - 'found'.
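A rough sketch of that subtraction done on the client side, again with Python's sqlite3 module and made-up sample data (the table contents and variable names are assumptions for illustration only):

```python
import sqlite3

# Hypothetical sample data: 200 rows, so 1% of the total is 2 hits.
conn = sqlite3.connect(":memory:")
conn.execute("create table records (ip_address text)")
rows = [("10.0.0.1",)] * 100 + [("10.0.0.2",)] * 90
rows += [(f"192.168.0.{i}",) for i in range(10)]
conn.executemany("insert into records values (?)", rows)

# 'found': the significant IPs, exactly as in the question's first query.
found = conn.execute("""
    select ip_address, count(ip_address)
    from records
    group by ip_address
    having count(ip_address) > (select count(ip_address) from records) * .01
""").fetchall()

# 'total': every counted row in the table.
total = conn.execute("select count(ip_address) from records").fetchone()[0]

# 'others' = 'total' - 'found'; no inverse condition is needed.
others = total - sum(hits for _, hits in found)
report = found + [("All Others", others)]
print(report)
```

This avoids writing the inverse having condition at all, at the cost of a second query.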

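That the counts are sequential, while an interesting observation, is irrelevant. Remember that SQL can return rows in any order when no order by is applied to the materialized result set (an order by inside a sub-select is not strictly guaranteed to carry through).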
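If a particular order is wanted, it has to be requested on the outermost query. One possible way, sketched here with Python's sqlite3 module and made-up data, is to wrap the union in a sub-select and sort so that "All Others" comes last (the label and hits aliases are my own naming, not from the question):

```python
import sqlite3

# Hypothetical sample data: 200 rows, so 1% of the total is 2 hits.
conn = sqlite3.connect(":memory:")
conn.execute("create table records (ip_address text)")
rows = [("10.0.0.1",)] * 90 + [("10.0.0.2",)] * 100
rows += [(f"192.168.0.{i}",) for i in range(10)]
conn.executemany("insert into records values (?)", rows)

query = """
select label, hits
from (
    select ip_address as label, count(ip_address) as hits
    from records
    group by ip_address
    having count(ip_address) > (select count(ip_address) from records) * .01
    union all
    select 'All Others', sum(t.ct)
    from (
        select count(ip_address) as ct
        from records
        group by ip_address
        having count(ip_address) <= (select count(ip_address) from records) * .01
    ) t
) u
-- the comparison yields 0 or 1, so the 'All Others' row sorts after the rest
order by label = 'All Others', hits desc
"""
ordered = conn.execute(query).fetchall()
print(ordered)
```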

Answer 1: (score: 1)

Could you use a variable to hold this information?

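-- T-SQL (SQL Server) syntax; SQLite itself has no DECLARE or variables
DECLARE @num INT
-- total hits belonging to the significant IPs (must be a single scalar value)
SET @num = (select sum(t.ct)
            from (select count(*) as ct
                  from records
                  group by ip_address
                  having count(*) > (select count(ip_address) from records) * .01) t)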

Then run the regular query:

select ip_address, count(ip_address)
from records
group by ip_address
having count(ip_address) > (select count(ip_address) from records) * .01
UNION ALL
select 'All Others', count(ip_address) - @num
from records

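Answer 2: (score: 0)

Without a CTE, this is probably the best you can do (I'm not sure what sqlite allows). Using not in saves you from having to write the inverse of your condition, which in other cases can be more complicated because of nulls or floating-point math considerations: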

select ip_address, count(ip_address)
from records
group by ip_address
having count(ip_address) > (select count(ip_address) from records) * .01
union all
select 'All others', count(*)
from records
where ip_address not in (
    select ip_address /* assuming non-null ip_address */
    from records
    group by ip_address
    having count(ip_address) > (select count(ip_address) from records) * .01
)

Otherwise:

with topPercent as (
    select ip_address, count(ip_address) as addr_cnt
    from records
    group by ip_address
    having count(ip_address) > (select count(ip_address) from records) * .01
)
select ip_address, addr_cnt from topPercent
union all
select 'All others', count(ip_address) - (select coalesce(sum(addr_cnt), 0) from topPercent)
from records
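For what it's worth, SQLite has accepted WITH clauses since version 3.8.3, so a CTE version can be tried directly. Below is a minimal sketch with Python's sqlite3 module and made-up data. It assumes the "All others" row should report the remaining hit count (total rows minus the hits already attributed to topPercent), and the coalesce is my addition to cover the case where no IP passes the threshold.

```python
import sqlite3

# Hypothetical sample data: 200 rows, so 1% of the total is 2 hits.
conn = sqlite3.connect(":memory:")
conn.execute("create table records (ip_address text)")
rows = [("10.0.0.1",)] * 100 + [("10.0.0.2",)] * 90
rows += [(f"192.168.0.{i}",) for i in range(10)]
conn.executemany("insert into records values (?)", rows)

query = """
with topPercent as (
    select ip_address, count(ip_address) as addr_cnt
    from records
    group by ip_address
    having count(ip_address) > (select count(ip_address) from records) * .01
)
select ip_address, addr_cnt from topPercent
union all
-- remaining hits = all rows minus the hits of the significant IPs
select 'All others',
       count(ip_address) - (select coalesce(sum(addr_cnt), 0) from topPercent)
from records
"""
cte_result = dict(conn.execute(query).fetchall())
print(cte_result)
```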

If analytic (window) functions are available, a third option may be the fastest:

select case when pct > 0.01 then ip_address else 'All others' end, sum(addr_cnt)
from (
    select ip_address, addr_cnt, addr_cnt * 1.0 / sum(addr_cnt) over () as pct
    from (
        select ip_address, count(ip_address) as addr_cnt
        from records
        group by ip_address
    ) T1
) T2
group by case when pct > 0.01 then ip_address else 'All others' end
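SQLite gained window functions in version 3.25.0, so whether this option works depends on the library version in use. Here is a hedged sketch with Python's sqlite3 module that checks the version first; the sample data and the label and hits aliases are assumptions for illustration.

```python
import sqlite3

# Hypothetical sample data: 200 rows, so each single-hit IP is 0.5% of the total.
conn = sqlite3.connect(":memory:")
conn.execute("create table records (ip_address text)")
rows = [("10.0.0.1",)] * 100 + [("10.0.0.2",)] * 90
rows += [(f"192.168.0.{i}",) for i in range(10)]
conn.executemany("insert into records values (?)", rows)

query = """
select case when pct > 0.01 then ip_address else 'All others' end as label,
       sum(addr_cnt) as hits
from (
    select ip_address, addr_cnt,
           addr_cnt * 1.0 / sum(addr_cnt) over () as pct
    from (
        select ip_address, count(ip_address) as addr_cnt
        from records
        group by ip_address
    ) T1
) T2
group by case when pct > 0.01 then ip_address else 'All others' end
"""

# Window functions need SQLite 3.25.0 or newer; skip the query on older builds.
if sqlite3.sqlite_version_info >= (3, 25, 0):
    window_result = dict(conn.execute(query).fetchall())
else:
    window_result = None
print(window_result)
```

The table is scanned once for the grouping, and the per-group share is computed without a second count over records, which is presumably where any speed advantage would come from.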