Calculating the mode by group with SQLite

Time: 2014-03-11 19:27:50

Tags: sqlite

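I have a table with an ID (IP address) and a factor variable (web browser), and I need to create another table with one record per ID, together with the mode of the factor variable. I was thinking of something like SELECT ip, MODE(browser) FROM log GROUP BY ip.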

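Unfortunately, SQLite does not implement a MODE function, so this does not work. I thought of building a temporary table with the count of each browser and then using either a SELECT DISTINCT ON or a RANK() statement, but SQLite does not support these either.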

Also, it would be nice to do this in a single statement, since there are several other factors whose mode I also need (grouped by the same ID).

2 answers:

Answer 0: (score: 1)

To compute the mode, group by the browser column, get COUNT(*) for each group, sort by that value, and take the record with the largest count.
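
As a rough illustration of those steps, here is a minimal sketch that runs the grouping query through Python's built-in sqlite3 module. The table and column names (log, ip, browser) come from the question; the sample rows and the tie-breaking sort key are assumptions made up for this example.

```python
import sqlite3

# In-memory database with a few made-up rows (hypothetical sample data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE log (ip TEXT, browser TEXT)")
conn.executemany(
    "INSERT INTO log VALUES (?, ?)",
    [
        ("10.0.0.1", "Firefox"),
        ("10.0.0.1", "Chrome"),
        ("10.0.0.2", "Chrome"),
        ("10.0.0.2", "Chrome"),
        ("10.0.0.3", "Safari"),
    ],
)

# Group by browser, count each group, sort by the count, keep the top row.
overall_mode = conn.execute(
    """
    SELECT browser, COUNT(*) AS n
    FROM log
    GROUP BY browser
    ORDER BY n DESC, browser   -- second key only makes ties deterministic
    LIMIT 1
    """
).fetchone()

print(overall_mode)  # ('Chrome', 3)
```

Without the extra sort key, which of several equally frequent browsers is returned is left up to SQLite.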

If you already have another GROUP BY, use a correlated subquery:

SELECT ip,
       (SELECT browser
        FROM log AS log2
        WHERE ip = ips.ip
        GROUP BY browser
        ORDER BY COUNT(*) DESC
        LIMIT 1)
FROM (SELECT DISTINCT ip
      FROM log) AS ips
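
The question also asks for the modes of several factors in one statement. One way this might be done is to repeat the correlated subquery once per column. The sketch below assumes a second factor column named os, which is hypothetical and not part of the original table, along with made-up sample rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "os" is a hypothetical second factor column added for illustration.
conn.execute("CREATE TABLE log (ip TEXT, browser TEXT, os TEXT)")
conn.executemany(
    "INSERT INTO log VALUES (?, ?, ?)",
    [
        ("10.0.0.1", "Firefox", "Linux"),
        ("10.0.0.1", "Firefox", "Windows"),
        ("10.0.0.1", "Chrome", "Windows"),
        ("10.0.0.2", "Chrome", "macOS"),
        ("10.0.0.2", "Chrome", "macOS"),
        ("10.0.0.2", "Safari", "macOS"),
    ],
)

# One correlated subquery per factor; each one computes the mode
# among the rows that share the outer query's ip.
rows = conn.execute(
    """
    SELECT ip,
           (SELECT browser
            FROM log AS log2
            WHERE log2.ip = ips.ip
            GROUP BY browser
            ORDER BY COUNT(*) DESC, browser
            LIMIT 1) AS mode_browser,
           (SELECT os
            FROM log AS log2
            WHERE log2.ip = ips.ip
            GROUP BY os
            ORDER BY COUNT(*) DESC, os
            LIMIT 1) AS mode_os
    FROM (SELECT DISTINCT ip
          FROM log) AS ips
    ORDER BY ip
    """
).fetchall()

for row in rows:
    print(row)
# ('10.0.0.1', 'Firefox', 'Windows')
# ('10.0.0.2', 'Chrome', 'macOS')
```

Each subquery scans the rows for one ip again, so on a large table an index on ip would probably help.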

Answer 1: (score: 0)

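There is a log table with a timestamp, a label, and a latency. We want to see the MODE (módusz) of the latency (send time: ST) for each label, grouped by timestamp. The MODE of a set of data values is the value that occurs most often.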

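-- XX: one row per log entry, with the timestamp truncated to a time bucket T.
-- YY: C = number of times each latency value ST occurs per (L, T).
-- The where clause keeps only the values whose count equals the maximum
-- count for their (L, T); avg() merges tied modes into a single value.
select L, T, avg( ST ) as MODEST, C
from (
    select L, T, ST, count( ST ) as C
    from (
            select label as L,
                         -- In SQLite, substr with a start of 0 returns one character
                         -- fewer than the requested length, so this keeps the first
                         -- 7 characters and pads the result with zeros to 13.
                         substr( substr( timeStamp, 0, 8) || '00000000', 0, 14 ) as T,
                         latency as ST
            from LOG
            order by L, T, ST
            ) as XX
    group by L, T, ST
) as YY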
where L || '#' || T || '#' || C in (  select L || '#' || T || '#' || max(C)
                                      from(
                                        select L, T, count( ST ) as C
                                        from (
                                          select label as L, 
                                          substr( substr( timeStamp, 0, 8) || '00000000', 0, 14 ) as T, 
                                          latency as ST
                                          from LOG 
                                        ) as XX 
                                        group by L, T, ST ) as YY
                                      group by L, T )
group by L, T, C
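
For comparison, the same per-label, per-bucket mode can probably be expressed more compactly by combining a common table expression with the correlated-subquery technique from the other answer. This is only a sketch under stated assumptions: timeStamp is taken to be a 13-character epoch-milliseconds string, the sample rows are made up, and ties are resolved by picking the smallest latency instead of averaging the tied values as the query above does. The names bucketed, g, and b2 are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE LOG (timeStamp TEXT, label TEXT, latency INTEGER)")
conn.executemany(
    "INSERT INTO LOG VALUES (?, ?, ?)",
    [
        ("1394562470000", "login", 120),
        ("1394562471000", "login", 120),
        ("1394562472000", "login", 95),
        ("1394563470000", "login", 80),    # falls into the next time bucket
        ("1394562470500", "search", 300),
        ("1394562473000", "search", 300),
        ("1394562474000", "search", 310),
    ],
)

modes = conn.execute(
    """
    WITH bucketed AS (
        SELECT label AS L,
               -- first 7 characters padded to 13: for 13-character timestamps
               -- this matches the substr(..., 0, 8) expression used above
               substr(timeStamp, 1, 7) || '000000' AS T,
               latency AS ST
        FROM LOG
    )
    SELECT L, T,
           (SELECT ST
            FROM bucketed AS b2
            WHERE b2.L = g.L AND b2.T = g.T
            GROUP BY ST
            ORDER BY COUNT(*) DESC, ST   -- ties: smallest latency wins
            LIMIT 1) AS MODEST
    FROM (SELECT DISTINCT L, T FROM bucketed) AS g
    ORDER BY L, T
    """
).fetchall()

for row in modes:
    print(row)
# ('login', '1394562000000', 120)
# ('login', '1394563000000', 80)
# ('search', '1394562000000', 300)
```

Common table expressions require SQLite 3.8.3 or later; on older versions the bucketed subquery would have to be repeated inline.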