SQL query to ignore similar rows from the same table

Time: 2017-10-04 10:36:50

Tags: sql sql-server

I have a table of log entries and timestamps, for example:

timestmp    log_error
1507031197631   Er7
1507031197621   Er8
1507031197409   Er9
1506888444602   Er10
1506880074401   Er10
1506880047684   Er10
1506880030996   Er10
1506879980929   Er10
1506879977580   Er10
1506879974250   Er10
1506879970901   Er10
1506879964241   Er10
1506879954212   Er10
1506879900817   Er10

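I want to write a SQL query that ignores identical consecutive errors (Er10 in this example) that fall within a given time window (5 minutes). How can I do this? With a self inner join? The result I want looks like this: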

timestmp    log_error
1507031197631   Er7
1507031197621   Er8
1507031197409   Er9
1506888444602   Er10 /* The last one from this example, based on the difference in timestmp */
1506879900817   Er10 /* The first Er10 registry */

2 answers:

Answer 0: (score: 1)

You can use row_number to create groups of consecutive log_error values. This approach is known as the "tabibitosan method":

select log_error, min(timestmp), max(timestmp)
from (
    select t.*,
        row_number() over (order by timestmp)
        - row_number() over (partition by log_error order by timestmp) as grp
    from your_table t
    ) t
group by log_error, grp;

I admit the result format is not exactly what you asked for, but it contains the information you need.
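
To see what this query returns on the question's sample rows, here is a minimal sketch that runs it from Python against an in-memory SQLite database. SQLite is only a stand-in for SQL Server (an assumption, chosen because it needs no server; window functions require SQLite 3.25 or later). The function name consecutive_groups, the column aliases first_ts and last_ts, and the final order by are made up for illustration.

```python
import sqlite3

# Sample rows from the question: (timestmp, log_error).
ROWS = [
    (1507031197631, "Er7"), (1507031197621, "Er8"), (1507031197409, "Er9"),
    (1506888444602, "Er10"), (1506880074401, "Er10"), (1506880047684, "Er10"),
    (1506880030996, "Er10"), (1506879980929, "Er10"), (1506879977580, "Er10"),
    (1506879974250, "Er10"), (1506879970901, "Er10"), (1506879964241, "Er10"),
    (1506879954212, "Er10"), (1506879900817, "Er10"),
]

# The query from this answer; aliases and ordering added for readability.
QUERY = """
select log_error, min(timestmp) as first_ts, max(timestmp) as last_ts
from (
    select t.*,
        row_number() over (order by timestmp)
        - row_number() over (partition by log_error order by timestmp) as grp
    from your_table t
    ) t
group by log_error, grp
order by first_ts desc
"""


def consecutive_groups(rows):
    """Load rows into a throwaway SQLite table and run the grouping query."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table your_table (timestmp integer, log_error text)")
    conn.executemany("insert into your_table values (?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in consecutive_groups(ROWS):
        print(row)
```

On the sample data this yields a single Er10 group spanning 1506879900817 to 1506888444602, because the query groups consecutive log_error values only and does not look at the gap between timestamps.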

Answer 1: (score: 0)

You can do this using lag(), a cumulative sum, and group by:

select log_error, min(timestmp), max(timestmp)
from (select l.*,
             -- start a new group when the error changes or the gap reaches 5 minutes
             sum(case when prev_le = log_error and
                           prev_timestmp > timestmp - 5 * 60 * 1000  -- assumes timestmp is in milliseconds
                      then 0 else 1
                 end) over (order by timestmp) as grp
      from (select l.*,
                   lag(log_error) over (order by timestmp) as prev_le,
                   lag(timestmp) over (order by timestmp) as prev_timestmp
            from logs l
           ) l
     ) l
group by grp, log_error;

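Note: the 5-minute threshold stands for the intended logic and must be written in the units of timestmp. Presumably that is 5 * 60 (seconds) or 5 * 60 * 1000 (milliseconds).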
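
As a rough check of this approach, the sketch below runs the query on the question's sample rows from Python, using an in-memory SQLite database as a stand-in for SQL Server (an assumption; window functions require SQLite 3.25 or later). It assumes timestmp holds epoch milliseconds, which the 13-digit sample values suggest. The function name collapse_bursts, the :window_ms parameter, the aliases first_ts and last_ts, and the final order by are hypothetical additions.

```python
import sqlite3

# Sample rows from the question: (timestmp, log_error).
ROWS = [
    (1507031197631, "Er7"), (1507031197621, "Er8"), (1507031197409, "Er9"),
    (1506888444602, "Er10"), (1506880074401, "Er10"), (1506880047684, "Er10"),
    (1506880030996, "Er10"), (1506879980929, "Er10"), (1506879977580, "Er10"),
    (1506879974250, "Er10"), (1506879970901, "Er10"), (1506879964241, "Er10"),
    (1506879954212, "Er10"), (1506879900817, "Er10"),
]

# lag() fetches the previous row; the running sum increments whenever the
# error changes or the previous row is at least :window_ms older.
QUERY = """
select log_error, min(timestmp) as first_ts, max(timestmp) as last_ts
from (select l.*,
             sum(case when prev_le = log_error and
                           prev_timestmp > timestmp - :window_ms
                      then 0 else 1
                 end) over (order by timestmp) as grp
      from (select l.*,
                   lag(log_error) over (order by timestmp) as prev_le,
                   lag(timestmp) over (order by timestmp) as prev_timestmp
            from logs l
           ) l
     ) l
group by grp, log_error
order by first_ts desc
"""


def collapse_bursts(rows, window_ms=5 * 60 * 1000):
    """Return one (log_error, first_ts, last_ts) row per burst of repeats."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table logs (timestmp integer, log_error text)")
    conn.executemany("insert into logs values (?, ?)", rows)
    result = conn.execute(QUERY, {"window_ms": window_ms}).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in collapse_bursts(ROWS):
        print(row)
```

With a 5-minute window, the first_ts column of the result matches the five timestamps listed in the question's desired output: the isolated Er10 at 1506888444602 forms its own group, and the earlier burst collapses to its first entry, 1506879900817.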