Question

I have a table that is defined like this:

Col1(timestamp)      Col2     Col3    Col4    Col5    Col6  

12/5/2016 4:00:59pm  yes      test    test    test    test  
12/5/2016 4:00:59pm  yes      test1   test1   test1   test1 
12/5/2016 4:00:29pm  no       test    test    test    test  
12/5/2016 4:00:29pm  no       test1   test1   test1   test1 
12/5/2016 3:59:59pm  yes      test    test    test    test  
12/5/2016 3:59:59pm  yes      test1   test1   test1   test1  
12/5/2016 3:59:29pm  yes      test    test    test    test  
12/5/2016 3:59:29pm  yes      test1   test1   test1   test1  
12/5/2016 3:58:59pm  yes      test    test    test    test  
12/5/2016 3:58:59pm  yes      test1   test1   test1   test1  
12/5/2016 3:58:29pm  yes      test    test    test    test  
12/5/2016 3:58:29pm  yes      test1   test1   test1   test1  
12/5/2016 3:57:59pm  yes      test    test    test    test
12/5/2016 3:57:59pm  yes      test1   test1   test1   test1

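As you can see, every 30 seconds a new set of records is added to the table, carrying the timestamp of when the query was run and a variable Col2. For simplicity there are only two sets here (test, test1), but there could be more. Col2 can actually hold more than just yes/no, but for simplicity let's say it is either yes or no.

My question is: how do I write a query that returns only the set of records with the latest timestamp? One huge constraint on all of this (and the reason I ended up on Stack Overflow) is that the query has to pull all records within the most recent 90-second window, for reasons I won't go into.

What I have right now is: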

SELECT Col2, Col3, Col4, Col5, Col6, MAX(Col1)  
FROM table  
WHERE Col1 > (CURRENT TIMESTAMP - CURRENT TIMEZONE - 90 SECONDS)  
GROUP BY Col2, Col3, Col4, Col5, Col6

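If every record in this example table had 'yes' for Col2, my query above would give me what I want and I would be done. However, the value can be either yes or no.

That leads to the problem I am currently stuck on. Assuming the query runs at 4:01:00pm, it returns the 2 'yes' records from 4:00:59pm and the 2 'no' records from 4:00:29pm. I want it to return only the 2 'yes' records, i.e. the ones with the latest timestamp.

Because I call this query from a Java application, I currently have a function that takes the result set of the query above as a parameter, loops through every record returned, and removes all the duplicates (in this case the 2 'no' records). Rather than adding that logic, I would prefer to write the query so that it never returns the duplicate records in the first place, if there is a way to do that.

UPDATE: After trying to implement Matt's original solution, I ran into another problem. Matt's solution works if the timestamp is the same for every set (test, test1, etc.). Unfortunately, the timestamps are not the same in our table, which means my table definition above is incorrect. See the updated table definition below: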

Col1(timestamp)      Col2     Col3    Col4  

12/5/2016 4:00:59pm  yes      test    test
12/5/2016 4:00:58pm  yes      test1   test1
12/5/2016 4:00:29pm  no       test    test  
12/5/2016 4:00:28pm  no       test1   test1
12/5/2016 3:59:59pm  yes      test    test 
12/5/2016 3:59:58pm  yes      test1   test1  
12/5/2016 3:59:29pm  yes      test    test
12/5/2016 3:59:28pm  yes      test1   test1 
12/5/2016 3:58:59pm  yes      test    test 
12/5/2016 3:58:58pm  yes      test1   test1 
12/5/2016 3:58:29pm  yes      test    test  
12/5/2016 3:58:28pm  yes      test1   test1  
12/5/2016 3:57:59pm  yes      test    test
12/5/2016 3:57:58pm  yes      test1   test1

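Col5/Col6 are not important, so I removed them for simplicity. In this case, t.Ranking = 1 returns the 4:00:59pm record, t.Ranking = 2 returns the 4:00:58pm record, and so on. Basically, each set (test, test1, test2, testn..) has its own unique timestamp. This means that for the query to work, I need t.Ranking to be less than or equal to the count of how many unique sets there are. See the pseudo-query below: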

SELECT *
FROM
    (
       SELECT
          Col2, Col3, Col4
          ,DENSE_RANK() OVER (ORDER BY Col1 DESC) as Ranking
       FROM
          Table
       WHERE
          Col1 > (CURRENT TIMESTAMP - CURRENT TIMEZONE - 90 SECONDS)
    ) t
WHERE
    t.Ranking <= [count of how many unique sets there are, in this case it would return 2(test,test1)]

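Figuring out how to get that unique count is where I am stuck.

UPDATE2: Updated the table definition again to show only the minimum columns needed, for simplicity: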

Col1(timestamp)      Col2    Col3

12/5/2016 4:00:59pm  test1   test1
12/5/2016 4:00:58pm  test2   test2
12/5/2016 4:00:57pm  test3   test3
12/5/2016 4:00:56pm  test4   test4
12/5/2016 4:00:29pm  test1   test1
12/5/2016 4:00:28pm  test2   test2
12/5/2016 4:00:27pm  test3   test3
12/5/2016 4:00:26pm  test4   test4
12/5/2016 3:59:59pm  test1   test1
12/5/2016 3:59:58pm  test2   test2
12/5/2016 3:59:57pm  test3   test3
12/5/2016 3:59:56pm  test4   test4
12/5/2016 3:59:29pm  test1   test1
12/5/2016 3:59:28pm  test2   test2
12/5/2016 3:59:27pm  test3   test3
12/5/2016 3:59:26pm  test4   test4
12/5/2016 3:58:59pm  test1   test1
12/5/2016 3:58:58pm  test2   test2
12/5/2016 3:58:57pm  test3   test3
12/5/2016 3:58:56pm  test4   test4
12/5/2016 3:58:29pm  test1   test1
12/5/2016 3:58:28pm  test2   test2
12/5/2016 3:58:27pm  test3   test3
12/5/2016 3:58:26pm  test4   test4
12/5/2016 3:57:59pm  test1   test1
12/5/2016 3:57:58pm  test2   test2
12/5/2016 3:57:57pm  test3   test3
12/5/2016 3:57:56pm  test4   test4

If the above is the table, I need the query to return the following:

Col1(timestamp)      Col2    Col3  

12/5/2016 4:00:59pm  test1   test1
12/5/2016 4:00:58pm  test2   test2
12/5/2016 4:00:57pm  test3   test3
12/5/2016 4:00:56pm  test4   test4

In other words, the latest timestamp for each unique occurrence of Col2/Col3. Using Matt's query as a base, if I set t.Ranking = 1, it returns only the following:

Col1(timestamp)      Col2     Col3

12/5/2016 4:00:59pm  test1   test1

t.Ranking = 2 would return:

Col1(timestamp)      Col2     Col3

12/5/2016 4:00:58pm  test2   test2

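And so on. For this query to work for me, t.Ranking has to be a dynamic value that is always less than or equal to the number of unique occurrences of Col2 and Col3. In my case I need t.Ranking <= 4 because there are 4 unique Col2/Col3 values (test1, test2, test3, test4). If there were 5 unique occurrences of Col2/Col3, t.Ranking would need to be less than or equal to 5, and so on.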

SELECT *
FROM
    (
       SELECT
          Col1, Col2, Col3
          ,DENSE_RANK() OVER (ORDER BY Col1 DESC) as Ranking
       FROM
          Table
       WHERE
          Col1 > (CURRENT TIMESTAMP - CURRENT TIMEZONE - 90 SECONDS)
    ) t
WHERE
    t.Ranking <= [count of how many unique occurrences of 
    Col2/Col3 there are, in this case it would return 
    4(test1,test2,test3,test4)]

Figuring out how to calculate that number is where I am stuck.

Answer 1

SELECT *
FROM
    (
       SELECT
          COL1, Col2, Col3, Col4, Col5, Col6
          ,DENSE_RANK() OVER (ORDER BY Col1 DESC) as Ranking
       FROM
          Table
       WHERE
          Col1 > (CURRENT TIMESTAMP - CURRENT TIMEZONE - 90 SECONDS)
    ) t
WHERE
    t.Ranking = 1

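From the sound of things, I am pretty sure you just need to create a ranking with DENSE_RANK() or RANK() so that ties are picked up, and then select where Ranking = 1 to get the latest timestamp, no matter how many records share it (1, 2, 100, etc.).

Window functions are definitely your friend for this kind of operation, and DB2 supports them. http://www.ibm.com/support/knowledgecenter/SSEPEK_11.0.0/apsg/src/tpc/db2z_rankrows.html

Edit: It is a little unclear what you want. But maybe you want the top 1 record, with ties, for each unique combination of Col1 and Col2? If so, use the same query and just add a PARTITION BY: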
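To make the difference between the three ranking functions concrete, here is a small sketch on inline data. It assumes a dialect that accepts a VALUES table constructor in the FROM clause; the alias s and the column name ts are made up for illustration, and the integers stand in for timestamps.

```sql
-- Illustration only: four sample values, two of them tied for "latest".
SELECT ts
       ,DENSE_RANK() OVER (ORDER BY ts DESC) AS dense_rnk  -- 1, 1, 2, 3
       ,RANK()       OVER (ORDER BY ts DESC) AS rnk        -- 1, 1, 3, 4
       ,ROW_NUMBER() OVER (ORDER BY ts DESC) AS row_num    -- 1, 2, 3, 4
FROM (VALUES (3), (3), (2), (1)) AS s(ts)
ORDER BY ts DESC
```

Both tied rows get 1 from DENSE_RANK() and RANK(), so filtering on the value 1 keeps every row that shares the latest timestamp. ROW_NUMBER() gives the tied rows 1 and 2 in an unspecified order, so the same filter keeps exactly one of them.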

SELECT *
FROM
    (
       SELECT
          COL1, Col2, Col3, Col4, Col5, Col6
          ,DENSE_RANK() OVER (
              PARTITION BY
                 CASE WHEN col1 < col2 THEN col1 ELSE col2 END
                 ,CASE WHEN col1 < col2 THEN col2 ELSE col1 END
              ORDER BY Col1 DESC
           ) as Ranking
       FROM
          Table
       WHERE
          Col1 > (CURRENT TIMESTAMP - CURRENT TIMEZONE - 90 SECONDS)
    ) t
WHERE
    t.Ranking = 1

Note that the CASE expressions in the partition mean that these two rows:

Col1      Col2
type     type1
type1    type

are treated as the same combination.

If you don't want ties, just change DENSE_RANK() to ROW_NUMBER().
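For the table shown in UPDATE2 of the question, where Col1 is the timestamp and Col2/Col3 identify a set, a minimal sketch of the per-set variant might look like the following. It assumes the partition should be on Col2 and Col3 (not on a Col1/Col2 pair) and uses a hypothetical table name, my_table. The 90-second filter is copied from the question.

```sql
-- Sketch only: my_table is a hypothetical name; Col1 is assumed to be the
-- timestamp and (Col2, Col3) the key that identifies a set.
SELECT Col1, Col2, Col3
FROM
    (
       SELECT
          Col1, Col2, Col3
          -- Number the rows of each (Col2, Col3) set from newest to oldest
          ,ROW_NUMBER() OVER (PARTITION BY Col2, Col3 ORDER BY Col1 DESC) AS Ranking
       FROM
          my_table
       WHERE
          Col1 > (CURRENT TIMESTAMP - CURRENT TIMEZONE - 90 SECONDS)
    ) t
WHERE
    t.Ranking = 1   -- keep only the newest row of each set
```

Because the ranking restarts for every set, the filter stays at Ranking = 1 and no count of unique sets is needed. Run against the UPDATE2 sample data at about 4:01:00pm, this should return one row each for test1 through test4. If only these three columns are required, grouping by Col2, Col3 and selecting MAX(Col1) is an equivalent alternative.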
