Question

I have the following table in a Postgres database, with 3 columns:

ReaderId: String
TagId: String
Timestamp: Timestamp

ReaderId  TagId  Timestamp
A         T1      20190101-00:00:00  *  ~      
A         T1      20190101-00:00:00     ~
A         T1      20190101-00:00:01    
A         T1      20190101-00:00:02   
B         T1      20190101-00:00:03  *
B         T1      20190101-00:00:03 
B         T1      20190101-00:00:04   
A         T1      20190101-00:00:05  * 
A         T1      20190101-00:00:06 
A         T1      20190101-00:00:07   
C         T1      20190101-00:00:08  *
C         T1      20190101-00:00:09   
B         T2      20190101-00:00:01  *
B         T2      20190101-00:00:04 
B         T2      20190101-00:00:05   
C         T2      20190101-00:00:06  *
C         T2      20190101-00:00:07   
B         T2      20190101-00:00:07  *   ~
B         T2      20190101-00:00:07      ~
B         T2      20190101-00:00:08

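I want a query/function that, given a TagId, returns the first row each time the tag is read on a reader other than the one that read it last (or the first row, if the tag has never been read before). Qualifying rows are marked with * above. If several identical rows tie for "first", only one of them should be returned (like the rows marked with ~ above).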

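This needs to be efficient, because the data volume is expected to grow easily into the hundreds of millions or low billions of rows. I can create any indexes that are needed.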

My SQL is rusty, and was never very good to begin with, so any help here is appreciated!

Answer 1

Just use lag():

select t.*
from (select t.*,
             lag(ReaderId) over (partition by TagId order by Timestamp) as prev_ReaderId
      from t
     ) t
where prev_ReaderId is null or prev_ReaderId <> ReaderId;

In Postgres, you can shorten the where clause to:

where prev_ReaderId is distinct from ReaderId
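
The difference matters for the first row of each partition, where lag() returns NULL. As a minimal sketch using literal values only (no table is assumed), this shows how the two comparisons treat NULL:

```sql
-- Plain inequality against NULL yields NULL, so a WHERE clause would drop the row.
-- IS DISTINCT FROM treats NULL as an ordinary comparable value.
SELECT NULL::text <> 'A'               AS plain_inequality,   -- NULL
       NULL::text IS DISTINCT FROM 'A' AS null_vs_value,      -- true
       'A'::text  IS DISTINCT FROM 'A' AS same_value;         -- false
```

This is why the longer form needs the explicit "prev_ReaderId is null" branch and the shorter form does not.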

Answer 2

Use the window function lag():

select 
    reader_id, tag_id, timestamp
from (
    select
        reader_id, tag_id, timestamp,
        lag(reader_id) over (partition by tag_id order by timestamp)
    from my_table
    ) s
where lag is distinct from reader_id
order by tag_id, timestamp

Window functions are expensive, but alternative solutions (if any exist) are unlikely to be cheaper. An index on (tag_id, timestamp) will support the query.
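
A sketch of what such an index could look like, assuming the table and column names from the query above. The index names are hypothetical, and the commented-out variant is only a possibility to evaluate, not something the answer prescribes:

```sql
-- Hypothetical index name; my_table, tag_id and timestamp come from the query above.
-- The leading tag_id column matches "partition by tag_id",
-- the second column matches "order by timestamp" within each tag.
CREATE INDEX my_table_tag_id_timestamp_idx
    ON my_table (tag_id, "timestamp");

-- Optional variant (assumption: PostgreSQL 11 or later, which added INCLUDE).
-- Carrying reader_id in the index may allow index-only scans for this query.
-- CREATE INDEX my_table_tag_id_timestamp_covering_idx
--     ON my_table (tag_id, "timestamp") INCLUDE (reader_id);
```

Whether the planner actually uses the index depends on the data and the query, so it is worth checking with EXPLAIN (ANALYZE) on realistic volumes.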

Online demo on db<>fiddle.

See also window functions in the documentation.

Answer 3

Use lag(), as the others have already suggested. But you specified:

when provided a TagId

So you can simplify, which is also a bit faster:

SELECT reader_id, tag_id, ts
FROM  (
   SELECT *, lag(reader_id) OVER (ORDER BY ts) IS DISTINCT FROM reader_id AS pick
   FROM   tbl
   WHERE  tag_id = 'T1'  --  your tag_id here
   ) sub
WHERE  pick;

db<>fiddle here

This also works with NULL values in the column reader_id.

You can wrap this in an SQL function or a prepared statement and pass only your tag_id.
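
As a rough sketch of both options: the function name, the statement name, and the parameter name are hypothetical, and the column types (text, text, timestamp) are assumed from the question. Adjust them if the real table differs, for example if ts is timestamptz.

```sql
-- Option 1: SQL function (hypothetical name first_reads_per_reader_change).
-- Column references are table-qualified so they cannot clash with the
-- output column names declared in RETURNS TABLE.
CREATE FUNCTION first_reads_per_reader_change(_tag_id text)
  RETURNS TABLE (reader_id text, tag_id text, ts timestamp)
  LANGUAGE sql STABLE AS
$func$
SELECT sub.reader_id, sub.tag_id, sub.ts
FROM  (
   SELECT t.*
        , lag(t.reader_id) OVER (ORDER BY t.ts) IS DISTINCT FROM t.reader_id AS pick
   FROM   tbl t
   WHERE  t.tag_id = _tag_id      -- only rows for the requested tag
   ) sub
WHERE  sub.pick
ORDER  BY sub.ts;
$func$;

SELECT * FROM first_reads_per_reader_change('T1');

-- Option 2: prepared statement (hypothetical name first_reads), valid for the session.
PREPARE first_reads(text) AS
SELECT reader_id, tag_id, ts
FROM  (
   SELECT *, lag(reader_id) OVER (ORDER BY ts) IS DISTINCT FROM reader_id AS pick
   FROM   tbl
   WHERE  tag_id = $1
   ) sub
WHERE  pick;

EXECUTE first_reads('T1');
```

The function persists in the database and can be called from any session, while the prepared statement exists only for the session that created it.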

Select the first row every time a column changes