Question

Suppose I have a table named "Diary" like this:

| id | user_id |        recorded_at       | record |
|----|---------|--------------------------|--------|
| 20 |  50245  |2017-10-01 23:00:14.765366|   89   |
| 21 |  50245  |2017-12-05 10:00:33.135331|   97   |
| 22 |  50245  |2017-12-31 11:50:23.965134|   80   |
| 23 |  76766  |2015-10-06 11:00:14.902452|   70   |
| 24 |  76766  |2015-10-07 22:40:59.124553|   81   |
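
For anyone who wants to reproduce this, a minimal setup sketch follows. The column types are assumptions (the question shows only the data, not the table definition), and PostgreSQL is assumed because the answers use its syntax.

```sql
-- Assumed column types; the question does not show the DDL.
CREATE TABLE diary (
    id          integer PRIMARY KEY,
    user_id     integer NOT NULL,
    recorded_at timestamp NOT NULL,
    record      integer
);

-- Sample rows copied from the table above.
INSERT INTO diary (id, user_id, recorded_at, record) VALUES
    (20, 50245, '2017-10-01 23:00:14.765366', 89),
    (21, 50245, '2017-12-05 10:00:33.135331', 97),
    (22, 50245, '2017-12-31 11:50:23.965134', 80),
    (23, 76766, '2015-10-06 11:00:14.902452', 70),
    (24, 76766, '2015-10-07 22:40:59.124553', 81);
```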

For each user, I want to retrieve the latest row and all rows within one month before it.

In other words, for user_id 50245, I want his/her data from "2017-12-01 11:50:23.965134" to "2017-12-31 11:50:23.965134"; for user_id 76766, I want his/her data from "2015-09-07 22:40:59.124553" to "2015-10-07 22:40:59.124553".
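
A note on these boundaries: they match subtracting 30 days from each user's last timestamp. In PostgreSQL (assumed here), subtracting `interval '1 month'` from a month-end timestamp clamps to the end of the previous month, so the two definitions differ slightly for user_id 50245. A small sketch of the difference:

```sql
-- Calendar-month arithmetic: Dec 31 minus one month clamps to Nov 30.
SELECT timestamp '2017-12-31 11:50:23.965134' - interval '1 month' AS one_month_back;
-- 2017-11-30 11:50:23.965134

-- Fixed 30-day arithmetic: matches the lower bound stated in the question.
SELECT timestamp '2017-12-31 11:50:23.965134' - interval '30 days' AS thirty_days_back;
-- 2017-12-01 11:50:23.965134
```

For the sample data, either lower bound excludes id 20 and keeps ids 21 to 24.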

So the desired result looks like this:

| id | user_id |        recorded_at       | record |
|----|---------|--------------------------|--------|
| 21 |  50245  |2017-12-05 10:00:33.135331|   97   |
| 22 |  50245  |2017-12-31 11:50:23.965134|   80   |
| 23 |  76766  |2015-10-06 11:00:14.902452|   70   |
| 24 |  76766  |2015-10-07 22:40:59.124553|   81   |

Note that the record with id 20 is not included, because it is more than one month before the last record of user_id 50245.

Is there a way to write an SQL query to achieve this?

Answer 1

I would be inclined to use window functions:

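
The query that belonged to this answer is not preserved here. As a minimal sketch of what a window-function approach could look like (not the answerer's original query; PostgreSQL is assumed, and the alias `last_recorded_at` is made up for illustration):

```sql
-- Sketch only: compute each user's latest timestamp with a window function,
-- then keep rows that fall within one month before it.
SELECT id, user_id, recorded_at, record
FROM  (
   SELECT d.*,
          max(recorded_at) OVER (PARTITION BY user_id) AS last_recorded_at
   FROM   diary d
   ) sub
WHERE  recorded_at >= last_recorded_at - interval '1 month'
ORDER  BY user_id, recorded_at;
```

With the sample rows from the question, this should return ids 21 through 24.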

Answer 2

The straightforward approach is to use a subquery to get the maximum recorded_at for each user_id, then join:

select d.*
  from diary d
       -- mra = most recent recorded_at per user
       join ( select user_id, max(recorded_at) as mra
                from diary
               group by user_id ) m on d.user_id = m.user_id
 where m.mra <= d.recorded_at + interval '1 month';

This has the drawback of accessing the table twice (this may differ between RDBMSs; use EXPLAIN to inspect the execution plan).
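
As a rough illustration of that check, assuming PostgreSQL, the join query can be prefixed with EXPLAIN. Note that the ANALYZE option actually executes the statement, and the plan you see will depend on your data and indexes.

```sql
-- Sketch: show the actual plan and buffer usage for the join variant.
explain (analyze, buffers)
select d.*
  from diary d
       join ( select user_id, max(recorded_at) as mra
                from diary
               group by user_id ) m on d.user_id = m.user_id
 where m.mra <= d.recorded_at + interval '1 month';
```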

A better option is to use a window function to do everything in one pass:

select id, user_id, recorded_at, record
  from ( select *, max(recorded_at) over (partition by user_id) as mra
           from diary ) x
 where mra <= recorded_at + interval '1 month';

Disclaimer: I have not tested the queries above, but you should get the idea anyway. See http://sqlfiddle.com/#!17/e90000/9 for a working example with a similar schema.

Answer 3

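Untested, but something like this should work.

I would use a subquery to get the last_record, then filter for that date and the rows from the previous month, e.g.: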

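select d.*
  from diary d
       join ( select user_id, max(recorded_at) as l
                from diary
               group by user_id ) as last_record
         on d.user_id = last_record.user_id  -- compare each row only with its own user's last record
 where d.recorded_at = last_record.l
    or (
         -- date_trunc moves the lower bound to the start of the calendar month
         d.recorded_at >= date_trunc('month', last_record.l - interval '1' month)
         and d.recorded_at < last_record.l
       );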

Answer 4

For small tables, any (valid) query technique is fine.

For big tables, details matter. Assuming:

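There is also a users table with user_id as PK, containing all relevant users (or possibly a few more). That is the typical setup.
You have (or can create) an index on diary (user_id, recorded_at DESC NULLS LAST). NULLS LAST is optional if recorded_at is defined NOT NULL. But make sure the query matches the index.
More than a few rows per user, which is the typical use case.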
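
The index mentioned in the assumptions could be created along these lines. This is a sketch; the index name is hypothetical and only the column list comes from the text above.

```sql
-- Hypothetical index name; the column order and sort direction mirror
-- the ORDER BY used when looking up each user's latest row.
CREATE INDEX diary_user_id_recorded_at_idx
    ON diary (user_id, recorded_at DESC NULLS LAST);
```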

This should be among the fastest options:

SELECT d.*
FROM   users u
CROSS  JOIN LATERAL (
   SELECT recorded_at
   FROM   diary
   WHERE  user_id = u.user_id
   ORDER  BY recorded_at DESC NULLS LAST
   LIMIT 1
   ) d1
JOIN   diary d ON d.user_id = u.user_id
              AND d.recorded_at >= d1.recorded_at - interval '1 month'
ORDER  BY d.user_id, d.recorded_at;

Produces exactly your desired result.

For only a few rows per user, max() in a subquery or DISTINCT ON () will typically be faster.
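
A minimal sketch of the DISTINCT ON () variant, assuming PostgreSQL (DISTINCT ON is a PostgreSQL extension) and using a made-up alias `last_recorded_at`:

```sql
-- Sketch: pick each user's latest row with DISTINCT ON, then join back
-- to collect everything within one month before it.
SELECT d.*
FROM  (
   SELECT DISTINCT ON (user_id)
          user_id, recorded_at AS last_recorded_at
   FROM   diary
   ORDER  BY user_id, recorded_at DESC NULLS LAST
   ) l
JOIN   diary d USING (user_id)
WHERE  d.recorded_at >= l.last_recorded_at - interval '1 month'
ORDER  BY d.user_id, d.recorded_at;
```

Unlike the LATERAL query, this version does not need the users table, at the cost of scanning diary to find the latest row per user.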
