Question

I have the following table named fruits:

id | fruit_bought | quantity | date
1 | orange       | 100      | 2018-01-10
2 | apple        | 50       | 2018-02-05
3 | orange       | 75       | 2018-03-07
4 | orange       | 200      | 2018-03-15
5 | apple        | 10       | 2018-03-17
6 | orange       | 20       | 2018-03-20

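I want to return the rows where an orange was bought (fruit_bought is orange) at any time within the preceding 10 days, starting from the date 2018-03-20 (the row with id 6).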

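For example:

Starting from the date 2018-03-20 (row id 6), an orange was bought.
Was an orange bought in the 10 days before that? Yes: on "2018-03-15" (row id 4).
Was an orange bought in the 10 days before that date? Yes: on "2018-03-07" (row id 3).
Was an orange bought in the 10 days before that date? No.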

In the end, the query I want to create would return the rows with id 3, 4 and 6 (but not 1).

My query so far is as follows:

SELECT *, LAG(date, 1) OVER (PARTITION BY fruit_bought)
FROM fruits
WHERE fruit_bought = 'orange';

This returns every row where fruit_bought is orange, and adds an extra lag column.
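One detail worth noting: the window in the query above has no ORDER BY, so the row that LAG looks back to is not guaranteed to be the previous purchase by date. Below is a minimal sketch of the lag column with an explicit ORDER BY date. It runs against SQLite through Python's sqlite3 module purely so that it is self-contained; that choice is an assumption of this sketch (the thread targets PostgreSQL), and it needs SQLite 3.25 or newer for window functions.

```python
import sqlite3

# In-memory copy of the question's table; dates are stored as ISO text
# so that string comparison matches chronological order.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE fruits (id INTEGER, fruit_bought TEXT, quantity INTEGER, date TEXT)"
)
conn.executemany(
    "INSERT INTO fruits VALUES (?, ?, ?, ?)",
    [
        (1, "orange", 100, "2018-01-10"),
        (2, "apple", 50, "2018-02-05"),
        (3, "orange", 75, "2018-03-07"),
        (4, "orange", 200, "2018-03-15"),
        (5, "apple", 10, "2018-03-17"),
        (6, "orange", 20, "2018-03-20"),
    ],
)

# Same idea as the question's query, but the window is ordered by date,
# so prev_date is the previous orange purchase.
lag_rows = conn.execute(
    """
    SELECT id, date,
           LAG(date, 1) OVER (PARTITION BY fruit_bought ORDER BY date) AS prev_date
    FROM fruits
    WHERE fruit_bought = 'orange'
    ORDER BY date
    """
).fetchall()

for row in lag_rows:
    print(row)
```

The first orange row has no predecessor, so its prev_date is NULL (None in Python). The lag column alone does not yet express the "chain of purchases at most 10 days apart" requirement.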

Answer 1

This answer builds on Gordon Linoff's idea, with a few adjustments:

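FILTER is not implemented for pure window functions such as lead() or lag() in PostgreSQL 11 (yet). So use WHERE fruit_bought='orange' as a condition on the entire inner SELECT instead.
To make sure the row with the last date is selected, use LEAD(date, 1, '-infinity'). This makes the default value of next_date the special value -infinity, so date >= next_date - interval '10 day' is TRUE for the last date.
Let's call a run of dates that lie within 10 days of each other a cluster. To select only the rows in the last cluster, compute a cumulative sum that counts how many times cond is FALSE (the FALSE values are what separate the clusters):
```
SUM(CASE WHEN cond IS TRUE THEN 0 ELSE 1 END) OVER (ORDER BY date DESC) AS cluster_num
```
and select only the rows where cluster_num equals 0. Since we ORDER BY date DESC, cluster 0 is the last cluster.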

SELECT *
FROM (
    SELECT *, SUM(CASE WHEN cond IS TRUE THEN 0 ELSE 1 END) OVER (ORDER BY date DESC) AS cluster_num
    FROM (
        SELECT *, date >= next_date - interval '10 day' AS cond
        FROM (
            SELECT id, fruit_bought, date, 
                LEAD(date, 1, '-infinity') 
                OVER (PARTITION BY fruit_bought ORDER BY date) AS next_date 
            FROM fruits 
            WHERE fruit_bought='orange'
            -- restrict date here to specify an "initial date"
            AND date <= '2018-04-01'  
        ) t1
    ) t2
) t3
WHERE cond AND cluster_num = 0
ORDER BY date ASC

yields

| id | fruit_bought |       date |  next_date | cond | cluster_num |
|----+--------------+------------+------------+------+-------------|
|  3 | orange       | 2018-03-07 | 2018-03-15 | t    |           0 |
|  4 | orange       | 2018-03-15 | 2018-03-20 | t    |           0 |
|  6 | orange       | 2018-03-20 |  -infinity | t    |           0 |

Setup:

CREATE TABLE fruits (
    fruitid INT PRIMARY KEY GENERATED BY DEFAULT AS IDENTITY,
    id INT,
    fruit_bought TEXT,
    quantity INT,
    date DATE);

INSERT INTO fruits (id, fruit_bought, quantity, date)
VALUES (1,'orange',100,'2018-01-10')
, (2,'apple',50,'2018-02-05')
, (3,'orange',75,'2018-03-07')
, (4,'orange',200,'2018-03-15')
, (5,'apple',10,'2018-03-17')
, (6,'orange',20,'2018-03-20')
, (7,'orange',20,'2018-01-09');
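To make the cond / cluster_num logic easier to follow, here is a minimal Python sketch of the same walk over the orange rows from the setup above. The function name last_cluster and its window_days parameter are made up for this illustration; it mirrors the idea of the query rather than reproducing the SQL, and it does not model the optional "initial date" restriction.

```python
from datetime import date, timedelta

# Orange purchases from the setup, as (id, date) pairs.
oranges = [
    (1, date(2018, 1, 10)),
    (3, date(2018, 3, 7)),
    (4, date(2018, 3, 15)),
    (6, date(2018, 3, 20)),
    (7, date(2018, 1, 9)),
]


def last_cluster(purchases, window_days=10):
    """Return the ids of the most recent run of purchases in which each
    purchase is at most window_days before the next one (hypothetical helper)."""
    kept = []
    next_date = None  # stands in for LEAD(...) with its '-infinity' default
    cluster_num = 0   # running count of rows where cond is False
    # Walk from the newest purchase to the oldest, like ORDER BY date DESC.
    for row_id, d in sorted(purchases, key=lambda p: p[1], reverse=True):
        cond = next_date is None or d >= next_date - timedelta(days=window_days)
        if not cond:
            cluster_num += 1
        if cond and cluster_num == 0:
            kept.append(row_id)
        next_date = d
    return kept[::-1]  # back to ascending date order


result = last_cluster(oranges)
print(result)  # [3, 4, 6]
```

Row 7 (2018-01-09) shows why cond alone is not enough: it is within 10 days of row 1, so its cond is true, but it sits in an older cluster and is dropped by the cluster_num check.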

Answer 2

One approach is lag() with filter, used like this:

select f.*
from (select f.*,
             lag(date) filter (where fruit_bought = 'orange') over (order by date) as prev_orange_date
      from fruits f
     ) f
where prev_orange_date >= date - interval '10 day';

However, exists also comes to mind:

select f.*
from fruits f
where exists (select 1
              from fruits f2
              where f2.fruit_bought = 'orange' and
                    f2.date >= f.date - interval '10 day' and
                    f2.date < f.date
             );

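Both queries assume that the dates are unique, as in your example. If you have ties, either one can be made to work, but you would have to specify how to handle the dates that have an orange purchase.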
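As a quick way to see what the exists version returns, here is a minimal sketch that runs it on the six rows from the question. It uses SQLite through Python's sqlite3 module, which is an assumption of this sketch and not part of the answer; the PostgreSQL expression f.date - interval '10 day' is rewritten as SQLite's date(f.date, '-10 days').

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE fruits (id INTEGER, fruit_bought TEXT, quantity INTEGER, date TEXT)"
)
conn.executemany(
    "INSERT INTO fruits VALUES (?, ?, ?, ?)",
    [
        (1, "orange", 100, "2018-01-10"),
        (2, "apple", 50, "2018-02-05"),
        (3, "orange", 75, "2018-03-07"),
        (4, "orange", 200, "2018-03-15"),
        (5, "apple", 10, "2018-03-17"),
        (6, "orange", 20, "2018-03-20"),
    ],
)

# SQLite translation of the exists query: keep every row that has an
# orange purchase in the 10 days strictly before its own date.
exists_ids = [
    row[0]
    for row in conn.execute(
        """
        SELECT f.id
        FROM fruits f
        WHERE EXISTS (SELECT 1
                      FROM fruits f2
                      WHERE f2.fruit_bought = 'orange'
                        AND f2.date >= date(f.date, '-10 days')
                        AND f2.date < f.date)
        ORDER BY f.id
        """
    )
]
print(exists_ids)  # [4, 5, 6]
```

Under this translation the result is ids 4, 5 and 6: the outer query does not restrict fruit_bought, so the apple row 5 qualifies, and row 3 is absent because no orange precedes it within 10 days. Getting exactly 3, 4 and 6 would need an additional fruit filter and the chaining from the start date described in the question.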

Answer 3

Can you try the following?

a<-data.frame(measuretime=c("2010-10-20 11:00:00", "2010-12-15 13:18:00", "2011-02-14 09:00:00", 
                            "2011-03-08 11:52:00", "2012-08-23 22:59:00"), value=c(1.5, 6.3, 0.1, 9.9, 7))
b<-data.frame(measuretime=c("2010-12-15 13:18:00", "2011-02-14 10:30:00", 
                            "2011-03-08 11:52:00", "2011-04-18 12:23:00"), value=c(22, 71, 12, 69))

Thanks

Using LAG() and PARTITION BY to return rows IF within 10 days of a date
