I have the following table named fruits:
id | fruit_bought | quantity | date
1 | orange | 100 | 2018-01-10
2 | apple | 50 | 2018-02-05
3 | orange | 75 | 2018-03-07
4 | orange | 200 | 2018-03-15
5 | apple | 10 | 2018-03-17
6 | orange | 20 | 2018-03-20
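I want to return the rows where oranges were bought at any time within the 10 days before another row whose fruit_bought is orange, working backwards from the date 2018-03-20 (the row with id 6).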
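For example:
Starting from the date 2018-03-20 (id 6), oranges were bought on this date.
Within the 10 days before that, oranges were bought on 2018-03-15 (id 4).
Within the 10 days before that, oranges were bought on 2018-03-07 (id 3).
Ultimately, the query I want to create would return the rows with id 3, 4 and 6 (but not 1).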
My query so far is as follows:
SELECT *, LAG(date, 1) OVER (PARTITION BY fruit_bought) FROM fruits
WHERE fruit_bought = 'orange';
This returns every row where fruit_bought is orange, and adds an extra lag column.
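As a side note on the query above: a window without ORDER BY leaves the row order that LAG sees unspecified. A minimal sketch, assuming PostgreSQL and with the sample rows inlined as a CTE so it runs on its own (the column names prev_date and days_since_prev are made up for illustration), shows the gap between consecutive orange purchases:

```sql
-- Sketch only: sample rows from the question inlined as a CTE (PostgreSQL).
WITH fruits (id, fruit_bought, quantity, date) AS (
    VALUES (1, 'orange', 100, DATE '2018-01-10'),
           (2, 'apple',   50, DATE '2018-02-05'),
           (3, 'orange',  75, DATE '2018-03-07'),
           (4, 'orange', 200, DATE '2018-03-15'),
           (5, 'apple',   10, DATE '2018-03-17'),
           (6, 'orange',  20, DATE '2018-03-20')
)
SELECT id,
       date,
       -- ORDER BY inside OVER makes "previous row" mean "previous purchase date"
       LAG(date) OVER (ORDER BY date) AS prev_date,
       -- date minus date yields a whole number of days in PostgreSQL
       date - LAG(date) OVER (ORDER BY date) AS days_since_prev
FROM fruits
WHERE fruit_bought = 'orange'
ORDER BY date;
```

On this data the gaps for id 4 and id 6 are 8 and 5 days, while id 3 comes 56 days after id 1, which is why id 1 should be left out.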
Answer 0 (score: 1)
This answer is based on Gordon Linoff's idea, but with a few adjustments:
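FILTER is not implemented for pure window functions such as lead() or lag() in PostgreSQL 11 (yet). So instead use WHERE fruit_bought='orange' as a condition on the whole inner SELECT.
To make sure the row with the last date is selected, use LEAD(date, 1, '-infinity'). This makes the default value of next_date equal to the -infinity timestamp, so date >= next_date - interval '10 day' will be TRUE for the last date.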
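Let's call a group of rows that are within 10 days of each other a cluster. To select only the rows in the last cluster, compute a cumulative sum that counts the number of times cond is FALSE (since the FALSE values separate the clusters):
SUM(CASE WHEN cond IS TRUE THEN 0 ELSE 1 END) OVER (ORDER BY date DESC) AS cluster_num
and select only the rows where cluster_num equals 0. Since we ORDER BY date DESC, the 0th cluster is the last cluster.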
SELECT *
FROM (
    SELECT *, SUM(CASE WHEN cond IS TRUE THEN 0 ELSE 1 END) OVER (ORDER BY date DESC) AS cluster_num
    FROM (
        SELECT *, date >= next_date - interval '10 day' AS cond
        FROM (
            SELECT id, fruit_bought, date,
                   LEAD(date, 1, '-infinity')
                       OVER (PARTITION BY fruit_bought ORDER BY date) AS next_date
            FROM fruits
            WHERE fruit_bought='orange'
              -- restrict date here to specify an "initial date"
              AND date <= '2018-04-01'
        ) t1
    ) t2
) t3
WHERE cond AND cluster_num = 0
ORDER BY date ASC
yields
| id | fruit_bought | date | next_date | cond | cluster_num |
|----+--------------+------------+------------+------+-------------|
| 3 | orange | 2018-03-07 | 2018-03-15 | t | 0 |
| 4 | orange | 2018-03-15 | 2018-03-20 | t | 0 |
| 6 | orange | 2018-03-20 | -infinity | t | 0 |
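To see why both cond and cluster_num are needed, it can help to look at the intermediate rows before the final WHERE is applied. The following is a rough sketch, assuming PostgreSQL: it inlines the seven rows from the setup as a CTE so it runs on its own, rewrites the nested subqueries as CTEs named t1 and t2, and leaves out the date <= '2018-04-01' restriction for brevity.

```sql
-- Sketch only: the seven setup rows inlined as a CTE (PostgreSQL).
WITH fruits (id, fruit_bought, quantity, date) AS (
    VALUES (1, 'orange', 100, DATE '2018-01-10'),
           (2, 'apple',   50, DATE '2018-02-05'),
           (3, 'orange',  75, DATE '2018-03-07'),
           (4, 'orange', 200, DATE '2018-03-15'),
           (5, 'apple',   10, DATE '2018-03-17'),
           (6, 'orange',  20, DATE '2018-03-20'),
           (7, 'orange',  20, DATE '2018-01-09')
),
t1 AS (
    -- next orange purchase date; the latest row falls back to -infinity
    SELECT id, date,
           LEAD(date, 1, DATE '-infinity')
               OVER (PARTITION BY fruit_bought ORDER BY date) AS next_date
    FROM fruits
    WHERE fruit_bought = 'orange'
),
t2 AS (
    -- TRUE when the next orange purchase is at most 10 days later
    SELECT *, date >= next_date - interval '10 day' AS cond
    FROM t1
)
SELECT id, date, next_date, cond,
       -- running count of FALSE values, walking from the newest row backwards
       SUM(CASE WHEN cond IS TRUE THEN 0 ELSE 1 END)
           OVER (ORDER BY date DESC) AS cluster_num
FROM t2
ORDER BY date DESC;
-- id | date       | next_date  | cond | cluster_num
--  6 | 2018-03-20 | -infinity  | t    | 0
--  4 | 2018-03-15 | 2018-03-20 | t    | 0
--  3 | 2018-03-07 | 2018-03-15 | t    | 0
--  1 | 2018-01-10 | 2018-03-07 | f    | 1
--  7 | 2018-01-09 | 2018-01-10 | t    | 1
```

The row with id 7 has cond = TRUE because id 1 follows it one day later, so filtering on cond alone would keep it. It is the cluster_num = 0 condition that excludes it, since the FALSE on id 1 has already bumped the running sum to 1.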
Setup:
CREATE TABLE fruits (
fruitid INT PRIMARY KEY GENERATED BY DEFAULT AS IDENTITY,
id INT,
fruit_bought TEXT,
quantity INT,
date DATE);
INSERT INTO fruits (id, fruit_bought, quantity, date)
VALUES (1,'orange',100,'2018-01-10')
, (2,'apple',50,'2018-02-05')
, (3,'orange',75,'2018-03-07')
, (4,'orange',200,'2018-03-15')
, (5,'apple',10,'2018-03-17')
, (6,'orange',20,'2018-03-20')
, (7,'orange',20,'2018-01-09');
Answer 1 (score: 0)
One method is lag() and filter . . . but used like this:
select f.*
from (select f.*,
lag(date) filter (where fruit_bought = 'orange') over (order by date) as prev_orange_date
from fruits f
) f
where prev_orange_date >= date - interval '10 day';
However, exists also comes to mind:
select f.*
from fruits f
where exists (select 1
from fruits f2
where f2.fruit_bought = 'orange' and
f2.date >= f.date - interval '10 day' and
f2.date < f.date
);
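Both of these queries assume that the dates are unique, as in your example. If you have ties, either one can be made to work. However, you would have to specify how to handle the date on which the oranges were bought.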
Answer 2 (score: 0)
Can you try the following?
a<-data.frame(measuretime=c("2010-10-20 11:00:00", "2010-12-15 13:18:00", "2011-02-14 09:00:00",
"2011-03-08 11:52:00", "2012-08-23 22:59:00"), value=c(1.5, 6.3, 0.1, 9.9, 7))
b<-data.frame(measuretime=c("2010-12-15 13:18:00", "2011-02-14 10:30:00",
"2011-03-08 11:52:00", "2011-04-18 12:23:00"), value=c(22, 71, 12, 69))
Thanks