I have a table that looks like this:
CREATE TABLE foobar (
id SERIAL PRIMARY KEY,
data_entry_date DATE NOT NULL,
user_id INTEGER NOT NULL,
wine_glasses_drunk INTEGER NOT NULL,
whisky_shots_drunk INTEGER NOT NULL,
beer_bottle_drunk INTEGER NOT NULL
);
insert into foobar (data_entry_date, user_id, wine_glasses_drunk, whisky_shots_drunk, beer_bottle_drunk) VALUES ('2011-01-01', 1, 1,0,1);
insert into foobar (data_entry_date, user_id, wine_glasses_drunk, whisky_shots_drunk, beer_bottle_drunk) VALUES ('2011-01-02', 1, 4,0,1);
insert into foobar (data_entry_date, user_id, wine_glasses_drunk, whisky_shots_drunk, beer_bottle_drunk) VALUES ('2011-01-03', 1, 0,0,1);
insert into foobar (data_entry_date, user_id, wine_glasses_drunk, whisky_shots_drunk, beer_bottle_drunk) VALUES ('2011-01-04', 1, 1,0,1);
insert into foobar (data_entry_date, user_id, wine_glasses_drunk, whisky_shots_drunk, beer_bottle_drunk) VALUES ('2011-01-05', 1, 2,1,1);
insert into foobar (data_entry_date, user_id, wine_glasses_drunk, whisky_shots_drunk, beer_bottle_drunk) VALUES ('2011-01-07', 1, 1,2,1);
insert into foobar (data_entry_date, user_id, wine_glasses_drunk, whisky_shots_drunk, beer_bottle_drunk) VALUES ('2011-01-08', 1, 4,0,1);
insert into foobar (data_entry_date, user_id, wine_glasses_drunk, whisky_shots_drunk, beer_bottle_drunk) VALUES ('2011-01-11', 1, 1,1,1);
insert into foobar (data_entry_date, user_id, wine_glasses_drunk, whisky_shots_drunk, beer_bottle_drunk) VALUES ('2011-01-12', 1, 1,0,1);
insert into foobar (data_entry_date, user_id, wine_glasses_drunk, whisky_shots_drunk, beer_bottle_drunk) VALUES ('2011-01-13', 1, 2,0,1);
insert into foobar (data_entry_date, user_id, wine_glasses_drunk, whisky_shots_drunk, beer_bottle_drunk) VALUES ('2011-01-14', 1, 1,0,1);
insert into foobar (data_entry_date, user_id, wine_glasses_drunk, whisky_shots_drunk, beer_bottle_drunk) VALUES ('2011-01-15', 1, 9,3,1);
insert into foobar (data_entry_date, user_id, wine_glasses_drunk, whisky_shots_drunk, beer_bottle_drunk) VALUES ('2011-01-16', 1, 0,4,2);
insert into foobar (data_entry_date, user_id, wine_glasses_drunk, whisky_shots_drunk, beer_bottle_drunk) VALUES ('2011-01-17', 1, 0,5,3);
insert into foobar (data_entry_date, user_id, wine_glasses_drunk, whisky_shots_drunk, beer_bottle_drunk) VALUES ('2011-01-18', 1, 2,2,5);
insert into foobar (data_entry_date, user_id, wine_glasses_drunk, whisky_shots_drunk, beer_bottle_drunk) VALUES ('2011-01-20', 1, 1,1,1);
insert into foobar (data_entry_date, user_id, wine_glasses_drunk, whisky_shots_drunk, beer_bottle_drunk) VALUES ('2011-01-23', 1, 1,3,1);
insert into foobar (data_entry_date, user_id, wine_glasses_drunk, whisky_shots_drunk, beer_bottle_drunk) VALUES ('2011-01-24', 1, 0,0,1);
insert into foobar (data_entry_date, user_id, wine_glasses_drunk, whisky_shots_drunk, beer_bottle_drunk) VALUES ('2011-02-01', 1, 1,1,1);
insert into foobar (data_entry_date, user_id, wine_glasses_drunk, whisky_shots_drunk, beer_bottle_drunk) VALUES ('2011-02-02', 1, 2,3,4);
insert into foobar (data_entry_date, user_id, wine_glasses_drunk, whisky_shots_drunk, beer_bottle_drunk) VALUES ('2011-02-05', 1, 1,2,2);
insert into foobar (data_entry_date, user_id, wine_glasses_drunk, whisky_shots_drunk, beer_bottle_drunk) VALUES ('2011-02-09', 1, 0,0,1);
insert into foobar (data_entry_date, user_id, wine_glasses_drunk, whisky_shots_drunk, beer_bottle_drunk) VALUES ('2011-02-10', 1, 1,1,1);
insert into foobar (data_entry_date, user_id, wine_glasses_drunk, whisky_shots_drunk, beer_bottle_drunk) VALUES ('2011-02-11', 1, 3,6,3);
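I want to write a query that tells me the difference between the TOTAL wine_glasses_drunk, TOTAL whisky_shots_drunk and TOTAL beer_bottle_drunk over a period and the TOTALs for the previous period.
That probably sounds more complicated than it is. If the period is a week (7 days), the query should return the difference between the totals consumed this week and the totals consumed last week.
The dates in the table are not contiguous, i.e. some dates are missing, so the query needs to find the most relevant dates when working out which dates fall within each period.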
This is what I have so far:
-- using hard coded dates
SELECT (SUM(f1.wine_glasses_drunk) - SUM(f2.wine_glasses_drunk)) as wine_diff,
(SUM(f1.whisky_shots_drunk) - SUM(f2.whisky_shots_drunk)) as whisky_diff,
(SUM(f1.beer_bottle_drunk) - SUM(f2.beer_bottle_drunk)) as beer_diff
FROM foobar f1 INNER JOIN foobar f2 ON f2.user_id=f1.user_id
WHERE f1.user_id=1
AND f1.data_entry_date BETWEEN '2011-01-08' AND '2011-01-15'
AND f2.data_entry_date BETWEEN '2011-01-01' AND '2011-01-08'
AND f1.data_entry_date - f2.data_entry_date between 6 and 9;
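The SQL above is clearly a hack (especially the f1.data_entry_date - f2.data_entry_date between 6 and 9 criterion). I checked the results in Excel, and the results of the query above are (quite plainly) wrong.
How do I write this query, and how can I modify it so that it copes with non-contiguous dates in the database?
I am using PostgreSQL, but would prefer database-agnostic (i.e. ANSI) SQL if possible.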
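Answer 0 (score: 2)
I'm not entirely sure I have understood your description correctly, but I would use two different functions to get the result you want.
First, look at the date_trunc function. It returns the date of the first day of the week, which you can group by to get the totals for each week. If that first day of the week is not the one you want, you can shift it with date arithmetic. In PostgreSQL, date_trunc('week', ...) treats Monday as the first day of the week.
Second, you can use the lag window function to find the totals from the previous row. Note that if a week is missing, this function looks at the previous row rather than the previous week. I have put a check in the query to make sure the database is looking at the correct row.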
select
user_id,
week_start_date,
this_week_wine_glasses_drunk -
case when is_consecutive_weeks = 'TRUE'
then last_week_wine_glasses_drunk else 0 end as wine_glasses_drunk,
this_week_whisky_shots_drunk -
case when is_consecutive_weeks = 'TRUE'
then last_week_whisky_shots_drunk else 0 end as whisky_shots_drunk,
this_week_beer_bottle_drunk -
case when is_consecutive_weeks = 'TRUE'
then last_week_beer_bottle_drunk else 0 end as beer_bottle_drunk
from (
select
user_id,
week_start_date,
this_week_wine_glasses_drunk,
this_week_whisky_shots_drunk,
this_week_beer_bottle_drunk,
case when (lag(week_start_date)
over (partition by user_id order by week_start_date) + interval '7' day)
= week_start_date then 'TRUE' end as is_consecutive_weeks,
lag(this_week_wine_glasses_drunk)
over (partition by user_id order by week_start_date) as last_week_wine_glasses_drunk,
lag(this_week_whisky_shots_drunk)
over (partition by user_id order by week_start_date) as last_week_whisky_shots_drunk,
lag(this_week_beer_bottle_drunk)
over (partition by user_id order by week_start_date) as last_week_beer_bottle_drunk
from (
select
user_id,
date_trunc('week', data_entry_date) as week_start_date,
sum(wine_glasses_drunk) as this_week_wine_glasses_drunk,
sum(whisky_shots_drunk) as this_week_whisky_shots_drunk,
sum(beer_bottle_drunk) as this_week_beer_bottle_drunk
from foobar
group by user_id,
date_trunc('week', data_entry_date)
) a
) b
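A SQL Fiddle is available for you to look at.
By the way, I come from an Oracle background and hacked this together using the PostgreSQL documentation and SQL Fiddle. Hopefully it is what you need.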
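To make the week grouping and the consecutive-week check easier to follow outside the database, here is a minimal Python sketch of the same idea. It is only an illustration, not the query itself: the helper names `week_start` and `weekly_diffs` are made up for this example, and the rows are a subset of the question's data with the week of 2011-01-10 left out on purpose so that the gap check is exercised.

```python
from collections import defaultdict
from datetime import date, timedelta

# Subset of the question's rows: (data_entry_date, wine, whisky, beer).
# The week starting 2011-01-10 is deliberately missing.
ROWS = [
    (date(2011, 1, 1), 1, 0, 1),
    (date(2011, 1, 2), 4, 0, 1),
    (date(2011, 1, 3), 0, 0, 1),
    (date(2011, 1, 4), 1, 0, 1),
    (date(2011, 1, 5), 2, 1, 1),
    (date(2011, 1, 7), 1, 2, 1),
    (date(2011, 1, 8), 4, 0, 1),
    (date(2011, 1, 17), 0, 5, 3),
    (date(2011, 1, 18), 2, 2, 5),
]


def week_start(d):
    """Monday of the week containing d (the role date_trunc('week', d) plays)."""
    return d - timedelta(days=d.weekday())


def weekly_diffs(rows):
    """Map each week start to (wine, whisky, beer) minus the previous week's totals."""
    totals = defaultdict(lambda: [0, 0, 0])
    for d, *drinks in rows:
        bucket = totals[week_start(d)]
        for i, value in enumerate(drinks):
            bucket[i] += value

    diffs = {}
    previous_week = None
    for week in sorted(totals):
        # Like lag(): look at the previous row, but only trust it when it
        # really is the week immediately before this one.
        if previous_week is not None and week - previous_week == timedelta(days=7):
            previous = totals[previous_week]
        else:
            previous = [0, 0, 0]
        diffs[week] = tuple(c - p for c, p in zip(totals[week], previous))
        previous_week = week
    return diffs
```

With these rows, the week starting 2011-01-03 is compared against the week starting 2010-12-27, while the week starting 2011-01-17 has no adjacent previous week and is therefore compared against zero, which mirrors the `else 0` branches in the query.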
Answer 1 (score: 1)
A slightly different approach (I'll let you fill in the date parameters):
-- T-SQL (SQL Server) syntax; each variable needs its own type
DECLARE @StartDate1 DATE, @EndDate1 DATE, @StartDate2 DATE, @EndDate2 DATE
Set @StartDate1='6/1/2012'
Set @EndDate1='6/15/2012'
Set @StartDate2='6/16/2012'
Set @EndDate2='6/30/2012'
SELECT SUM(U.WineP1)-SUM(U.WineP2) AS WineDiff, SUM(U.WhiskeyP1)-SUM(U.WhiskeyP2) AS WhiskeyDiff, SUM(U.BeerP1)-SUM(U.BeerP2) AS BeerDiff
FROM
(
SELECT SUM(wine_glasses_drunk) AS WineP1, SUM(whisky_shots_drunk) AS WhiskeyP1, SUM(beer_bottle_drunk) AS BeerP1, 0 AS WineP2, 0 AS WhiskeyP2, 0 AS BeerP2
FROM foobar
WHERE data_entry_date BETWEEN @StartDate1 AND @EndDate1
UNION ALL
SELECT 0 AS WineP1, 0 AS WhiskeyP1, 0 AS BeerP1, SUM(wine_glasses_drunk) AS WineP2, SUM(whisky_shots_drunk) AS WhiskeyP2, SUM(beer_bottle_drunk) AS BeerP2
FROM foobar
WHERE data_entry_date BETWEEN @StartDate2 AND @EndDate2
) AS U
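Since the variables above are SQL Server syntax, the same two-period comparison can also be sketched as a single pass with a CASE expression and bind parameters, which stays close to portable SQL. The Python wrapper below exists only to make the sketch runnable: it uses an in-memory SQLite database with dates stored as ISO-8601 text, and the names `make_db`, `period_diff` and the `:prev_start`, `:cur_start`, `:cur_end` parameters are assumptions of this example. It returns current period minus previous period and uses half-open ranges so that no date is counted in both periods.

```python
import sqlite3

# Rows from the question for 2011-01-01 .. 2011-01-14:
# (data_entry_date, user_id, wine, whisky, beer)
ROWS = [
    ("2011-01-01", 1, 1, 0, 1),
    ("2011-01-02", 1, 4, 0, 1),
    ("2011-01-03", 1, 0, 0, 1),
    ("2011-01-04", 1, 1, 0, 1),
    ("2011-01-05", 1, 2, 1, 1),
    ("2011-01-07", 1, 1, 2, 1),
    ("2011-01-08", 1, 4, 0, 1),
    ("2011-01-11", 1, 1, 1, 1),
    ("2011-01-12", 1, 1, 0, 1),
    ("2011-01-13", 1, 2, 0, 1),
    ("2011-01-14", 1, 1, 0, 1),
]

# Rows in the current period count positively, rows in the previous
# period count negatively, so each SUM is already the difference.
DIFF_QUERY = """
SELECT
    SUM(CASE WHEN data_entry_date >= :cur_start
             THEN wine_glasses_drunk ELSE -wine_glasses_drunk END) AS wine_diff,
    SUM(CASE WHEN data_entry_date >= :cur_start
             THEN whisky_shots_drunk ELSE -whisky_shots_drunk END) AS whisky_diff,
    SUM(CASE WHEN data_entry_date >= :cur_start
             THEN beer_bottle_drunk ELSE -beer_bottle_drunk END) AS beer_diff
FROM foobar
WHERE user_id = :user_id
  AND data_entry_date >= :prev_start
  AND data_entry_date < :cur_end
"""


def make_db():
    """Create an in-memory table shaped like the question's and load ROWS."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE foobar ("
        "data_entry_date TEXT NOT NULL, user_id INTEGER NOT NULL, "
        "wine_glasses_drunk INTEGER NOT NULL, "
        "whisky_shots_drunk INTEGER NOT NULL, "
        "beer_bottle_drunk INTEGER NOT NULL)"
    )
    conn.executemany("INSERT INTO foobar VALUES (?, ?, ?, ?, ?)", ROWS)
    return conn


def period_diff(conn, user_id, prev_start, cur_start, cur_end):
    """Totals for [cur_start, cur_end) minus totals for [prev_start, cur_start)."""
    params = {
        "user_id": user_id,
        "prev_start": prev_start,
        "cur_start": cur_start,
        "cur_end": cur_end,
    }
    return conn.execute(DIFF_QUERY, params).fetchone()
```

For example, `period_diff(make_db(), 1, "2011-01-01", "2011-01-08", "2011-01-15")` compares 8 to 14 January against 1 to 7 January. Because the periods are plain date ranges, missing days inside a range simply contribute nothing.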
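Answer 2 (score: 0)
As a general rule for developing queries like this, build them in pieces and then combine the pieces. Find a good structure first, then build each piece you need separately, so that you understand how each piece works.
Here, I think you need more subqueries to arrive at a clear approach. I think you could try something along these lines:
Compute the date ranges you need and store them in variables. (You will probably want to add days to a date to find the next period, rather than follow the code given above.)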
Declare @SQL1 Date, @SQL2 Date, @SQL3 Date
Set @SQL1=(SQL1)
...
Next, find the weekly totals in a way that uses the dates as parameters.
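Select
sum(wine_glasses_drunk) as wine_totals,
sum(whisky_shots_drunk) as whiskey_totals,
sum(beer_bottle_drunk) as beer_totals,
case
when data_entry_date between @SQL1 and @SQL2 then 1
when data_entry_date between @SQL2 and @SQL3 then 2
end as period_number
from foobar
-- group on the same expression so there is one row of totals per period;
-- a date equal to @SQL2 counts in period 1 because CASE takes the first match
group by
case
when data_entry_date between @SQL1 and @SQL2 then 1
when data_entry_date between @SQL2 and @SQL3 then 2
end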
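Then build the summary query you need around this. With the data in this shape it becomes simple, and you no longer need so many sums over the same values.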
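The "build it in pieces" idea can be sketched in Python, with one small function per piece and a final function that combines them. This is only an illustration under stated assumptions: the function names are invented for the example, the rows are the question's entries from 2011-01-15 to 2011-01-24, and the periods are half-open ranges (a choice of this sketch, so that a boundary date belongs to exactly one period).

```python
from datetime import date, timedelta

# Rows from the question: (data_entry_date, wine, whisky, beer)
ROWS = [
    (date(2011, 1, 15), 9, 3, 1),
    (date(2011, 1, 16), 0, 4, 2),
    (date(2011, 1, 17), 0, 5, 3),
    (date(2011, 1, 18), 2, 2, 5),
    (date(2011, 1, 20), 1, 1, 1),
    (date(2011, 1, 23), 1, 3, 1),
    (date(2011, 1, 24), 0, 0, 1),
]


def period_bounds(start, days):
    """Piece 1: derive the three boundary dates by adding days to the start."""
    return start, start + timedelta(days=days), start + timedelta(days=2 * days)


def period_number(d, bounds):
    """Piece 2: label a date 1 or 2, or None when it is outside both periods."""
    first, second, end = bounds
    if first <= d < second:
        return 1
    if second <= d < end:
        return 2
    return None


def totals_by_period(rows, bounds):
    """Piece 3: one list of totals per period label."""
    totals = {1: [0, 0, 0], 2: [0, 0, 0]}
    for d, *drinks in rows:
        n = period_number(d, bounds)
        if n is not None:
            totals[n] = [t + v for t, v in zip(totals[n], drinks)]
    return totals


def period_difference(rows, start, days=7):
    """Summary: period 2 totals minus period 1 totals."""
    totals = totals_by_period(rows, period_bounds(start, days))
    return tuple(later - earlier for earlier, later in zip(totals[1], totals[2]))
```

Each piece can be checked on its own before the summary is trusted, for example by confirming that `period_number(date(2011, 1, 22), period_bounds(date(2011, 1, 15), 7))` is 2.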
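Answer 3 (score: 0)
I was going to add this as an edit to my other answer, but it is really a different approach, so it should be a separate answer.
I think I prefer the other answer I gave, but this one should work even when there are gaps in the data.
To set the parameters of the query, change the values of period_start_date and period_days in the query_params section of the with clause.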
with query_params as (
select
date '2011-01-01' as period_start_date,
7 as period_days
),
summary_data as (
select
user_id,
(data_entry_date - period_start_date)/period_days as period_number,
sum(wine_glasses_drunk) as wine_glasses_drunk,
sum(whisky_shots_drunk) as whisky_shots_drunk,
sum(beer_bottle_drunk) as beer_bottle_drunk
from foobar
cross join query_params
group by user_id,
(data_entry_date - period_start_date)/period_days
)
select
user_id,
period_number,
period_start_date + period_number * period_days as period_start_date,
sum(wine_glasses_drunk) as wine_glasses_drunk,
sum(whisky_shots_drunk) as whisky_shots_drunk,
sum(beer_bottle_drunk) as beer_bottle_drunk
from (
-- this weeks data
select
user_id,
period_number,
wine_glasses_drunk,
whisky_shots_drunk,
beer_bottle_drunk
from summary_data
union all
-- last weeks data
select
user_id,
period_number + 1 as period_number,
-wine_glasses_drunk as wine_glasses_drunk,
-whisky_shots_drunk as whisky_shots_drunk,
-beer_bottle_drunk as beer_bottle_drunk
from summary_data
) a
cross join query_params
where period_number <= (select max(period_number) from summary_data)
group by
user_id,
period_number,
period_start_date + period_number * period_days
order by 1, 2
Again, a SQL Fiddle is available.
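The two ideas in this query, numbering periods by integer division and adding each period's totals once positively and once negatively (shifted by one period), can be traced in a short Python sketch over the question's full data set. The function name `period_diffs` and the constants are assumptions of this example, and Python's `//` floors while PostgreSQL integer division truncates toward zero; the two agree here because no row is earlier than the start date.

```python
from datetime import date

PERIOD_START = date(2011, 1, 1)  # plays the role of period_start_date
PERIOD_DAYS = 7                  # plays the role of period_days

# All rows from the question: (data_entry_date, wine, whisky, beer)
ROWS = [
    (date(2011, 1, 1), 1, 0, 1), (date(2011, 1, 2), 4, 0, 1),
    (date(2011, 1, 3), 0, 0, 1), (date(2011, 1, 4), 1, 0, 1),
    (date(2011, 1, 5), 2, 1, 1), (date(2011, 1, 7), 1, 2, 1),
    (date(2011, 1, 8), 4, 0, 1), (date(2011, 1, 11), 1, 1, 1),
    (date(2011, 1, 12), 1, 0, 1), (date(2011, 1, 13), 2, 0, 1),
    (date(2011, 1, 14), 1, 0, 1), (date(2011, 1, 15), 9, 3, 1),
    (date(2011, 1, 16), 0, 4, 2), (date(2011, 1, 17), 0, 5, 3),
    (date(2011, 1, 18), 2, 2, 5), (date(2011, 1, 20), 1, 1, 1),
    (date(2011, 1, 23), 1, 3, 1), (date(2011, 1, 24), 0, 0, 1),
    (date(2011, 2, 1), 1, 1, 1), (date(2011, 2, 2), 2, 3, 4),
    (date(2011, 2, 5), 1, 2, 2), (date(2011, 2, 9), 0, 0, 1),
    (date(2011, 2, 10), 1, 1, 1), (date(2011, 2, 11), 3, 6, 3),
]


def period_diffs(rows, start=PERIOD_START, days=PERIOD_DAYS):
    """Map period number to (wine, whisky, beer) minus the previous period's totals."""
    # Equivalent of summary_data: totals per period number.
    summary = {}
    for d, *drinks in rows:
        n = (d - start).days // days
        acc = summary.setdefault(n, [0, 0, 0])
        for i, value in enumerate(drinks):
            acc[i] += value

    # Equivalent of the UNION ALL: each period's totals count positively
    # for period n and negatively for period n + 1.
    last = max(summary)
    diffs = {}
    for n, totals in summary.items():
        for target, sign in ((n, 1), (n + 1, -1)):
            if target <= last:  # same filter as the where clause
                acc = diffs.setdefault(target, [0, 0, 0])
                for i, value in enumerate(totals):
                    acc[i] += sign * value
    return {n: tuple(values) for n, values in sorted(diffs.items())}
```

Because the negated copy is keyed by period number rather than by the previous row, a period with no data at all would still appear in the result with the negated totals of the period before it, which is why this approach tolerates gaps.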