SQL: Difference between SUMs calculated over a period of time

Time: 2012-06-11 19:20:36

Tags: sql postgresql

I have a table that looks like this:

CREATE TABLE foobar (
                     id                     SERIAL PRIMARY KEY,
                     data_entry_date        DATE NOT NULL,
                     user_id                INTEGER NOT NULL,
                     wine_glasses_drunk     INTEGER NOT NULL,
                     whisky_shots_drunk     INTEGER NOT NULL,
                     beer_bottle_drunk      INTEGER NOT NULL
                 );

insert into foobar (data_entry_date, user_id, wine_glasses_drunk, whisky_shots_drunk, beer_bottle_drunk) VALUES ('2011-01-01', 1, 1,0,1);
insert into foobar (data_entry_date, user_id, wine_glasses_drunk, whisky_shots_drunk, beer_bottle_drunk) VALUES ('2011-01-02', 1, 4,0,1);
insert into foobar (data_entry_date, user_id, wine_glasses_drunk, whisky_shots_drunk, beer_bottle_drunk) VALUES ('2011-01-03', 1, 0,0,1);
insert into foobar (data_entry_date, user_id, wine_glasses_drunk, whisky_shots_drunk, beer_bottle_drunk) VALUES ('2011-01-04', 1, 1,0,1);
insert into foobar (data_entry_date, user_id, wine_glasses_drunk, whisky_shots_drunk, beer_bottle_drunk) VALUES ('2011-01-05', 1, 2,1,1);
insert into foobar (data_entry_date, user_id, wine_glasses_drunk, whisky_shots_drunk, beer_bottle_drunk) VALUES ('2011-01-07', 1, 1,2,1);
insert into foobar (data_entry_date, user_id, wine_glasses_drunk, whisky_shots_drunk, beer_bottle_drunk) VALUES ('2011-01-08', 1, 4,0,1);
insert into foobar (data_entry_date, user_id, wine_glasses_drunk, whisky_shots_drunk, beer_bottle_drunk) VALUES ('2011-01-11', 1, 1,1,1);
insert into foobar (data_entry_date, user_id, wine_glasses_drunk, whisky_shots_drunk, beer_bottle_drunk) VALUES ('2011-01-12', 1, 1,0,1);
insert into foobar (data_entry_date, user_id, wine_glasses_drunk, whisky_shots_drunk, beer_bottle_drunk) VALUES ('2011-01-13', 1, 2,0,1);
insert into foobar (data_entry_date, user_id, wine_glasses_drunk, whisky_shots_drunk, beer_bottle_drunk) VALUES ('2011-01-14', 1, 1,0,1);
insert into foobar (data_entry_date, user_id, wine_glasses_drunk, whisky_shots_drunk, beer_bottle_drunk) VALUES ('2011-01-15', 1, 9,3,1);
insert into foobar (data_entry_date, user_id, wine_glasses_drunk, whisky_shots_drunk, beer_bottle_drunk) VALUES ('2011-01-16', 1, 0,4,2);
insert into foobar (data_entry_date, user_id, wine_glasses_drunk, whisky_shots_drunk, beer_bottle_drunk) VALUES ('2011-01-17', 1, 0,5,3);
insert into foobar (data_entry_date, user_id, wine_glasses_drunk, whisky_shots_drunk, beer_bottle_drunk) VALUES ('2011-01-18', 1, 2,2,5);
insert into foobar (data_entry_date, user_id, wine_glasses_drunk, whisky_shots_drunk, beer_bottle_drunk) VALUES ('2011-01-20', 1, 1,1,1);
insert into foobar (data_entry_date, user_id, wine_glasses_drunk, whisky_shots_drunk, beer_bottle_drunk) VALUES ('2011-01-23', 1, 1,3,1);
insert into foobar (data_entry_date, user_id, wine_glasses_drunk, whisky_shots_drunk, beer_bottle_drunk) VALUES ('2011-01-24', 1, 0,0,1);
insert into foobar (data_entry_date, user_id, wine_glasses_drunk, whisky_shots_drunk, beer_bottle_drunk) VALUES ('2011-02-01', 1, 1,1,1);
insert into foobar (data_entry_date, user_id, wine_glasses_drunk, whisky_shots_drunk, beer_bottle_drunk) VALUES ('2011-02-02', 1, 2,3,4);
insert into foobar (data_entry_date, user_id, wine_glasses_drunk, whisky_shots_drunk, beer_bottle_drunk) VALUES ('2011-02-05', 1, 1,2,2);
insert into foobar (data_entry_date, user_id, wine_glasses_drunk, whisky_shots_drunk, beer_bottle_drunk) VALUES ('2011-02-09', 1, 0,0,1);
insert into foobar (data_entry_date, user_id, wine_glasses_drunk, whisky_shots_drunk, beer_bottle_drunk) VALUES ('2011-02-10', 1, 1,1,1);
insert into foobar (data_entry_date, user_id, wine_glasses_drunk, whisky_shots_drunk, beer_bottle_drunk) VALUES ('2011-02-11', 1, 3,6,3);

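I want to write a query that tells me the difference between the TOTAL wine_glasses_drunk, TOTAL whisky_shots_drunk and TOTAL beer_bottle_drunk over one period and the same totals for the previous period.

That probably sounds more complicated than it is. If the period we are using is 7 days, the query should return the difference between the totals consumed this week and the totals consumed last week.

The dates in the table are not contiguous, i.e. some dates are missing, so the query needs to find the most relevant dates when working out which dates belong to each period.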

This is what I have so far:

-- using hard coded dates

SELECT (SUM(f1.wine_glasses_drunk) - SUM(f2.wine_glasses_drunk)) as wine_diff, 
(SUM(f1.whisky_shots_drunk) - SUM(f2.whisky_shots_drunk)) as whisky_diff, 
(SUM(f1.beer_bottle_drunk) - SUM(f2.beer_bottle_drunk)) as beer_diff 
FROM foobar f1 INNER JOIN foobar f2 ON f2.user_id=f1.user_id
WHERE f1.user_id=1 
AND f1.data_entry_date BETWEEN '2011-01-08' AND '2011-01-15'
AND f2.data_entry_date BETWEEN '2011-01-01' AND '2011-01-08'
AND f1.data_entry_date - f2.data_entry_date between 6 and 9;

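The SQL above is obviously a hack (especially the f1.data_entry_date - f2.data_entry_date between 6 and 9 criterion). I checked the results in Excel, and the results from the query above are unmistakably wrong.

How do I write this query, and how can I modify it so that it copes with the non-contiguous dates in the database?

I am using PostgreSQL, but would prefer database-agnostic (i.e. ANSI) SQL if possible.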

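4 Answers:

Answer 0: (score: 2)

I'm not entirely sure I've understood your description correctly, but I would use two different functions to get the result you want.

First, take a look at the date_trunc function. It can give you the date of the first day of the week, which you can group by to get the weekly sums. If that first day of the week is not the one you want, you can shift it with date arithmetic. As far as I can tell, date_trunc treats Monday as the first day of the week.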
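As a minimal PostgreSQL sketch of that bucketing, here are three dates from the question's sample data. The cast back to date and the two-day shift that produces Saturday-based weeks are only an illustration of the "date arithmetic" idea, not something the question asks for.

```sql
-- date_trunc('week', ...) snaps a date back to the Monday of its week.
-- The ::date cast drops the time-of-day part of the result.
SELECT
  d                                   AS data_entry_date,
  date_trunc('week', d)::date         AS monday_week_start,
  -- Illustrative shift: weeks that start on Saturday instead of Monday.
  -- Move the date forward 2 days, truncate, then move back 2 days.
  date_trunc('week', d + 2)::date - 2 AS saturday_week_start
FROM (VALUES (DATE '2011-01-01'),
             (DATE '2011-01-05'),
             (DATE '2011-01-08')) AS t(d);
```

2011-01-01 was a Saturday, so its Monday-based week starts on 2010-12-27, while 2011-01-05 and 2011-01-08 both fall in the week starting 2011-01-03. With the Saturday shift, 2011-01-01 and 2011-01-05 share the week starting 2011-01-01, and 2011-01-08 starts the next one.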

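Second, you can use the lag window function to find the sums from the previous row. Note that if a week is missing, this function looks at the previous row, not the previous week. I've put a check in the query to make sure the database is looking at the right row.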

select 
  user_id,
  week_start_date,
  this_week_wine_glasses_drunk -
    case when is_consecutive_weeks = 'TRUE' 
      then last_week_wine_glasses_drunk else 0 end as wine_glasses_drunk,
  this_week_whisky_shots_drunk -
    case when is_consecutive_weeks = 'TRUE' 
      then last_week_whisky_shots_drunk else 0 end as whisky_shots_drunk,
  this_week_beer_bottle_drunk -
    case when is_consecutive_weeks = 'TRUE' 
      then last_week_beer_bottle_drunk else 0 end as beer_bottle_drunk
from (
select
  user_id,
  week_start_date,
  this_week_wine_glasses_drunk,
  this_week_whisky_shots_drunk,
  this_week_beer_bottle_drunk,
  case when (lag(week_start_date)
    over (partition by user_id order by week_start_date)  + interval '7' day)
      = week_start_date then 'TRUE' end as is_consecutive_weeks,
  lag(this_week_wine_glasses_drunk) 
    over (partition by user_id order by week_start_date) as last_week_wine_glasses_drunk,
  lag(this_week_whisky_shots_drunk) 
    over (partition by user_id order by week_start_date) as last_week_whisky_shots_drunk,
  lag(this_week_beer_bottle_drunk) 
    over (partition by user_id order by week_start_date) as last_week_beer_bottle_drunk
from (
  select
    user_id,
    date_trunc('week', data_entry_date) as week_start_date,
    sum(wine_glasses_drunk) as this_week_wine_glasses_drunk,
    sum(whisky_shots_drunk) as this_week_whisky_shots_drunk,
    sum(beer_bottle_drunk) as this_week_beer_bottle_drunk
  from foobar
  group by user_id,
    date_trunc('week', data_entry_date)
  ) a
) b

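A SQL Fiddle is available for you to look at.

By the way, I come from an Oracle background, and hacked this together using the PostgreSQL documentation and SQL Fiddle. Hopefully it's what you need.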
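The point about lag looking at the previous row rather than the previous week is easy to see on a tiny made-up set of weekly totals. The values below are hypothetical (they are not taken from the question's table), and this is only a sketch of the behaviour, assuming PostgreSQL.

```sql
-- Three hypothetical weekly totals; the week starting 2011-01-17 is missing.
SELECT
  week_start,
  wine,
  lag(wine) OVER (ORDER BY week_start)                    AS previous_row_wine,
  -- Days between this row's week and the previous row's week;
  -- anything other than 7 means the previous row is not "last week".
  week_start - lag(week_start) OVER (ORDER BY week_start) AS days_since_previous_row
FROM (VALUES
        (DATE '2011-01-03', 5),
        (DATE '2011-01-10', 7),
        (DATE '2011-01-24', 2)
     ) AS w(week_start, wine)
ORDER BY week_start;
```

For the 2011-01-24 row, lag returns the total from 2011-01-10 and the gap is 14 days, which is the situation a consecutive-weeks check has to catch.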

Answer 1: (score: 1)

A slightly different approach (I'll let you fill in the date parameters):

-- T-SQL (SQL Server) variables; each one needs its own type
Declare @StartDate1 Date, @EndDate1 Date, @StartDate2 Date, @EndDate2 Date
Set @StartDate1='2012-06-01'
Set @EndDate1='2012-06-15'
Set @StartDate2='2012-06-16'
Set @EndDate2='2012-06-30'

SELECT SUM(U.WineP1)-SUM(U.WineP2) AS WineDiff, SUM(U.WhiskeyP1)-SUM(U.WhiskeyP2) AS WhiskeyDiff, SUM(U.BeerP1)-SUM(U.BeerP2) AS BeerDiff
FROM
(
SELECT SUM(wine_glasses_drunk) AS WineP1, SUM(whisky_shots_drunk) AS WhiskeyP1, SUM(beer_bottle_drunk) AS BeerP1, 0 AS WineP2, 0 AS WhiskeyP2, 0 AS BeerP2
FROM foobar
WHERE data_entry_date BETWEEN @StartDate1 AND @EndDate1

UNION ALL

SELECT 0 AS WineP1, 0 AS WhiskeyP1, 0 AS BeerP1, SUM(wine_glasses_drunk) AS WineP2, SUM(whisky_shots_drunk) AS WhiskeyP2, SUM(beer_bottle_drunk) AS BeerP2
FROM foobar
WHERE data_entry_date BETWEEN @StartDate2 AND @EndDate2
) AS U
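Since the question is about PostgreSQL, which has no @variables in plain SQL, here is a rough equivalent with the dates written as literals. It is a sketch under my own assumptions: the current period is taken to be 2011-01-08 to 2011-01-14 and the previous one 2011-01-01 to 2011-01-07, and the two UNION ALL branches are folded into a single pass using CASE, which should be portable ANSI SQL.

```sql
-- Both periods are summed in one pass over foobar, so there is no self-join
-- and no row can be counted more than once.
-- Rows in the current period count positively, rows in the previous
-- period count negatively, so each SUM is "current minus previous".
SELECT
  SUM(CASE WHEN data_entry_date >= DATE '2011-01-08'
           THEN wine_glasses_drunk ELSE -wine_glasses_drunk END) AS wine_diff,
  SUM(CASE WHEN data_entry_date >= DATE '2011-01-08'
           THEN whisky_shots_drunk ELSE -whisky_shots_drunk END) AS whisky_diff,
  SUM(CASE WHEN data_entry_date >= DATE '2011-01-08'
           THEN beer_bottle_drunk ELSE -beer_bottle_drunk END)   AS beer_diff
FROM foobar
WHERE user_id = 1
  -- Only rows from the two periods being compared.
  AND data_entry_date BETWEEN DATE '2011-01-01' AND DATE '2011-01-14';
```

Against the sample data in the question this gives wine_diff = 0, whisky_diff = -2 and beer_diff = -1 (totals of 9, 1, 5 for the later week against 9, 3, 6 for the earlier one). Missing dates need no special handling, because a day with no row simply contributes nothing to either sum.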

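Answer 2: (score: 0)

As a general rule for developing queries like this, build them in pieces and then combine them. First find a good structure, then build each piece you need separately, so that you understand how each piece works.

Here, I think you need more subqueries to find a clear way through. I think you could try something along these lines:

Work out the date ranges you need and save them as variables. (You will probably want to add days to a date to find the next period, rather than use the code given above.)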

Declare @SQL1 Date, @SQL2 Date, @SQL3 Date
Set @SQL1=(SQL1)
...

Next, find the weekly totals in a way that uses the dates as parameters.

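Select 
  sum(wine_glasses_drunk) as wine_totals, 
  sum(whisky_shots_drunk) as whiskey_totals, 
  sum(beer_bottle_drunk) as beer_totals,
  case 
    when data_entry_date between @SQL1 and @SQL2 then 1
    when data_entry_date between @SQL2 and @SQL3 then 2
  end as period_number
from foobar
-- group by the same expression so there is one row of totals per period
-- (rows outside both ranges end up in a group with a NULL period_number)
group by
  case 
    when data_entry_date between @SQL1 and @SQL2 then 1
    when data_entry_date between @SQL2 and @SQL3 then 2
  end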

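Then build the summary query you need around this. The data is now in a shape that makes that simple, and you no longer need so many sums over the same values.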
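Put together in PostgreSQL, the three pieces might look something like the sketch below. The literal dates stand in for the date variables, and the names labelled, period_totals, cur and prev are my own, chosen only for illustration.

```sql
WITH labelled AS (
  -- Piece 1: tag each row with the period it belongs to.
  SELECT
    CASE
      WHEN data_entry_date BETWEEN DATE '2011-01-01' AND DATE '2011-01-07' THEN 1
      WHEN data_entry_date BETWEEN DATE '2011-01-08' AND DATE '2011-01-14' THEN 2
    END AS period_number,
    wine_glasses_drunk,
    whisky_shots_drunk,
    beer_bottle_drunk
  FROM foobar
  WHERE user_id = 1
    AND data_entry_date BETWEEN DATE '2011-01-01' AND DATE '2011-01-14'
),
period_totals AS (
  -- Piece 2: one row of totals per period.
  SELECT
    period_number,
    SUM(wine_glasses_drunk) AS wine_total,
    SUM(whisky_shots_drunk) AS whisky_total,
    SUM(beer_bottle_drunk)  AS beer_total
  FROM labelled
  GROUP BY period_number
)
-- Piece 3: line period 2 up with period 1 and subtract.
SELECT
  cur.wine_total   - prev.wine_total   AS wine_diff,
  cur.whisky_total - prev.whisky_total AS whisky_diff,
  cur.beer_total   - prev.beer_total   AS beer_diff
FROM period_totals cur
JOIN period_totals prev ON prev.period_number = cur.period_number - 1
WHERE cur.period_number = 2;
```

Each CTE can be run on its own while developing, which is the point of building in pieces. One caveat of this particular shape: the inner join returns no row at all if either period has no data, so a LEFT JOIN with COALESCE would be needed to treat an empty period as zero.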

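Answer 3: (score: 0)

I was going to add this as an edit to my other answer, but it is really a different approach, so it should be a separate answer.

I think I prefer the other answer I gave, but this one should work even when there are gaps in the data.

To set the parameters for the query, change the values of period_start_date and period_days in the query_params part of the with clause.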

with query_params as (
  select 
    date '2011-01-01' as period_start_date,
    7 as period_days
),
summary_data as (
select
  user_id,
  (data_entry_date - period_start_date)/period_days as period_number,
  sum(wine_glasses_drunk) as wine_glasses_drunk,
  sum(whisky_shots_drunk) as whisky_shots_drunk,
  sum(beer_bottle_drunk) as beer_bottle_drunk
from foobar
  cross join query_params
group by user_id,
  (data_entry_date - period_start_date)/period_days
)
select
  user_id,
  period_number,
  period_start_date + period_number * period_days as period_start_date,
  sum(wine_glasses_drunk) as wine_glasses_drunk,
  sum(whisky_shots_drunk) as whisky_shots_drunk,
  sum(beer_bottle_drunk) as beer_bottle_drunk
from (
  -- this week's data
  select 
    user_id,
    period_number,
    wine_glasses_drunk,
    whisky_shots_drunk,
    beer_bottle_drunk
  from summary_data
  union all
  -- last week's data
  select 
    user_id,
    period_number + 1 as period_number,
    -wine_glasses_drunk as wine_glasses_drunk,
    -whisky_shots_drunk as whisky_shots_drunk,
    -beer_bottle_drunk as beer_bottle_drunk
  from summary_data
) a
cross join query_params
where period_number <= (select max(period_number) from summary_data)
group by 
  user_id,
  period_number,
  period_start_date + period_number * period_days
order by 1, 2

Again, a SQL Fiddle is available.
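To see what the period_number arithmetic does, it can be run on a few dates in isolation. This is a small sketch assuming PostgreSQL, where subtracting two dates gives an integer number of days, with the parameters hard-coded to the same 2011-01-01 start and 7-day period used in the query.

```sql
-- Integer division of "days since the start date" by the period length
-- gives a zero-based period number; multiplying back gives the period start.
SELECT
  d                                                     AS data_entry_date,
  (d - DATE '2011-01-01') / 7                           AS period_number,
  DATE '2011-01-01' + ((d - DATE '2011-01-01') / 7) * 7 AS period_start_date
FROM (VALUES (DATE '2011-01-07'),
             (DATE '2011-01-08'),
             (DATE '2011-01-11'),
             (DATE '2011-01-20')) AS t(d);
```

2011-01-07 lands in period 0 (starting 2011-01-01), 2011-01-08 and 2011-01-11 both land in period 1 (starting 2011-01-08), and 2011-01-20 lands in period 2 (starting 2011-01-15), whether or not the days in between have rows. One thing to watch: integer division truncates toward zero, so dates up to six days before period_start_date would also come out as period 0. The start date should therefore be no later than the earliest row you care about.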