How to limit the upper bound of rows while joining in SQL?

Time: 2017-06-29 18:07:59

Tags: sql join greenplum

I have two tables: balance and calendar.

balance:

Account Date        Balance 
1111    01/01/2014  100 
1111    02/01/2014  156 
1111    03/01/2014  300 
1111    04/01/2014  300 
1111    07/01/2014  468

1112    02/01/2014  300
1112    03/01/2014  300
1112    06/01/2014  300
1112    07/01/2014  350
1112    08/01/2014  400
1112    09/01/2014  450

1113    01/01/2014  30
1113    02/01/2014  40
1113    03/01/2014  45
1113    06/01/2014  45
1113    07/01/2014  60
1113    08/01/2014  50
1113    09/01/2014  20
1113    10/01/2014  10

calendar:

date        business_day_ind
01/01/2014  N
02/01/2014  Y
03/01/2014  Y
04/01/2014  N
05/01/2014  N
06/01/2014  Y
07/01/2014  Y
08/01/2014  Y
09/01/2014  Y
10/01/2014  Y

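I need to do the following:

  • I need to fill in the missing dates for every account, up to the max date for which it has a value. For example, account 1111 only has values up to 07/01/2014, so the dates need to be filled in only up to that date. But when I join with the calendar table (a normal left join), I am not able to limit the max date to the last date available for the account.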
1111   01/01/2014  100 N 
1111   02/01/2014  156 Y 
1111   03/01/2014  300 Y
1111   04/01/2014  300 Y
1111   05/01/2014      N
1111   06/01/2014      N
1111   07/01/2014  468 Y
1111   08/01/2014      Y
1111   09/01/2014      Y
1111   10/01/2014      Y

1112   01/01/2014      N
1112   02/01/2014  300 Y
1112   03/01/2014  300 Y
1112   04/01/2014      N
1112   05/01/2014      N
1112   06/01/2014  300 Y
1112   07/01/2014  350 Y
1112   08/01/2014  400 Y
1112   09/01/2014  450 Y
1112   10/01/2014      Y

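I need an efficient way (preferably not involving multiple steps) to limit the dates to the max date for which a balance is available for the account (07/01/2014 in the case of 1111, 09/01/2014 in the case of 1112).

Desired output: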

1111   01/01/2014  100 N 
1111   02/01/2014  156 Y 
1111   03/01/2014  300 Y
1111   04/01/2014  300 Y
1111   05/01/2014      N
1111   06/01/2014      N
1111   07/01/2014  468 Y

1112   01/01/2014      N
1112   02/01/2014  300 Y
1112   03/01/2014  300 Y
1112   04/01/2014      N
1112   05/01/2014      N
1112   06/01/2014  300 Y
1112   07/01/2014  350 Y
1112   08/01/2014  400 Y
1112   09/01/2014  450 Y

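After filling in the missing days, I plan to impute the previous business day's balance to the missing days. I plan to get the previous business day for each date, and then update the missing rows by joining to the original balance table with acct and previous business day as the key.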
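For illustration, the imputation step could also be sketched with window functions instead of the self-join described above. This is only a sketch under assumptions: `filled` is a hypothetical name for the gap-filled result (account, date, balance, business_day_ind), and the query carries forward the most recent available balance, which is not necessarily the balance of the previous business day.

```sql
-- Hypothetical input: filled(account, date, balance, business_day_ind),
-- i.e. the gap-filled result, with balance NULL on the added dates.
select account,
       date,
       -- each group holds exactly one non-NULL balance (or none, before the
       -- account's first balance), so max() simply returns that value
       max(balance) over (partition by account, grp) as balance_filled,
       business_day_ind
from (
    select account, date, balance, business_day_ind,
           -- count() skips NULLs, so the running counter only advances on
           -- rows that have a balance; each NULL row therefore shares the
           -- group number of the last non-NULL row before it
           count(balance) over (partition by account order by date) as grp
    from filled
) as g
order by account, date;
```

Rows that precede an account's first balance fall into group 0 and stay NULL.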

Thanks.

I am on a Greenplum database.

3 answers:

Answer 0 (score: 0)

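One possible approach is to compute each account's max date in a subquery and use it to restrict the calendar rows. For example:

select ... from (select Account, max(date) as max_date from balance group by Account) m
join calendar a on a.date <= m.max_date
-- balance is joined last, on account and date, so dates without a balance row are kept as NULLs
left outer join balance b on b.Account = m.Account and b.date = a.date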

Answer 1 (score: 0)

I assume you have a third table, accounts:

select
  accounts.account,
  calendar.date,
  balance.balance,
  calendar.business_day_ind
from
  accounts cross join lateral (
    select * 
    from calendar
    where calendar.date <= (
      select max(date)
      from balance
      where balance.account = accounts.account)) as calendar left join
  balance on (balance.account = accounts.account and balance.date = calendar.date)
order by
  accounts.account, calendar.date;

About lateral joins
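If there is no separate accounts table, one could be derived from balance first. A minimal sketch, assuming the column names from the question and Greenplum's CREATE TABLE AS with a DISTRIBUTED BY clause:

```sql
-- Assumption: no accounts table exists yet; build one from the
-- distinct account numbers that appear in balance.
create table accounts as
select distinct account
from balance
distributed by (account);

analyze accounts;
```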

Answer 2 (score: 0)

This was an interesting challenge!

CREATE TABLE balance 
(account int, balance_date timestamp, balance int)
DISTRIBUTED BY (account, balance_date);

INSERT INTO balance 
values (1111,'01/01/2014', 100),
(1111, '02/01/2014', 156),
(1111, '03/01/2014', 300), 
(1111, '04/01/2014', 300),
(1111, '07/01/2014', 468),
(1112, '02/01/2014', 300),
(1112, '03/01/2014', 300),
(1112, '06/01/2014', 300),
(1112, '07/01/2014', 350),
(1112, '08/01/2014', 400),
(1112, '09/01/2014', 450),
(1113, '01/01/2014', 30),
(1113, '02/01/2014', 40),
(1113, '03/01/2014', 45),
(1113, '06/01/2014', 45),
(1113, '07/01/2014', 60),
(1113, '08/01/2014', 50),
(1113, '09/01/2014', 20),
(1113, '10/01/2014', 10);

CREATE TABLE calendar
(calendar_date timestamp, business_day_ind boolean)
DISTRIBUTED BY (calendar_date);

INSERT INTO calendar
values ('01/01/2014', false),
('02/01/2014', true),
('03/01/2014', true),
('04/01/2014', false),
('05/01/2014', false),
('06/01/2014', true),
('07/01/2014', true),
('08/01/2014', true),
('09/01/2014', true),
('10/01/2014', true);

analyze balance;
analyze calendar;

Now the query.

select d.account, d.my_date, b.balance, c.business_day_ind
from    (
    select account, start_date + interval '1 month' * (generate_series(0, duration)) AS my_date
    from    (
        select account, start_date, (date_part('year', duration) * 12 + date_part('month', duration))::int as duration
        from    (
            select start_date, age(end_date, start_date) as duration, account
            from    (
                select account, min(balance_date) as start_date, max(balance_date) as end_date
                from balance
                group by account
                ) as sub1
            ) as sub2
        ) sub3
    ) as d
left outer join balance b on d.account = b.account and d.my_date = b.balance_date
join calendar c on c.calendar_date = d.my_date
order by d.account, d.my_date;

Results:

 account |       my_date       | balance | business_day_ind 
---------+---------------------+---------+------------------
    1111 | 2014-01-01 00:00:00 |     100 | f
    1111 | 2014-02-01 00:00:00 |     156 | t
    1111 | 2014-03-01 00:00:00 |     300 | t
    1111 | 2014-04-01 00:00:00 |     300 | f
    1111 | 2014-05-01 00:00:00 |         | f
    1111 | 2014-06-01 00:00:00 |         | t
    1111 | 2014-07-01 00:00:00 |     468 | t
    1112 | 2014-02-01 00:00:00 |     300 | t
    1112 | 2014-03-01 00:00:00 |     300 | t
    1112 | 2014-04-01 00:00:00 |         | f
    1112 | 2014-05-01 00:00:00 |         | f
    1112 | 2014-06-01 00:00:00 |     300 | t
    1112 | 2014-07-01 00:00:00 |     350 | t
    1112 | 2014-08-01 00:00:00 |     400 | t
    1112 | 2014-09-01 00:00:00 |     450 | t
    1113 | 2014-01-01 00:00:00 |      30 | f
    1113 | 2014-02-01 00:00:00 |      40 | t
    1113 | 2014-03-01 00:00:00 |      45 | t
    1113 | 2014-04-01 00:00:00 |         | f
    1113 | 2014-05-01 00:00:00 |         | f
    1113 | 2014-06-01 00:00:00 |      45 | t
    1113 | 2014-07-01 00:00:00 |      60 | t
    1113 | 2014-08-01 00:00:00 |      50 | t
    1113 | 2014-09-01 00:00:00 |      20 | t
    1113 | 2014-10-01 00:00:00 |      10 | t
(25 rows)

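I had to get the min and max dates for each account and then use generate_series to generate the months between the two dates. It would have been a cleaner query if you wanted daily records, but I had to use another subquery to get the monthly results.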
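If the balances are in fact daily (for example, loaded with DateStyle set to DMY so that 07/01/2014 means 7 January), the series can be generated by day, without the age() step. An untested sketch under that assumption, reusing the balance and calendar tables created above; the names bounds and day_count are introduced here for illustration only:

```sql
select d.account, d.my_date, b.balance, c.business_day_ind
from (
    -- one row per account per day, from its first to its last balance date
    select account,
           start_date + interval '1 day' * generate_series(0, day_count) as my_date
    from (
        select account,
               min(balance_date) as start_date,
               -- date minus date yields the number of days as an integer
               (max(balance_date)::date - min(balance_date)::date) as day_count
        from balance
        group by account
    ) as bounds
) as d
left outer join balance b on d.account = b.account and d.my_date = b.balance_date
join calendar c on c.calendar_date = d.my_date
order by d.account, d.my_date;
```

Like the monthly query, this starts each account at its own first balance date rather than at the first calendar date.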