想象一下以下Loans
表:
BorrowerID StartDate DueDate
=============================================
1 2012-09-02 2012-10-01
2 2012-10-05 2012-10-21
3 2012-11-07 2012-11-09
4 2012-12-01 2013-01-01
4 2012-12-01 2013-01-14
1 2012-12-20 2013-01-06
3 2013-01-07 2013-01-22
3 2013-01-15 2013-01-18
1 2013-02-20 2013-02-24
我如何选择那些一次只获得一笔贷款的人BorrowerID
?这包括那些只购买过一笔贷款的借款人,以及那些已经拿出一笔以上贷款的借款人,只要您提取贷款的时间表,其中没有一笔会重叠。例如,在上表中,它应该只找到借款人1和。
我已经尝试过将表格加入到自身中,但实际上还没有成功。任何指针都非常赞赏!
答案 0 :(得分:9)
要解决此问题,您需要采用两步法,详见以下SQL Fiddle。我确实在示例数据中添加了一个LoanId列,查询要求存在这样一个唯一的id。如果您没有,则需要调整join子句以确保贷款与自身不匹配。
MS SQL Server 2008架构设置:
CREATE TABLE dbo.Loans
(LoanID INT, [BorrowerID] int, [StartDate] datetime, [DueDate] datetime)
GO
INSERT INTO dbo.Loans
(LoanID, [BorrowerID], [StartDate], [DueDate])
VALUES
(1, 1, '2012-09-02 00:00:00', '2012-10-01 00:00:00'),
(2, 2, '2012-10-05 00:00:00', '2012-10-21 00:00:00'),
(3, 3, '2012-11-07 00:00:00', '2012-11-09 00:00:00'),
(4, 4, '2012-12-01 00:00:00', '2013-01-01 00:00:00'),
(5, 4, '2012-12-01 00:00:00', '2013-01-14 00:00:00'),
(6, 1, '2012-12-20 00:00:00', '2013-01-06 00:00:00'),
(7, 3, '2013-01-07 00:00:00', '2013-01-22 00:00:00'),
(8, 3, '2013-01-15 00:00:00', '2013-01-18 00:00:00'),
(9, 1, '2013-02-20 00:00:00', '2013-02-24 00:00:00')
GO
首先,你需要找出哪些贷款与另一笔贷款重叠。该查询使用<=
来比较开始日期和截止日期。这计算贷款,其中第二个开始于同一天,第一个开始重叠。如果您需要重叠,请在两个地方使用<
。
查询1 :
SELECT
*,
CASE WHEN EXISTS(SELECT 1 FROM dbo.Loans L2
WHERE L2.BorrowerID = L1.BorrowerID
AND L2.LoanID <> L1.LoanID
AND L1.StartDate <= L2.DueDate
AND L2.StartDate <= l1.DueDate)
THEN 1
ELSE 0
END AS HasOverlappingLoan
FROM dbo.Loans L1;
<强> Results 强>:
| LOANID | BORROWERID | STARTDATE | DUEDATE | HASOVERLAPPINGLOAN |
|--------|------------|----------------------------------|---------------------------------|--------------------|
| 1 | 1 | September, 02 2012 00:00:00+0000 | October, 01 2012 00:00:00+0000 | 0 |
| 2 | 2 | October, 05 2012 00:00:00+0000 | October, 21 2012 00:00:00+0000 | 0 |
| 3 | 3 | November, 07 2012 00:00:00+0000 | November, 09 2012 00:00:00+0000 | 0 |
| 4 | 4 | December, 01 2012 00:00:00+0000 | January, 01 2013 00:00:00+0000 | 1 |
| 5 | 4 | December, 01 2012 00:00:00+0000 | January, 14 2013 00:00:00+0000 | 1 |
| 6 | 1 | December, 20 2012 00:00:00+0000 | January, 06 2013 00:00:00+0000 | 0 |
| 7 | 3 | January, 07 2013 00:00:00+0000 | January, 22 2013 00:00:00+0000 | 1 |
| 8 | 3 | January, 15 2013 00:00:00+0000 | January, 18 2013 00:00:00+0000 | 1 |
| 9 | 1 | February, 20 2013 00:00:00+0000 | February, 24 2013 00:00:00+0000 | 0 |
现在,通过该信息,您可以确定此查询中没有重叠贷款的借款人:
查询2 :
WITH OverlappingLoans AS (
SELECT
*,
CASE WHEN EXISTS(SELECT 1 FROM dbo.Loans L2
WHERE L2.BorrowerID = L1.BorrowerID
AND L2.LoanID <> L1.LoanID
AND L1.StartDate <= L2.DueDate
AND L2.StartDate <= l1.DueDate)
THEN 1
ELSE 0
END AS HasOverlappingLoan
FROM dbo.Loans L1
),
OverlappingBorrower AS (
SELECT BorrowerID, MAX(HasOverlappingLoan) HasOverlappingLoan
FROM OverlappingLoans
GROUP BY BorrowerID
)
SELECT *
FROM OverlappingBorrower
WHERE hasOverlappingLoan = 0;
或者您甚至可以通过计算贷款以及计算数据库中每个借款人与其他贷款重叠的贷款数量来获取更多信息。 (注意,如果贷款A和贷款B重叠,则此查询都将计为重叠贷款)
<强> Results 强>:
| BORROWERID | HASOVERLAPPINGLOAN |
|------------|--------------------|
| 1 | 0 |
| 2 | 0 |
查询3 :
WITH OverlappingLoans AS (
SELECT
*,
CASE WHEN EXISTS(SELECT 1 FROM dbo.Loans L2
WHERE L2.BorrowerID = L1.BorrowerID
AND L2.LoanID <> L1.LoanID
AND L1.StartDate <= L2.DueDate
AND L2.StartDate <= l1.DueDate)
THEN 1
ELSE 0
END AS HasOverlappingLoan
FROM dbo.Loans L1
)
SELECT BorrowerID,COUNT(1) LoanCount, SUM(hasOverlappingLoan) OverlappingCount
FROM OverlappingLoans
GROUP BY BorrowerID;
<强> Results 强>:
| BORROWERID | LOANCOUNT | OVERLAPPINGCOUNT |
|------------|-----------|------------------|
| 1 | 3 | 0 |
| 2 | 1 | 0 |
| 3 | 3 | 2 |
| 4 | 2 | 2 |
更新:由于要求实际上要求的解决方案不依赖于每笔贷款的唯一标识符,因此我做了以下更改:
1)我添加了一个借款人,其中有两笔贷款具有相同的开始和截止日期
MS SQL Server 2008架构设置:
CREATE TABLE dbo.Loans
([BorrowerID] int, [StartDate] datetime, [DueDate] datetime)
GO
INSERT INTO dbo.Loans
([BorrowerID], [StartDate], [DueDate])
VALUES
( 1, '2012-09-02 00:00:00', '2012-10-01 00:00:00'),
( 2, '2012-10-05 00:00:00', '2012-10-21 00:00:00'),
( 3, '2012-11-07 00:00:00', '2012-11-09 00:00:00'),
( 4, '2012-12-01 00:00:00', '2013-01-01 00:00:00'),
( 4, '2012-12-01 00:00:00', '2013-01-14 00:00:00'),
( 1, '2012-12-20 00:00:00', '2013-01-06 00:00:00'),
( 3, '2013-01-07 00:00:00', '2013-01-22 00:00:00'),
( 3, '2013-01-15 00:00:00', '2013-01-18 00:00:00'),
( 1, '2013-02-20 00:00:00', '2013-02-24 00:00:00'),
( 5, '2013-02-20 00:00:00', '2013-02-24 00:00:00'),
( 5, '2013-02-20 00:00:00', '2013-02-24 00:00:00')
GO
2)这些“平等日期”贷款需要额外的步骤:
查询1 :
SELECT BorrowerID, StartDate, DueDate, COUNT(1) LoanCount
FROM dbo.Loans
GROUP BY BorrowerID, StartDate, DueDate;
<强> Results 强>:
| BORROWERID | STARTDATE | DUEDATE | LOANCOUNT |
|------------|----------------------------------|---------------------------------|-----------|
| 1 | September, 02 2012 00:00:00+0000 | October, 01 2012 00:00:00+0000 | 1 |
| 1 | December, 20 2012 00:00:00+0000 | January, 06 2013 00:00:00+0000 | 1 |
| 1 | February, 20 2013 00:00:00+0000 | February, 24 2013 00:00:00+0000 | 1 |
| 2 | October, 05 2012 00:00:00+0000 | October, 21 2012 00:00:00+0000 | 1 |
| 3 | November, 07 2012 00:00:00+0000 | November, 09 2012 00:00:00+0000 | 1 |
| 3 | January, 07 2013 00:00:00+0000 | January, 22 2013 00:00:00+0000 | 1 |
| 3 | January, 15 2013 00:00:00+0000 | January, 18 2013 00:00:00+0000 | 1 |
| 4 | December, 01 2012 00:00:00+0000 | January, 01 2013 00:00:00+0000 | 1 |
| 4 | December, 01 2012 00:00:00+0000 | January, 14 2013 00:00:00+0000 | 1 |
| 5 | February, 20 2013 00:00:00+0000 | February, 24 2013 00:00:00+0000 | 2 |
3)现在,每个贷款范围都是唯一的,我们可以再次使用旧技术。但是,我们还需要考虑那些“同等日期”的贷款。 (L1.StartDate <> L2.StartDate OR L1.DueDate <> L2.DueDate)
可以防止贷款与自身匹配。 OR LoanCount > 1
说明了“平等日期”贷款。
查询2 :
WITH NormalizedLoans AS (
SELECT BorrowerID, StartDate, DueDate, COUNT(1) LoanCount
FROM dbo.Loans
GROUP BY BorrowerID, StartDate, DueDate
)
SELECT
*,
CASE WHEN EXISTS(SELECT 1 FROM dbo.Loans L2
WHERE L2.BorrowerID = L1.BorrowerID
AND L1.StartDate <= L2.DueDate
AND L2.StartDate <= l1.DueDate
AND (L1.StartDate <> L2.StartDate
OR L1.DueDate <> L2.DueDate)
)
OR LoanCount > 1
THEN 1
ELSE 0
END AS HasOverlappingLoan
FROM NormalizedLoans L1;
<强> Results 强>:
| BORROWERID | STARTDATE | DUEDATE | LOANCOUNT | HASOVERLAPPINGLOAN |
|------------|----------------------------------|---------------------------------|-----------|--------------------|
| 1 | September, 02 2012 00:00:00+0000 | October, 01 2012 00:00:00+0000 | 1 | 0 |
| 1 | December, 20 2012 00:00:00+0000 | January, 06 2013 00:00:00+0000 | 1 | 0 |
| 1 | February, 20 2013 00:00:00+0000 | February, 24 2013 00:00:00+0000 | 1 | 0 |
| 2 | October, 05 2012 00:00:00+0000 | October, 21 2012 00:00:00+0000 | 1 | 0 |
| 3 | November, 07 2012 00:00:00+0000 | November, 09 2012 00:00:00+0000 | 1 | 0 |
| 3 | January, 07 2013 00:00:00+0000 | January, 22 2013 00:00:00+0000 | 1 | 1 |
| 3 | January, 15 2013 00:00:00+0000 | January, 18 2013 00:00:00+0000 | 1 | 1 |
| 4 | December, 01 2012 00:00:00+0000 | January, 01 2013 00:00:00+0000 | 1 | 1 |
| 4 | December, 01 2012 00:00:00+0000 | January, 14 2013 00:00:00+0000 | 1 | 1 |
| 5 | February, 20 2013 00:00:00+0000 | February, 24 2013 00:00:00+0000 | 2 | 1 |
此查询逻辑没有改变(除了切换开头)。
查询3 :
WITH NormalizedLoans AS (
SELECT BorrowerID, StartDate, DueDate, COUNT(1) LoanCount
FROM dbo.Loans
GROUP BY BorrowerID, StartDate, DueDate
),
OverlappingLoans AS (
SELECT
*,
CASE WHEN EXISTS(SELECT 1 FROM dbo.Loans L2
WHERE L2.BorrowerID = L1.BorrowerID
AND L1.StartDate <= L2.DueDate
AND L2.StartDate <= l1.DueDate
AND (L1.StartDate <> L2.StartDate
OR L1.DueDate <> L2.DueDate)
)
OR LoanCount > 1
THEN 1
ELSE 0
END AS HasOverlappingLoan
FROM NormalizedLoans L1
),
OverlappingBorrower AS (
SELECT BorrowerID, MAX(HasOverlappingLoan) HasOverlappingLoan
FROM OverlappingLoans
GROUP BY BorrowerID
)
SELECT *
FROM OverlappingBorrower
WHERE hasOverlappingLoan = 0;
<强> Results 强>:
| BORROWERID | HASOVERLAPPINGLOAN |
|------------|--------------------|
| 1 | 0 |
| 2 | 0 |
4)在此计算查询中,我们需要再次纳入“相等日期”贷款计数。为此,我们使用SUM(LoanCount)
而不是普通COUNT
。我们还必须将hasOverlappingLoan
乘以LoanCount以再次获得正确的重叠计数。
查询4 :
WITH NormalizedLoans AS (
SELECT BorrowerID, StartDate, DueDate, COUNT(1) LoanCount
FROM dbo.Loans
GROUP BY BorrowerID, StartDate, DueDate
),
OverlappingLoans AS (
SELECT
*,
CASE WHEN EXISTS(SELECT 1 FROM dbo.Loans L2
WHERE L2.BorrowerID = L1.BorrowerID
AND L1.StartDate <= L2.DueDate
AND L2.StartDate <= l1.DueDate
AND (L1.StartDate <> L2.StartDate
OR L1.DueDate <> L2.DueDate)
)
OR LoanCount > 1
THEN 1
ELSE 0
END AS HasOverlappingLoan
FROM NormalizedLoans L1
)
SELECT BorrowerID,SUM(LoanCount) LoanCount, SUM(hasOverlappingLoan*LoanCount) OverlappingCount
FROM OverlappingLoans
GROUP BY BorrowerID;
<强> Results 强>:
| BORROWERID | LOANCOUNT | OVERLAPPINGCOUNT |
|------------|-----------|------------------|
| 1 | 3 | 0 |
| 2 | 1 | 0 |
| 3 | 3 | 2 |
| 4 | 2 | 2 |
| 5 | 2 | 2 |
我强烈建议找到一种方法来使用我的第一个解决方案,因为没有主键的贷款表是一个,让我们说“奇怪”的设计。但是,如果你真的无法到达那里,请使用第二种解决方案。
答案 1 :(得分:1)
我得到了它的工作,但有点复杂的方式。它首先获得不符合内部查询条件的借款人,并返回其余的。内部查询有两部分:
让所有重叠的借款不在同一天开始。
从同一天开始获得所有借款。
select distinct BorrowerID from borrowings
where BorrowerID NOT IN
(
select b1.BorrowerID from borrowings b1
inner join borrowings b2
on b1.BorrowerID = b2.BorrowerID
and b1.StartDate < b2.StartDate
and b1.DueDate > b2.StartDate
union
select BorrowerID from borrowings
group by BorrowerID, StartDate
having count(*) > 1
)
我必须使用2个单独的内部查询,因为您的表没有每个记录的唯一标识符并使用b1.StartDate <= b2.StartDate
,因为我应该将记录连接到自身。为每条记录设置一个单独的标识符会很好。
答案 2 :(得分:0)
试
with cte as
(
select *,
row_number() over (partition by b order by s) r
from loans
)
select l1.b
from loans l1
except
select c1.b
from cte c1
where exists (
select 1
from cte c2
where c2.b = c1.b
and c2.r <> c1.r
and (c2.s between c1.s and c1.e
or c1.s between c2.s and c2.e)
)
答案 3 :(得分:0)
如果您使用的是SQL 2012,可以这样做:
with cte as (
select
BorrowerID,
StartDate,
DueDate,
lag(DueDate) over (partition by borrowerid order by StartDate, DueDate) as PrevDueDate
from test
)
select
distinct BorrowerID
from cte
where BorrowerID not in
(select BorrowerID
from cte
where StartDate <= PrevDueDate)