Question

I have a table like this:

Client  Branch  Amount  Date
1        2      1500    1.1.14
1        2      1400    3.1.14
1        3      1500    1.1.14
1        4      300     7.1.14
1        5      1500    1.1.14
------------------------------
2        2      300     1.1.14
2        2      300     1.1.14
2        5      300     1.1.14
2        3      400     4.1.14
------------------------------           
3        2      300     1.1.14
3        2      300     1.1.14
3        5      300     1.1.14
3        5      300     1.1.14
3        3      400     4.1.14
------------------------------
4        2      300     1.1.14  
4        2      300     1.1.14 
4        5      300     1.1.14 
4        5      300     1.1.14  
4        5      300     1.1.14

The output I want should look like this:

   Client   Branch  Amount  Date   Ind  Loan_Distinct_Num
    1        2      1500    1.1.14  0         1
    1        2      1400    3.1.14  0         2 
    1        3      1500    1.1.14  1         1
    1        4      300     7.1.14  0         3
    1        5      1500    1.1.14  1         1
    -------------------------------------------------
    2        2      300     1.1.14  0         1
    2        2      300     1.1.14  0         2
    2        5      300     1.1.14  1         2
    2        3      400     4.1.14  0         3
    --------------------------------------------------           
    3        2      300     1.1.14  0         1
    3        2      300     1.1.14  0         2 
    3        5      300     1.1.14  1         1
    3        5      300     1.1.14  1         2
    3        3      400     4.1.14  0         3
    ------------------------------------------------     
    4        2      300     1.1.14  0         1
    4        2      300     1.1.14  0         2
    4        5      300     1.1.14  1         1
    4        5      300     1.1.14  1         2
    4        5      300     1.1.14  0         3

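How can I do that? (Note: these records are just sample data.)

Well, these are the rules: a client has moved from one branch to another branch of the same bank. The problem is that the branches write his data several times. I want to identify the duplicated loans. Two steps are needed:

Step 1: Assumption: Same_Amount + Same_Date + different branch --> Ind = 1 in the records after the first row.

How does the Ind field work?

For example: in the partition of client = 1, the amount 1500 is repeated 3 times on the same date and in different branches, but only the last two records get Ind = 1. The first one gets Ind = 0 because it is not a duplicated loan; it is the first time a record with that amount and date appears in the data.

For client = 2, branch = 2 has two records and branch = 5 has only one, so in this case I will assume that the last record of branch = 2 was duplicated.

If, as with client = 3, there are two records in branch = 2 and two records in branch = 5, I will assume that both loans from branch 2 were duplicated.

Client = 4 is like client 3 but has one more record. I will treat that record as a new loan, because there is no additional past loan to match it with.

Step 2: I want to create my own distinct loan number for each client.

Any help on how to approach this problem, or a simpler version of it?

Note: SQL Server 2008.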

Answer 1

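First, set the data up as a table. I have added an identity column, ID, so that we can order by it; you stated in the comments that your data is in a specific order.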

declare @data table (ID int identity(1,1), Client int, Branch int, Amount int, [Date] date);
insert into @data values
(1,2,  1500,'2014-01-01'),
(1,2,  1400,'2014-03-01'),
(1,3,  1500,'2014-01-01'),
(1,4,  300,'2014-07-01'),
(1,5,  1500,'2014-01-01'),
(2,2,  300,'2014-01-01'),
(2,2,  300,'2014-01-01'),
(2,5,  300,'2014-01-01'),
(2,3,  400,'2014-04-01'),
(3,2,  300,'2014-01-01'),
(3,2,  300,'2014-01-01'),
(3,5,  300,'2014-01-01'),
(3,5,  300,'2014-01-01'),
(3,3,  400,'2014-04-01'),
(4,2,  300,'2014-01-01'),
(4,2,  300,'2014-01-01'),
(4,5,  300,'2014-01-01'),
(4,5,  300,'2014-01-01'),
(4,5,  300,'2014-01-01');

Here is where we run the query:

--In the first cte, we take all the data, and partition it up into individual loans (partition by Client, Amount, Date).
with cte1 as (
    select *, ROW_NUMBER() over (partition by Client, Amount, [Date] order by ID) as rowno from @data
), cte2 as (
    --in this cte, we get a list of distinct loans. We will use another rownumber in a bit to find our Loan_Distinct_Num
    select distinct Client, Amount, [Date] from @data
)
select cte1.Client, cte1.Branch, cte1.Amount, cte1.[Date]
      -- If rowno = 1, it's the first instance of that combination
    , case when rowno = 1 then 0 else 1 end as ind
    , b.Loan_Distinct_Num
 from cte1
 left join (select cte2.*, ROW_NUMBER() over (partition by Client order by [Date]) as Loan_Distinct_Num
             -- This is where our distinct loan number comes from
              from cte2 
              ) as b
              on b.Client = cte1.Client and b.Amount = cte1.Amount and b.[Date] = cte1.[Date]
 order by ID
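
As a possible simplification (a sketch, not part of the original query), the distinct list and the join can be replaced with DENSE_RANK, which SQL Server 2008 supports. This assumes a loan is identified by (Client, Amount, [Date]). Adding Amount to the ORDER BY is my own tie-breaker for two different loans of one client on the same date, a case the join version numbers in arbitrary order. The sample rows are repeated in compact form so the batch runs on its own.

```sql
-- Sample data repeated from above so this batch is self-contained.
declare @data table (ID int identity(1,1), Client int, Branch int, Amount int, [Date] date);
insert into @data values
(1,2,1500,'2014-01-01'),(1,2,1400,'2014-03-01'),(1,3,1500,'2014-01-01'),(1,4,300,'2014-07-01'),(1,5,1500,'2014-01-01'),
(2,2,300,'2014-01-01'),(2,2,300,'2014-01-01'),(2,5,300,'2014-01-01'),(2,3,400,'2014-04-01'),
(3,2,300,'2014-01-01'),(3,2,300,'2014-01-01'),(3,5,300,'2014-01-01'),(3,5,300,'2014-01-01'),(3,3,400,'2014-04-01'),
(4,2,300,'2014-01-01'),(4,2,300,'2014-01-01'),(4,5,300,'2014-01-01'),(4,5,300,'2014-01-01'),(4,5,300,'2014-01-01');

select d.Client, d.Branch, d.Amount, d.[Date]
      -- the first row of a (Client, Amount, Date) combination gets 0, every later one gets 1
    , case when ROW_NUMBER() over (partition by d.Client, d.Amount, d.[Date] order by d.ID) = 1
           then 0 else 1 end as ind
      -- rows of the same loan share a number; loans are numbered by date, then amount (assumed tie-breaker)
    , DENSE_RANK() over (partition by d.Client order by d.[Date], d.Amount) as Loan_Distinct_Num
 from @data as d
 order by d.ID;
```

Like the query above, this flags every repeat of a loan, including a second row written by the same branch.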

Answer 2

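If ind should be 1 only when a previous record with a different branch # exists, then here is an answer (see row 7). I also used dense_rank to group the loans by amount/date for loan_distinct_num. The logic for that column seems more complicated: if this is a one-time fix, I would probably use a cursor to loop over the table and apply some more complex logic to populate that column, rather than trying to compute it in a query.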

-- sample data
declare @data table (ID int identity(1,1), Client int, Branch int, Amount int, [Date] date);
insert into @data values
(1,2,  1500,'2014-01-01'),
(1,2,  1400,'2014-03-01'),
(1,3,  1500,'2014-01-01'),
(1,4,  300,'2014-07-01'),
(1,5,  1500,'2014-01-01'),
(2,2,  300,'2014-01-01'),
(2,2,  300,'2014-01-01'),
(2,5,  300,'2014-01-01'),
(2,3,  400,'2014-04-01'),
(3,2,  300,'2014-01-01'),
(3,2,  300,'2014-01-01'),
(3,5,  300,'2014-01-01'),
(3,5,  300,'2014-01-01'),
(3,3,  400,'2014-04-01'),
(4,2,  300,'2014-01-01'),
(4,2,  300,'2014-01-01'),
(4,5,  300,'2014-01-01'),
(4,5,  300,'2014-01-01'),
(4,5,  300,'2014-01-01');

-- query
select client, branch, amount, date, 
    -- ind = 1 when an earlier row (lower id) of the same client, amount and date exists in a different branch
    case when exists (select * from @data t2
                      where t2.client = tbl.client and t2.branch <> tbl.branch
                        and t2.amount = tbl.amount and t2.date = tbl.date and t2.id < tbl.id)
         then 1 else 0 end as ind,
    DENSE_RANK() over (partition by client order by date, amount asc) as loan_distinct_num
from @data tbl
order by id;
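
If ind should instead follow the pairing the question describes (a row in a later branch counts as a duplicate only while the branch that reported the loan first still has a row to pair it with), one set-based possibility is sketched below. The pairing rule is my reading of the question, not something the question states as a formula, and the names branch_rowno, first_id, origin_branch and origin_rows are made up for this sketch. On the 19 sample rows it yields the Ind column of the expected output; it does not attempt Loan_Distinct_Num.

```sql
-- Sample data repeated so this batch is self-contained.
declare @data table (ID int identity(1,1), Client int, Branch int, Amount int, [Date] date);
insert into @data values
(1,2,1500,'2014-01-01'),(1,2,1400,'2014-03-01'),(1,3,1500,'2014-01-01'),(1,4,300,'2014-07-01'),(1,5,1500,'2014-01-01'),
(2,2,300,'2014-01-01'),(2,2,300,'2014-01-01'),(2,5,300,'2014-01-01'),(2,3,400,'2014-04-01'),
(3,2,300,'2014-01-01'),(3,2,300,'2014-01-01'),(3,5,300,'2014-01-01'),(3,5,300,'2014-01-01'),(3,3,400,'2014-04-01'),
(4,2,300,'2014-01-01'),(4,2,300,'2014-01-01'),(4,5,300,'2014-01-01'),(4,5,300,'2014-01-01'),(4,5,300,'2014-01-01');

with numbered as (
    select d.ID, d.Client, d.Branch, d.Amount, d.[Date]
           -- position of the row among the rows its own branch wrote for this loan
         , ROW_NUMBER() over (partition by d.Client, d.Amount, d.[Date], d.Branch order by d.ID) as branch_rowno
           -- lowest ID of the loan, i.e. the row that reported it first
         , MIN(d.ID) over (partition by d.Client, d.Amount, d.[Date]) as first_id
    from @data as d
), origin as (
    -- assumption: the branch of the first row is the "origin" branch of the loan
    select f.Client, f.Amount, f.[Date], f.Branch as origin_branch
           -- how many rows the origin branch wrote for this loan
         , (select COUNT(*) from @data as x
             where x.Client = f.Client and x.Amount = f.Amount
               and x.[Date] = f.[Date] and x.Branch = f.Branch) as origin_rows
    from numbered as f
    where f.ID = f.first_id
)
select n.Client, n.Branch, n.Amount, n.[Date]
       -- duplicate: a row from another branch that can still be paired with an origin row
     , case when n.Branch <> o.origin_branch and n.branch_rowno <= o.origin_rows
            then 1 else 0 end as ind
from numbered as n
join origin as o
  on o.Client = n.Client and o.Amount = n.Amount and o.[Date] = n.[Date]
order by n.ID;
```

Every non-origin branch is compared with the origin branch only, so data where a loan passes through three or more branches with different row counts may need a different rule.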

Identifying duplicate rows (loans) in SQL
