Counting rows based on a string column and NULLs

Time: 2016-10-04 12:23:58

Tags: sql sql-server sql-server-2008 gaps-and-islands

Here is my sample data:

id     FirstName      LastName     HouseNo     MyCount
1      A                  C          1-1         2
2      B                  C          1-1         2
4      D                  A                      3
5      F                  A                      3
6      J                  A                      3
7      Q                  X          1-2         3
8      D                  X          1-2         3
9      D                  X          1-2         3
10     A                  C          1-3         3
11     B                  C          1-3         3
12     C                  C          1-3         3
14     F                  K                      2
15     J                  K                      2
16     Q                  X          1-5         1

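From the data above, I want to count the records that share the same HouseNo and LastName.

For this I am using:

SELECT COUNT(ID) AS _COUNT FROM MYTABLE GROUP BY LASTNAME, HOUSENO

But this statement has a problem. Some records in the data have no HouseNo. In the example above, IDs 4, 5, 6 and 14, 15 have no HouseNo. So the statement returns 5, but it should return 3 and 2 respectively.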

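Main goals

  1. Count the records based on LastName and HouseNo.
  2. Count the records that have no HouseNo (they come in series).
  3. The resulting count should be updated in MyCount.

How can I compute this count?

Bounty edit:

Sample data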

    id  FirstName   LastName    HouseNo     MyCount     CountId
    1   Imran       Khan        1-1         
    2   Waseem      Khan        1-1         
    3   Rihan       Khan        1-1         
    4   Moiz        Shaikh      1-2         
    5   Zbair       Shaikh      1-2         
    6   Sultan      Shaikh      1-2         
    7   Zaid        Khan                    
    10  Parvez      Patel       1-3         
    11  Ahmed       Patel       1-3         
    12  Rahat       Syed        1-4         
    13  Talha       Khan                    
    14  Zia         Khan                    
    15  Arshad      Patel       1-3         
    16  Samad       Patel       1-3         
    17  Raees       Syed        1-4         
    18  Azmat       Khan                    
    19  Imran       Khan                    
    

Expected result:

    id  FirstName   LastName    HouseNo     MyCount     CountId
    1   Imran       Khan        1-1         3           1
    2   Waseem      Khan        1-1         3           1
    3   Rihan       Khan        1-1         3           1
    4   Moiz        Shaikh      1-2         3           2
    5   Zbair       Shaikh      1-2         3           2
    6   Sultan      Shaikh      1-2         3           2
    7   Zaid        Khan                    1           3
    10  Parvez      Patel       1-3         2           4   
    11  Ahmed       Patel       1-3         2           4
    12  Rahat       Syed        1-4         1           5   
    13  Talha       Khan                    2           6
    14  Zia         Khan                    2           6   
    15  Arshad      Patel       1-3         2           7   
    16  Samad       Patel       1-3         2           7
    17  Raees       Syed        1-4         1           8   
    18  Azmat       Khan                    2           9
    19  Imran       Khan                    2           9   
    
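1. In the sample data, MyCount and CountId are empty and should be filled in.
2. MyCount is based on HouseNo and LastName. See IDs 1 to 3: the last name is Khan and the house number is 1-1, so MyCount for IDs 1 to 3 will be 3 and CountId will be 1.
3. Many records in the sample data have no HouseNo. In that case, consecutive records with the same last name are counted together. See ID 7, whose count is 1. Also see IDs 18 and 19, whose count is 2.
4. CountId is a sequential number assigned to each group. See IDs 1 to 3: they share the same house number and last name, so their CountId is 1.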

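8 answers:

Answer 0 (score: 4)

It looks like the main confusion was caused by the SQL statement at the beginning of the question, which uses a plain GROUP BY LASTNAME, HOUSENO.

If you wanted a simple grouping, your query would be correct. But then you showed us more detailed sample data with the expected result, and it became clear that you need more than plain grouping (which does not care about the order of rows in the data): you want to group rows based on their sequence.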
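The difference between the two kinds of grouping can be traced outside the database. The following is only a minimal Python sketch, not part of the original answer: it assumes the bounty sample data reduced to (id, LastName, HouseNo) tuples, with None standing in for a missing HouseNo, and the names ROWS, name_and_house, plain_counts and run_counts are made up for illustration.

```python
from collections import Counter
from itertools import groupby

# (id, LastName, HouseNo) from the bounty sample data; None marks a missing HouseNo.
ROWS = [
    (1, "Khan", "1-1"), (2, "Khan", "1-1"), (3, "Khan", "1-1"),
    (4, "Shaikh", "1-2"), (5, "Shaikh", "1-2"), (6, "Shaikh", "1-2"),
    (7, "Khan", None),
    (10, "Patel", "1-3"), (11, "Patel", "1-3"),
    (12, "Syed", "1-4"),
    (13, "Khan", None), (14, "Khan", None),
    (15, "Patel", "1-3"), (16, "Patel", "1-3"),
    (17, "Syed", "1-4"),
    (18, "Khan", None), (19, "Khan", None),
]


def name_and_house(row):
    """Grouping key: the (LastName, HouseNo) pair of a row."""
    return row[1], row[2]


ordered = sorted(ROWS, key=lambda row: row[0])

# Plain grouping, like GROUP BY LastName, HouseNo: row order is irrelevant.
plain_counts = Counter(name_and_house(row) for row in ordered)

# Sequence-based grouping: only neighbouring rows (in id order) are merged.
run_counts = [(key, len(list(run))) for key, run in groupby(ordered, key=name_and_house)]

print(plain_counts[("Khan", None)])         # all Khan rows without a HouseNo
print([count for _, count in run_counts])   # one count per consecutive run
```

With plain grouping, every Khan row without a HouseNo lands in a single group, while the sequence-based version keeps IDs 7, 13 to 14, and 18 to 19 as three separate runs, which is what the expected result asks for.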

This is a classic problem known as gaps-and-islands. In SQL Server 2008 it can be solved with a few calls to the ROW_NUMBER function.

Sample data

DECLARE @T TABLE 
    (id int PRIMARY KEY
    ,FirstName nvarchar(50)
    ,LastName nvarchar(50)
    ,HouseNo nvarchar(50)
    ,MyCount int
    ,CountId int);

INSERT INTO @T (id, FirstName, LastName, HouseNo) VALUES
(1 , 'Imran ', 'Khan  ', '1-1'),
(2 , 'Waseem', 'Khan  ', '1-1'),
(3 , 'Rihan ', 'Khan  ', '1-1'),
(4 , 'Moiz  ', 'Shaikh', '1-2'),
(5 , 'Zbair ', 'Shaikh', '1-2'),
(6 , 'Sultan', 'Shaikh', '1-2'),
(7 , 'Zaid  ', 'Khan  ',  NULL),
(10, 'Parvez', 'Patel ', '1-3'),
(11, 'Ahmed ', 'Patel ', '1-3'),
(12, 'Rahat ', 'Syed  ', '1-4'),
(13, 'Talha ', 'Khan  ',  NULL),
(14, 'Zia   ', 'Khan  ',  NULL),
(15, 'Arshad', 'Patel ', '1-3'),
(16, 'Samad ', 'Patel ', '1-3'),
(17, 'Raees ', 'Syed  ', '1-4'),
(18, 'Azmat ', 'Khan  ',  NULL),
(19, 'Imran ', 'Khan  ',  NULL);

SELECT query

WITH
CTE_RN
AS
(
    SELECT
        id
        ,FirstName
        ,LastName
        ,HouseNo
        ,MyCount
        ,CountId
        ,ROW_NUMBER() OVER (PARTITION BY LastName, HouseNo ORDER BY ID) AS rn1
        ,ROW_NUMBER() OVER (ORDER BY ID) AS rn2
    FROM @T AS T
)
,CTE_GRoups
AS
(
    SELECT
        id
        ,FirstName
        ,LastName
        ,HouseNo
        ,MyCount
        ,CountId
        ,rn1
        ,rn2
        ,rn2-rn1 AS GroupNumber
        ,COUNT(ID) OVER (PARTITION BY LastName, HouseNo, rn2-rn1) AS NewMyCount
        ,MIN(ID) OVER (PARTITION BY LastName, HouseNo, rn2-rn1) AS GroupMinID
    FROM CTE_RN
)
SELECT
    id
    ,FirstName
    ,LastName
    ,HouseNo
    ,rn1
    ,rn2
    ,GroupNumber
    ,NewMyCount
    ,GroupMinID
    ,DENSE_RANK() OVER (ORDER BY GroupMinID) AS NewCountId
FROM CTE_GRoups
ORDER BY ID;

Result

+----+-----------+----------+---------+-----+-----+-------------+------------+------------+------------+
| id | FirstName | LastName | HouseNo | rn1 | rn2 | GroupNumber | NewMyCount | GroupMinID | NewCountId |
+----+-----------+----------+---------+-----+-----+-------------+------------+------------+------------+
|  1 | Imran     | Khan     | 1-1     |   1 |   1 |           0 |          3 |          1 |          1 |
|  2 | Waseem    | Khan     | 1-1     |   2 |   2 |           0 |          3 |          1 |          1 |
|  3 | Rihan     | Khan     | 1-1     |   3 |   3 |           0 |          3 |          1 |          1 |
|  4 | Moiz      | Shaikh   | 1-2     |   1 |   4 |           3 |          3 |          4 |          2 |
|  5 | Zbair     | Shaikh   | 1-2     |   2 |   5 |           3 |          3 |          4 |          2 |
|  6 | Sultan    | Shaikh   | 1-2     |   3 |   6 |           3 |          3 |          4 |          2 |
|  7 | Zaid      | Khan     | NULL    |   1 |   7 |           6 |          1 |          7 |          3 |
| 10 | Parvez    | Patel    | 1-3     |   1 |   8 |           7 |          2 |         10 |          4 |
| 11 | Ahmed     | Patel    | 1-3     |   2 |   9 |           7 |          2 |         10 |          4 |
| 12 | Rahat     | Syed     | 1-4     |   1 |  10 |           9 |          1 |         12 |          5 |
| 13 | Talha     | Khan     | NULL    |   2 |  11 |           9 |          2 |         13 |          6 |
| 14 | Zia       | Khan     | NULL    |   3 |  12 |           9 |          2 |         13 |          6 |
| 15 | Arshad    | Patel    | 1-3     |   3 |  13 |          10 |          2 |         15 |          7 |
| 16 | Samad     | Patel    | 1-3     |   4 |  14 |          10 |          2 |         15 |          7 |
| 17 | Raees     | Syed     | 1-4     |   2 |  15 |          13 |          1 |         17 |          8 |
| 18 | Azmat     | Khan     | NULL    |   4 |  16 |          12 |          2 |         18 |          9 |
| 19 | Imran     | Khan     | NULL    |   5 |  17 |          12 |          2 |         18 |          9 |
+----+-----------+----------+---------+-----+-----+-------------+------------+------------+------------+

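I included all intermediate steps in the result so you can see how it works. The main part is the two ROW_NUMBER sequences. The rn1 sequence restarts for each LastName, HouseNo pair because it is partitioned by LastName, HouseNo. rn2 is a simple increasing sequence without gaps. We need it because the original ID defines the order but may have gaps.

Then we subtract the two sequences, and the difference gives us GroupNumber.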
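To see why the difference stays constant within a run, the two sequences can be reproduced by hand. This is a rough Python sketch of the same arithmetic, not the query itself; it assumes the sample rows as (id, LastName, HouseNo) tuples with None for NULL, and the function name row_number_difference is hypothetical.

```python
from collections import defaultdict

# (id, LastName, HouseNo) from the sample data; None marks a NULL HouseNo.
ROWS = [
    (1, "Khan", "1-1"), (2, "Khan", "1-1"), (3, "Khan", "1-1"),
    (4, "Shaikh", "1-2"), (5, "Shaikh", "1-2"), (6, "Shaikh", "1-2"),
    (7, "Khan", None),
    (10, "Patel", "1-3"), (11, "Patel", "1-3"),
    (12, "Syed", "1-4"),
    (13, "Khan", None), (14, "Khan", None),
    (15, "Patel", "1-3"), (16, "Patel", "1-3"),
    (17, "Syed", "1-4"),
    (18, "Khan", None), (19, "Khan", None),
]


def row_number_difference(rows):
    """Map id -> (rn1, rn2, rn2 - rn1) for rows given as (id, LastName, HouseNo)."""
    rows_seen_in_partition = defaultdict(int)
    result = {}
    ordered = sorted(rows, key=lambda row: row[0])
    # rn2 numbers all rows by id; rn1 restarts for every (LastName, HouseNo) pair.
    for rn2, (row_id, last_name, house_no) in enumerate(ordered, start=1):
        rows_seen_in_partition[(last_name, house_no)] += 1
        rn1 = rows_seen_in_partition[(last_name, house_no)]
        result[row_id] = (rn1, rn2, rn2 - rn1)
    return result


numbers = row_number_difference(ROWS)
for row_id in (12, 13, 14, 18, 19):
    print(row_id, numbers[row_id])
```

Within one run both sequences advance by one per row, so their difference does not change; when rows of another partition come in between, rn2 keeps growing while rn1 waits, so the next run of that partition gets a larger difference. Note that IDs 12 and 13 share the same difference although they belong to different partitions, which is why the query always combines GroupNumber with LastName and HouseNo.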

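Counting the number of elements in each group is a simple COUNT, which gives us NewMyCount.

Enumerating the groups with a gapless sequence of numbers is done in two steps. First, MIN gives each group an identifier; then DENSE_RANK turns those identifiers into the gapless sequence NewCountId.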
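The counting and numbering steps can be sketched the same way. The Python below is an illustration under the same assumptions as before (sample rows as (id, LastName, HouseNo) tuples, None for NULL); count_and_number_islands is a made-up name, and the dictionary-based ranking is only a stand-in for what COUNT, MIN and DENSE_RANK do in the query.

```python
from collections import defaultdict

# (id, LastName, HouseNo) from the sample data; None marks a NULL HouseNo.
ROWS = [
    (1, "Khan", "1-1"), (2, "Khan", "1-1"), (3, "Khan", "1-1"),
    (4, "Shaikh", "1-2"), (5, "Shaikh", "1-2"), (6, "Shaikh", "1-2"),
    (7, "Khan", None),
    (10, "Patel", "1-3"), (11, "Patel", "1-3"),
    (12, "Syed", "1-4"),
    (13, "Khan", None), (14, "Khan", None),
    (15, "Patel", "1-3"), (16, "Patel", "1-3"),
    (17, "Syed", "1-4"),
    (18, "Khan", None), (19, "Khan", None),
]


def count_and_number_islands(rows):
    """Map id -> (my_count, count_id) for rows given as (id, LastName, HouseNo)."""
    rows_seen_in_partition = defaultdict(int)
    islands = defaultdict(list)  # (LastName, HouseNo, rn2 - rn1) -> ids in that island
    ordered = sorted(rows, key=lambda row: row[0])
    for rn2, (row_id, last_name, house_no) in enumerate(ordered, start=1):
        rows_seen_in_partition[(last_name, house_no)] += 1
        rn1 = rows_seen_in_partition[(last_name, house_no)]
        islands[(last_name, house_no, rn2 - rn1)].append(row_id)

    # The smallest id identifies an island; ranking those minima gives 1, 2, 3, ...
    minima = sorted(min(ids) for ids in islands.values())
    rank_of_minimum = {min_id: rank for rank, min_id in enumerate(minima, start=1)}

    return {
        row_id: (len(ids), rank_of_minimum[min(ids)])
        for ids in islands.values()
        for row_id in ids
    }


filled = count_and_number_islands(ROWS)
for row_id in sorted(filled):
    print(row_id, filled[row_id])
```

Because ids are unique, every island has a distinct minimum, so ranking the minima in ascending order numbers the islands in the order they appear.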

If you want to actually update the original table with the computed NewMyCount and NewCountId, the SELECT query above is easily converted into an UPDATE query:

UPDATE query

WITH
CTE_RN
AS
(
    SELECT
        id
        ,FirstName
        ,LastName
        ,HouseNo
        ,MyCount
        ,CountId
        ,ROW_NUMBER() OVER (PARTITION BY LastName, HouseNo ORDER BY ID) AS rn1
        ,ROW_NUMBER() OVER (ORDER BY ID) AS rn2
    FROM @T AS T
)
,CTE_GRoups
AS
(
    SELECT
        id
        ,FirstName
        ,LastName
        ,HouseNo
        ,MyCount
        ,CountId
        ,rn1
        ,rn2
        ,rn2-rn1 AS GroupNumber
        ,COUNT(ID) OVER (PARTITION BY LastName, HouseNo, rn2-rn1) AS NewMyCount
        ,MIN(ID) OVER (PARTITION BY LastName, HouseNo, rn2-rn1) AS GroupMinID
    FROM CTE_RN
)
,CTE_Update
AS
(
    SELECT
        id
        ,FirstName
        ,LastName
        ,HouseNo
        ,MyCount
        ,CountId
        ,rn1
        ,rn2
        ,GroupNumber
        ,NewMyCount
        ,GroupMinID
        ,DENSE_RANK() OVER (ORDER BY GroupMinID) AS NewCountId
    FROM CTE_GRoups
)
UPDATE CTE_Update
SET
    MyCount = NewMyCount
    ,CountId = NewCountId
;

Result

SELECT *
FROM @T
ORDER BY ID;

+----+-----------+----------+---------+---------+---------+
| id | FirstName | LastName | HouseNo | MyCount | CountId |
+----+-----------+----------+---------+---------+---------+
|  1 | Imran     | Khan     | 1-1     |       3 |       1 |
|  2 | Waseem    | Khan     | 1-1     |       3 |       1 |
|  3 | Rihan     | Khan     | 1-1     |       3 |       1 |
|  4 | Moiz      | Shaikh   | 1-2     |       3 |       2 |
|  5 | Zbair     | Shaikh   | 1-2     |       3 |       2 |
|  6 | Sultan    | Shaikh   | 1-2     |       3 |       2 |
|  7 | Zaid      | Khan     | NULL    |       1 |       3 |
| 10 | Parvez    | Patel    | 1-3     |       2 |       4 |
| 11 | Ahmed     | Patel    | 1-3     |       2 |       4 |
| 12 | Rahat     | Syed     | 1-4     |       1 |       5 |
| 13 | Talha     | Khan     | NULL    |       2 |       6 |
| 14 | Zia       | Khan     | NULL    |       2 |       6 |
| 15 | Arshad    | Patel    | 1-3     |       2 |       7 |
| 16 | Samad     | Patel    | 1-3     |       2 |       7 |
| 17 | Raees     | Syed     | 1-4     |       1 |       8 |
| 18 | Azmat     | Khan     | NULL    |       2 |       9 |
| 19 | Imran     | Khan     | NULL    |       2 |       9 |
+----+-----------+----------+---------+---------+---------+

Answer 1 (score: 3)

Use a CTE and then update your table as follows:

;WITH T AS
(
    SELECT
        *,
        ROW_NUMBER() OVER (ORDER BY ID) AS SrNo,
        -- Number rows by ID within each (LastName, HouseNo) partition so the order is deterministic
        ROW_NUMBER() OVER (PARTITION BY LastName, HouseNo ORDER BY ID) AS PartNo
    FROM MYTABLE
),
X AS
(
    SELECT
        T.LastName,
        T.HouseNo,
        COUNT(*) AS NoOfCount,  -- number of rows in the run, correct even when IDs have gaps
        ROW_NUMBER() OVER (ORDER BY MAX(T.ID)) AS RowNo,
        MIN(T.ID) AS MinID,
        MAX(T.ID) AS MaxID
    FROM T
    GROUP BY T.LastName, T.HouseNo, (T.SrNo - T.PartNo)
)
UPDATE MYTABLE
SET
    MyCount = X.NoOfCount,
    CountId = X.RowNo
FROM X
-- Match each row to its own run by ID range; this also covers rows whose HouseNo is NULL
WHERE MYTABLE.ID BETWEEN X.MinID AND X.MaxID;

SELECT * FROM MYTABLE;

Answer 2 (score: 2)

This should do it:

declare @temp table (id int, firstname varchar(5), lastname varchar(5), houseno varchar(5), mycount int)

insert into @temp values(1,   'A',  'C',  '1-1',  2)
insert into @temp values(2,   'B',  'C',  '1-1',  2)
insert into @temp values(4,   'D',  'A',   null,  3)
insert into @temp values(5,   'F',  'A',   null,  3)
insert into @temp values(6,   'J',  'A',   null,  3)
insert into @temp values(7,   'Q',  'X',  '1-2',  3)
insert into @temp values(8,   'D',  'X',  '1-2',  3)
insert into @temp values(9,   'D',  'X',  '1-2',  3)
insert into @temp values(10,  'A',  'C',  '1-3',  3)
insert into @temp values(11,  'B',  'C',  '1-3',  3)
insert into @temp values(12,  'C',  'C',  '1-3',  3)
insert into @temp values(14,  'F',  'K',   null,  2)
insert into @temp values(15,  'J',  'K',   null,  2)
insert into @temp values(16,  'Q',  'X',  '1-5',  1)  

select count(ID) as _count 
from @temp
group by isnull(lastname, ''), isnull(houseno, '') 

It returns:

_count
   3    
   2    
   2    
   3    
   3    
   1    

You can use this to get more detail:

select distinct
       t.lastname, 
       isnull(t.houseno, '') as houseno,
       (select count(ID) from @temp t2 where t2.lastname = t.lastname and t2.houseno = t.houseno) as _count_filled,
       (select count(ID) from @temp t2 where t2.lastname = t.lastname and isnull(t2.houseno, '') = isnull(t.houseno, '') and t2.houseno is null) as _count_empty
from   @temp t

It returns:

lastname    houseno _count_filled   _count_empty    
A                   0               3   
C           1-1     2               0   
C           1-3     3               0   
K                   0               2   
X           1-2     3               0   
X           1-5     1               0   
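As a side note on the first query in this answer, GROUP BY already places all NULL values of a column in one group, so wrapping the columns in ISNULL should not change the counts. The sketch below checks this with SQLite through Python's sqlite3 module, used here only as a convenient stand-in for SQL Server: the table name temp_rows is made up, and COALESCE takes the place of ISNULL because SQLite has no two-argument ISNULL function.

```python
import sqlite3

# First sample data from the question as (id, lastname, houseno); None becomes NULL.
SAMPLE = [
    (1, "C", "1-1"), (2, "C", "1-1"),
    (4, "A", None), (5, "A", None), (6, "A", None),
    (7, "X", "1-2"), (8, "X", "1-2"), (9, "X", "1-2"),
    (10, "C", "1-3"), (11, "C", "1-3"), (12, "C", "1-3"),
    (14, "K", None), (15, "K", None),
    (16, "X", "1-5"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE temp_rows (id INTEGER, lastname TEXT, houseno TEXT)")
conn.executemany("INSERT INTO temp_rows VALUES (?, ?, ?)", SAMPLE)

# Grouping on the raw columns: rows with a NULL houseno still form one group per lastname.
plain = conn.execute(
    "SELECT lastname, houseno, COUNT(id) FROM temp_rows "
    "GROUP BY lastname, houseno ORDER BY lastname, houseno"
).fetchall()

# Grouping on NULL-replaced columns, mirroring the ISNULL version.
wrapped = conn.execute(
    "SELECT lastname, COALESCE(houseno, ''), COUNT(id) FROM temp_rows "
    "GROUP BY lastname, COALESCE(houseno, '') "
    "ORDER BY lastname, COALESCE(houseno, '')"
).fetchall()

# An equality comparison, by contrast, never matches NULL with NULL.
null_equals_null = conn.execute("SELECT NULL = NULL").fetchone()[0]

print([row[2] for row in plain])
print([row[2] for row in wrapped])
print(null_equals_null)
```

Both queries give the same per-group counts, and neither distinguishes separate runs of the same LastName and HouseNo, so on the bounty sample data they would still merge rows that are not adjacent. The last line shows why the _count_filled subquery, which compares houseno with =, reports 0 for rows without a HouseNo.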

Answer 3 (score: 2)

First create a view that computes the count and rank for each section.

CREATE VIEW cnt AS
SELECT
    T.LastName,
    T.HouseNo,
    MIN(T.id) AS START,
    MAX(T.id) AS finish,
    COUNT(*) AS NoOfCount,  -- number of rows in the run, correct even when IDs have gaps
    ROW_NUMBER() OVER (ORDER BY MAX(T.ID)) AS RowNo,
    MAX(T.ID) AS ID
FROM (
    SELECT
        *,
        ROW_NUMBER() OVER (ORDER BY ID) AS SrNo,
        -- Number rows by ID within each (LastName, HouseNo) partition so the order is deterministic
        ROW_NUMBER() OVER (PARTITION BY LastName, HouseNo ORDER BY ID) AS PartNo
    FROM myTable
) T
GROUP BY T.LastName, T.HouseNo, (T.SrNo - T.PartNo)

Then use it for your purpose:

SELECT a.*,
       b.NoOfCount,
       b.RowNo
FROM   myTable         AS a
       INNER JOIN cnt  AS b
            ON  a.id BETWEEN b.start AND b.finish

Answer 4 (score: 1)

SELECT COUNT(ID) AS _COUNT 
FROM MYTABLE 
GROUP BY ISNULL(LASTNAME, ''), ISNULL(HOUSENO, '');

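Answer 5 (score: 1)

I believe your third main goal is to update the MYCOUNT column with the result on the corresponding rows. In general, what you are looking for is a correlated subquery:

UPDATE MYTABLE T1
   SET T1.MYCOUNT =
    ( SELECT COUNT (*)
        FROM MYTABLE T2
        WHERE T1.LASTNAME = T2.LASTNAME
        AND NVL (T2.HOUSENO, 0) = NVL (T1.HOUSENO, 0)
        GROUP BY T2.LASTNAME, T2.HOUSENO);

*Note: this is written for Oracle SQL.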

Answer 6 (score: 1)

I agree with @Vladimir Baranov's analysis, so I will not repeat it here. I just want to make the query simpler, as follows (tested in SQL Server 2012):

--drop table #temp
create table  #temp  (id int, firstname varchar(15), lastname varchar(15), houseno varchar(5));
go
insert into #temp (id, firstname, lastname, houseno)
values
(1   , 'Imran'       ,'Khan'        ,'1-1')         
,(2   , 'Waseem'      ,'Khan'        ,'1-1')         
,(3   , 'Rihan'       ,'Khan'        ,'1-1')         
,(4   , 'Moiz'        ,'Shaikh'      ,'1-2')         
,(5   , 'Zbair'       ,'Shaikh'      ,'1-2')         
,(6   , 'Sultan'      ,'Shaikh'      ,'1-2')         
,(7   , 'Zaid'        ,'Khan'        , null)         
,(10  , 'Parvez'      ,'Patel'       ,'1-3')         
,(11  , 'Ahmed'       ,'Patel'       ,'1-3')         
,(12  , 'Rahat'       ,'Syed'        ,'1-4')         
,(13  , 'Talha'       ,'Khan'        ,null )         
,(14  , 'Zia'         ,'Khan'        ,null )         
,(15  , 'Arshad'      ,'Patel'       ,'1-3')         
,(16  , 'Samad'       ,'Patel'       ,'1-3')         
,(17  , 'Raees'       ,'Syed'        ,'1-4')         
,(18  , 'Azmat'       ,'Khan'        , null)      
,(19  , 'Imran'       ,'Khan'        , null)
 
-- query
; with c as (
select id, firstname, lastname, houseno=isnull(houseno, '')
, new_id=row_number() over (partition by lastname, isnull(houseno, '') order by id)
, grp = id -row_number() over (partition by lastname, isnull(houseno, '') order by id)
FROM #temp 
)
, d as (
select id, firstname, lastname, houseno, T.cnt, c.grp
, row_id=id-row_number() over ( partition by grp, houseno order by c.grp)
from c
cross apply (select cnt=count(*) from c as c2 where c.grp = c2.grp and c.lastname=c2.lastname and c.houseno=c2.houseno) T(cnt)
)
select id, FirstName, LastName, Houseno, MyCount=cnt,  CountId= DENSE_RANK() over (order by row_id)
from d

Answer 7 (score: 0)

Try this:

SELECT COUNT(ID) AS _COUNT FROM MYTABLE GROUP BY LASTNAME + ISNULL(HOUSENO, '')