SQL: compress a table by removing similar rows

Time: 2015-08-04 14:26:43

Tags: sql sql-server-2012

I have a table with the columns ID, IDLicense, Brand and ExtraBrands.

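I am trying to merge all records that share the same IDLicense. For each IDLicense that has duplicates, I want to keep the original row, delete the duplicate rows, and add the brands of the deleted duplicates to the ExtraBrands column of the original row.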

So far I have been able to select every IDLicense that has duplicates, using a temp table to hold the extra information.

INSERT INTO #TempTable (ID, IDLicense, Brand, ExtraBrands) 
SELECT ID, IDLicense, Brand, ExtraBrands FROM BrandOrders
WHERE IDLicense IN (SELECT IDLicense FROM BrandOrders GROUP BY IDLicense HAVING COUNT(*) > 1)

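Is there a simpler way to do this without a temp table: take the brands from the duplicates, add them to the original row as ExtraBrands, and then delete the duplicates?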

Sample data:

The table:

 1. IdLicense = 1, Brand="BlueBird", ExtraBrands is null
 2. IdLicense = 1, Brand="RedBird", ExtraBrands is null
 3. IdLicense = 1, Brand="YellowBird", ExtraBrands is null
 4. IdLicense = 2, Brand="BlueBird", ExtraBrands is null
 5. IdLicense = 2, Brand="RedBird", ExtraBrands is null

In the end it should be compressed to:

 1. IdLicense = 1, Brand="BlueBird", ExtraBrands = "RedBird YellowBird"
 2. IdLicense = 2, Brand="BlueBird", ExtraBrands = "RedBird"
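
To try the queries in this thread against the sample above, a minimal setup script might look like the following. The column types and the explicit ID values are assumptions; the question does not give the actual table definition.

```sql
-- Hypothetical definition of the table described in the question.
-- Column types are guesses; adjust them to match the real table.
CREATE TABLE BrandOrders (
    ID          int          NOT NULL PRIMARY KEY,
    IdLicense   int          NOT NULL,
    Brand       varchar(50)  NOT NULL,
    ExtraBrands varchar(max) NULL
);

-- The five sample rows, with ExtraBrands left NULL as in the question.
INSERT INTO BrandOrders (ID, IdLicense, Brand) VALUES
    (1, 1, 'BlueBird'),
    (2, 1, 'RedBird'),
    (3, 1, 'YellowBird'),
    (4, 2, 'BlueBird'),
    (5, 2, 'RedBird');
```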

1 Answer:

Answer 0: (score: 1)

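You can do what you want with the code below, but I would advise against denormalizing the database like this. Storing multiple discrete values in a single column breaks the relational model and often leads to all kinds of problems; for example, you can no longer index, constrain or join on an individual brand.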

Instead, I suggest you normalize your tables and use the schema below, in which a junction table links the license entity to the brand entity:

CREATE TABLE BrandOrders (IdLicense int primary key);
CREATE TABLE Brands (BrandID int primary key, Brand varchar(20));
CREATE TABLE LicenseBrands (
    IdLicense int foreign key references BrandOrders, 
    BrandID int foreign key references Brands, 
    MainBrand bit,
    PRIMARY KEY (IdLicense, BrandId)
);

This ensures data integrity, saves space, and is also easier to work with.
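
With the normalized schema, the comma-separated list can still be produced on demand rather than stored. The query below is a sketch of one way to do that on SQL Server 2012, which lacks STRING_AGG, so it uses STUFF with FOR XML PATH. It assumes that exactly one row per license has MainBrand = 1; the answer does not state that rule, so treat it as an assumption.

```sql
-- Sketch: one row per license with its main brand and a comma-separated
-- list of the remaining brands, built from the normalized tables.
-- Assumes exactly one LicenseBrands row per license has MainBrand = 1.
SELECT
    lb.IdLicense,
    b.Brand AS MainBrand,
    STUFF((
        SELECT ',' + b2.Brand
        FROM LicenseBrands AS lb2
        JOIN Brands AS b2 ON b2.BrandID = lb2.BrandID
        WHERE lb2.IdLicense = lb.IdLicense
          AND lb2.MainBrand = 0
        ORDER BY b2.Brand
        FOR XML PATH(''), TYPE
    ).value('.', 'varchar(max)'), 1, 1, '') AS ExtraBrands  -- STUFF drops the leading comma
FROM LicenseBrands AS lb
JOIN Brands AS b ON b.BrandID = lb.BrandID
WHERE lb.MainBrand = 1;
```

Licenses with no extra brands get NULL in ExtraBrands, because STUFF returns NULL when its input is NULL.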

That said, here is a query to "fix" your data (an update, followed by a delete):

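-- Number the rows within each duplicated idlicense; r = 1 is the row to keep.
;with cte as (
    select *, r=row_number() over (partition by idlicense order by id) 
    from brandorders
    where idlicense in (
       select idlicense from brandorders group by idlicense having count(*) > 1
    )
)

-- Concatenate the brands of rows 2..n into extrabrands on the kept row.
update extern
set extrabrands = left(c , len(c)-1) -- strip the trailing comma
from cte extern
cross apply
(
    select brand + ','
    from cte as intern
    where extern.idlicense = intern.idlicense and intern.r > 1
    order by intern.id -- keep the concatenation order deterministic
    for xml path('')
) extrabrands (c)
where extern.r = 1;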

delete from brandorders 
where idlicense in (
    select idlicense from brandorders group by idlicense having count(*) > 1
    ) 
  and extrabrands is null;
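
The DELETE above identifies the rows to remove by ExtraBrands being NULL, so it depends on no duplicate row having a value in that column beforehand. If that cannot be guaranteed, a variant that removes every row except the lowest ID per IdLicense might look like the sketch below. It is an alternative to the DELETE above, not part of the original answer, and it should only run after the UPDATE has filled in ExtraBrands.

```sql
-- Sketch: delete every row except the first (lowest ID) per IdLicense.
-- Run only after the UPDATE, otherwise the duplicate brands are lost.
;WITH ranked AS (
    SELECT ID,
           ROW_NUMBER() OVER (PARTITION BY IdLicense ORDER BY ID) AS r
    FROM BrandOrders
)
DELETE FROM ranked
WHERE r > 1;
```

Wrapping the UPDATE and this DELETE in a single transaction would keep the table from being left half-converted if one of the statements fails.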

After running it, your data looks like this:

ID  IdLicense   Brand       ExtraBrands
1   1           BlueBird    RedBird,YellowBird
4   2           BlueBird    RedBird
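
Before modifying anything, it can help to preview the compressed result with a read-only query. The following is a sketch that keeps the lowest ID per IdLicense and concatenates the brands of the other rows; it assumes the BrandOrders columns named in the question and changes no data.

```sql
-- Sketch: read-only preview of the compressed table.
-- Keeps the row with the lowest ID per IdLicense and lists the brands
-- of the other rows for that license, comma-separated, ordered by ID.
SELECT
    keep.ID,
    keep.IdLicense,
    keep.Brand,
    STUFF((
        SELECT ',' + dup.Brand
        FROM BrandOrders AS dup
        WHERE dup.IdLicense = keep.IdLicense
          AND dup.ID <> keep.ID
        ORDER BY dup.ID
        FOR XML PATH(''), TYPE
    ).value('.', 'varchar(max)'), 1, 1, '') AS ExtraBrands
FROM BrandOrders AS keep
WHERE keep.ID = (
    SELECT MIN(m.ID)
    FROM BrandOrders AS m
    WHERE m.IdLicense = keep.IdLicense
);
```

Rows whose IdLicense has no duplicates also appear in this preview, with ExtraBrands as NULL.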
