Identifying data inconsistencies in a tuple-like table

Asked: 2011-07-05 11:15:56

Tags: winforms sql-server-2008 c#-4.0

I have a table that holds the availability status of workers. The structure is as follows:

CREATE TABLE [dbo].[Availability]
(
    [OID]                  BIGINT        IDENTITY (1, 1) NOT NULL,
    [LocumID]              BIGINT        NOT NULL,
    [AvailableDate]        SMALLDATETIME NOT NULL,
    [AvailabilityStatusID] INT           NOT NULL,
    [LastModifiedAt]       TIMESTAMP     NOT NULL,
    CONSTRAINT [PK_Availability] PRIMARY KEY CLUSTERED ([OID] ASC) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY];

The data looks something like this:

OID                  LocumID              AvailableDate           AvailabilityStatusID LastModifiedAt
-------------------- -------------------- ----------------------- -------------------- ------------------
1                    1                    2009-03-02 00:00:00     1                    0x0000000000201A8C
2                    2                    2009-03-04 00:00:00     1                    0x0000000000201A8D
3                    1                    2009-03-05 00:00:00     1                    0x0000000000201A8E
4                    1                    2009-03-06 00:00:00     1                    0x0000000000201A8F
5                    2                    2009-03-07 00:00:00     1                    0x0000000000201A90
6                    7                    2009-03-09 00:00:00     1                    0x0000000000201A91
7                    1                    2009-03-11 00:00:00     1                    0x0000000000201A92
8                    1                    2009-03-12 00:00:00     2                    0x0000000000201A93
9                    1                    2009-03-14 00:00:00     1                    0x0000000000201A94
10                   1                    2009-03-16 00:00:00     1                    0x0000000000201A95

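This table now has more than 3 million records, and I have noticed inconsistencies in my data. For any given [AvailableDate], each [LocumID] should appear only once, no matter how many workers there are. In other words, a worker can have exactly one [AvailabilityStatusID] (1, 2, 3, or 4) on a single date. It is an error if a worker is entered two or more times for the same [AvailableDate], whether with the same [AvailabilityStatusID] or with different ones.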

How can I detect these records?

Regards.

1 Answer:

Answer 0: (score: 2)

-- x: every (LocumID, AvailableDate) pair that occurs more than once
WITH x AS 
(
  SELECT LocumID, dt = AvailableDate
     FROM dbo.Availability
     GROUP BY LocumID, AvailableDate 
     HAVING COUNT(*) > 1
)
-- Join back to the table to list every row that belongs to a duplicated pair
SELECT a.OID, a.LocumID, a.AvailableDate, 
    a.AvailabilityStatusID, a.LastModifiedAt
 FROM x
 INNER JOIN dbo.Availability AS a
 ON x.LocumID = a.LocumID
 AND x.dt = a.AvailableDate
 ORDER BY a.LocumID, a.AvailableDate;
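Since the question distinguishes duplicates with the same status from duplicates with different statuses, a rough sketch of how the groups could be summarized, and then cleaned up, might look like the following. The rule of keeping the row with the highest OID in each group is purely an assumption for illustration, as are the names `ranked`, `rn`, `Copies`, and `DistinctStatuses`; substitute whatever rule your business logic actually requires.

```sql
-- Summarize each duplicated (LocumID, AvailableDate) pair.
-- DistinctStatuses = 1 means exact repeats; > 1 means conflicting statuses.
SELECT LocumID, AvailableDate,
       Copies           = COUNT(*),
       DistinctStatuses = COUNT(DISTINCT AvailabilityStatusID)
  FROM dbo.Availability
  GROUP BY LocumID, AvailableDate
  HAVING COUNT(*) > 1
  ORDER BY LocumID, AvailableDate;

-- Hypothetical cleanup rule: keep only the row with the highest OID per pair.
BEGIN TRANSACTION;

WITH ranked AS
(
  SELECT OID,
         rn = ROW_NUMBER() OVER (PARTITION BY LocumID, AvailableDate
                                 ORDER BY OID DESC)
    FROM dbo.Availability
)
DELETE FROM ranked   -- deleting through the CTE removes rows from dbo.Availability
 WHERE rn > 1;       -- rn = 1 is the row being kept

-- Inspect the result, then COMMIT TRANSACTION; or ROLLBACK TRANSACTION;
```

The transaction is left open on purpose, so the outcome can be checked before it is committed or rolled back. With more than 3 million rows, it may be worth trying this against a copy of the table first.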

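Once you have cleaned up this data (I am not sure what your rules will be for which rows to keep), you should consider a unique constraint on (LocumID, AvailableDate). Here is how to create the constraint (though you will not be able to create it until the duplicates have been removed):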

ALTER TABLE dbo.Availability 
    ADD CONSTRAINT uq_l_ad 
    UNIQUE (LocumID, AvailableDate);

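Of course, you will now get new errors returned to your application (Msg 2627), because your current code obviously does not check whether a LocumID / AvailableDate combination already exists before adding a new one.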
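A minimal sketch of such a check, assuming the application writes through a T-SQL batch or stored procedure, is shown below. The variable names and sample values are hypothetical, and the choice to update the existing row rather than skip the insert is also an assumption. LastModifiedAt is left out of the column list because a TIMESTAMP column is populated by SQL Server itself.

```sql
-- Hypothetical input values; in practice these would be procedure parameters.
DECLARE @LocumID       BIGINT        = 1,
        @AvailableDate SMALLDATETIME = '20090302',
        @StatusID      INT           = 2;

BEGIN TRANSACTION;

-- UPDLOCK + HOLDLOCK keep the checked key range locked until the transaction
-- ends, so two sessions cannot both pass the check for the same pair.
IF EXISTS (SELECT 1
             FROM dbo.Availability WITH (UPDLOCK, HOLDLOCK)
            WHERE LocumID = @LocumID
              AND AvailableDate = @AvailableDate)
BEGIN
    -- Assumed rule: the newest status replaces the existing one.
    UPDATE dbo.Availability
       SET AvailabilityStatusID = @StatusID
     WHERE LocumID = @LocumID
       AND AvailableDate = @AvailableDate;
END
ELSE
BEGIN
    INSERT dbo.Availability (LocumID, AvailableDate, AvailabilityStatusID)
    VALUES (@LocumID, @AvailableDate, @StatusID);
END

COMMIT TRANSACTION;
```

Even with a check like this, the unique constraint is still worth keeping as the last line of defense, and the application should be prepared to handle Msg 2627 if another code path inserts a duplicate.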