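I have a table named 1 Main Contacts ("Main"), plus several other tables. They all have a column named Contact_ID (or Referral_ID) that links them together. One problem with this setup is that a record in "Main" can be linked to multiple records in the Referral table, so when I run a query to pull contacts by the "Contact_Source" column, I get duplicate records caused by the Referral table.
I created a view that I can select from when running queries from the website, so that I pull only the data relevant to the situation at hand. I also filter that query by "Contact Source". I posted 2 earlier questions (here and here) about avoiding duplicate records, and I have that part working.
Here is the final code that keeps me from getting duplicate records: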
ALTER VIEW dbo.v_angelview AS
WITH q AS
(
SELECT dbo.[1_MAIN - Contacts].Contact_ID, dbo.[1_MAIN - Contacts].Date_entered_into_Database, dbo.[1_MAIN - Contacts].Date_of_Initial_Contact,
dbo.[1_MAIN - Contacts].[Company_ Name], dbo.[1_MAIN - Contacts].Key_Contact_Title, dbo.[1_MAIN - Contacts].Key_Contact_First_Name,
dbo.[1_MAIN - Contacts].Key_Contact_Middle, dbo.[1_MAIN - Contacts].Key_Contact_Last_Name, dbo.[1_MAIN - Contacts].Key_Credential,
dbo.[1_MAIN - Contacts].Key_Contact_Occupation, dbo.[1_MAIN - Contacts].Key_Degree_1, dbo.[1_MAIN - Contacts].Key_Degree_2,
dbo.[1_MAIN - Contacts].Key_Degree_3, dbo.[1_MAIN - Contacts].Date_of_Highest_Degree, dbo.[1_MAIN - Contacts].Work_Setting,
dbo.[1_MAIN - Contacts].Website_Address, dbo.[1_MAIN - Contacts].Email_1_Key_Contact, dbo.[1_MAIN - Contacts].Email_2,
dbo.[1_MAIN - Contacts].Email_3, dbo.[1_MAIN - Contacts].Day_Time_Phone_Number, dbo.[1_MAIN - Contacts].Extension,
dbo.[1_MAIN - Contacts].Mobile_Phone_Number, dbo.[1_MAIN - Contacts].Bus_Fax_Number, dbo.[1_MAIN - Contacts].Home_Phone_Number,
dbo.[1_MAIN - Contacts].Home_Fax_Number, dbo.[1_MAIN - Contacts].Mailing_Street_1, dbo.[1_MAIN - Contacts].Mailing_Street_2,
dbo.[1_MAIN - Contacts].Mailing_City, dbo.[1_MAIN - Contacts].Mailing_State, dbo.[1_MAIN - Contacts].[Mailing_Zip/Postal],
dbo.[1_MAIN - Contacts].Mailing_Country, dbo.[1_MAIN - Contacts].[Bad_Address?], dbo.[1_MAIN - Contacts].[PROV/REG?],
dbo.[1_MAIN - Contacts].status_flag, dbo.[1_MAIN - Contacts].status_flag AS status_flag2, dbo.Providers.Referral_Source, dbo.Referral.Contact_Source,
dbo.Resource_Center.cert_start_date, dbo.Resource_Center.cert_exp_date, dbo.prov_training_records.Contact_ID AS Expr2,
dbo.prov_training_records.date_reg_email_sent, dbo.Resource_Center.access, dbo.Providers.Contact_ID AS Expr1,
ROW_NUMBER() OVER (PARTITION BY dbo.[1_MAIN - Contacts].Contact_ID ORDER BY dbo.[1_MAIN - Contacts].Contact_ID) AS rn
FROM dbo.[1_MAIN - Contacts]
INNER JOIN
dbo.Referral
ON dbo.[1_MAIN - Contacts].Contact_ID = dbo.Referral.Referral_ID
INNER JOIN
dbo.prov_training_records
ON dbo.[1_MAIN - Contacts].Contact_ID = dbo.prov_training_records.Contact_ID
LEFT OUTER JOIN
dbo.Resource_Center
ON dbo.[1_MAIN - Contacts].Contact_ID = dbo.Resource_Center.Contact_ID
FULL OUTER JOIN
dbo.Providers
ON dbo.[1_MAIN - Contacts].Contact_ID = dbo.Providers.Contact_ID
WHERE dbo.[1_MAIN - Contacts].Mailing_State IN
(N'AL', N'FL', N'GA', N'KY', N'MS', N'NC', N'SC', N'TN', N'PR',
N'CO', N'MT', N'ND', N'SD', N'UT', N'WY',
N'AR', N'LA', N'NM', N'OK', N'TX',
N'AZ', N'CA', N'HI', N'ID', N'NV')
)
SELECT *
FROM q
WHERE rn = 1
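The problem I have now is that I need to set a priority that decides which of the duplicate records I keep and which ones I drop.
Here is the priority hierarchy, starting with the highest precedence:
A record can be linked to one or all of these. So, for example, if a record has both PROVIDER and RG_Train, I want to keep the record with PROVIDER, and so on down the list. Again, all records have a Contact_ID, which is how I can tell that there are duplicates.
Is there a way to modify the existing SQL to do this, or do I need a new approach? If so, how do I remove duplicate records according to my priority list?
I am using SQL Server 2005.
Thanks in advance!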
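Answer 0 (score: 1)
Try ordering the ROW_NUMBER by a CASE expression that encodes your priority, so that rn = 1 is the highest-priority row for each Contact_ID: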
rn = ROW_NUMBER()
OVER
(
PARTITION BY dbo.[1_MAIN - Contacts].Contact_ID
ORDER BY
CASE
WHEN Contact_Source = 'PROVIDER'
THEN 1
WHEN Contact_Source LIKE 'RG_%'
THEN 2
WHEN Contact_Source LIKE 'IN_%'
THEN 3
WHEN Contact_Source LIKE 'LD_%'
THEN 4
ELSE 5
END
)
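To see the effect in isolation, here is a minimal sketch that runs on SQL Server 2005. The table variable @rows and the values 'LD_Event' and 'IN_Web' are made up for illustration; only PROVIDER and RG_Train come from the question. Note that an underscore is a single-character wildcard in LIKE, so this sketch writes the patterns as 'RG[_]%' to match a literal underscore. That is a small deviation from the patterns above, which would also match values such as 'RGX1'.

```sql
-- Hypothetical sample rows standing in for the joined result of the view.
DECLARE @rows TABLE (Contact_ID int, Contact_Source nvarchar(50));

INSERT INTO @rows (Contact_ID, Contact_Source)
SELECT 1, N'RG_Train' UNION ALL
SELECT 1, N'PROVIDER' UNION ALL
SELECT 2, N'LD_Event' UNION ALL
SELECT 2, N'IN_Web';

WITH q AS
(
    SELECT Contact_ID,
           Contact_Source,
           -- Number the rows of each contact by priority: 1 = keep.
           ROW_NUMBER() OVER
           (
               PARTITION BY Contact_ID
               ORDER BY CASE
                            WHEN Contact_Source = N'PROVIDER'  THEN 1
                            WHEN Contact_Source LIKE N'RG[_]%' THEN 2
                            WHEN Contact_Source LIKE N'IN[_]%' THEN 3
                            WHEN Contact_Source LIKE N'LD[_]%' THEN 4
                            ELSE 5
                        END
           ) AS rn
    FROM @rows
)
SELECT Contact_ID, Contact_Source
FROM q
WHERE rn = 1
ORDER BY Contact_ID;
-- Returns (1, PROVIDER) and (2, IN_Web).
```

In the real view, the same CASE expression would replace the ORDER BY dbo.[1_MAIN - Contacts].Contact_ID inside the existing ROW_NUMBER() call; the rest of the view can stay as it is.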
Answer 1 (score: 0)
This doesn't take advantage of your use of the ROW_NUMBER function, but I believe it will work:
with q as (
-- <query>
),
cp as (
select 1 as Precedence, 'PROVIDER' ContactSourcePattern
union all select 2, 'RG_%'
union all select 3, 'IN_%'
union all select 4, 'LD_%'
)
select q.*
from q
inner join cp on q.Contact_Source like cp.ContactSourcePattern
where
-- filter out duplicate records with the same `Contact_ID` that have a lower precedence than other records
not exists (
select 1
from q as q2 inner join cp as cp2 on q2.Contact_Source like cp2.ContactSourcePattern
where
q2.Contact_ID = q.Contact_ID -- q2 is a duplicate of q if `Contact_ID` matches
and cp2.Precedence < cp.Precedence -- q2/cp2 is higher precedence than q/cp if `Precedence` is a smaller number
)
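Two things to be aware of with the query above: the inner join to cp drops any row whose Contact_Source matches none of the patterns, and two rows at the same precedence for one Contact_ID are both returned. If either matters, one possible variation (a sketch, not part of the answer as posted) is to keep the precedence table but feed it into ROW_NUMBER through a LEFT JOIN. The table variable @rows and the values 'LD_Event', 'IN_Web' and 'WalkIn' are made up for illustration, and the patterns use '[_]' to match a literal underscore.

```sql
-- Hypothetical sample rows standing in for the joined result of the view.
DECLARE @rows TABLE (Contact_ID int, Contact_Source nvarchar(50));

INSERT INTO @rows (Contact_ID, Contact_Source)
SELECT 1, N'RG_Train' UNION ALL
SELECT 1, N'PROVIDER' UNION ALL
SELECT 2, N'LD_Event' UNION ALL
SELECT 2, N'IN_Web'   UNION ALL
SELECT 3, N'WalkIn';   -- matches no pattern, should still be kept

WITH cp AS
(
    -- Precedence table: a smaller number means higher priority.
    SELECT 1 AS Precedence, N'PROVIDER' AS ContactSourcePattern
    UNION ALL SELECT 2, N'RG[_]%'
    UNION ALL SELECT 3, N'IN[_]%'
    UNION ALL SELECT 4, N'LD[_]%'
),
ranked AS
(
    SELECT r.Contact_ID,
           r.Contact_Source,
           ROW_NUMBER() OVER
           (
               PARTITION BY r.Contact_ID
               -- Unmatched sources sort last; Contact_Source breaks ties.
               ORDER BY COALESCE(cp.Precedence, 5), r.Contact_Source
           ) AS rn
    FROM @rows AS r
    LEFT JOIN cp
        ON r.Contact_Source LIKE cp.ContactSourcePattern
)
SELECT Contact_ID, Contact_Source
FROM ranked
WHERE rn = 1
ORDER BY Contact_ID;
-- Returns (1, PROVIDER), (2, IN_Web) and (3, WalkIn).
```

Keeping the priorities in one small table makes the list easier to change later than a CASE expression, at the cost of an extra join.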