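I have a table that holds information about people along with the name of the file each record came from, so the table looks like this: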
|Table|
|id, first-name, last-name, ssn, filename|
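I also have a stored procedure that produces some analytics on the files in the system, and I am trying to add information to it that sheds light on how likely duplicates are.
Here is the current stored procedure: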
SELECT [filename],
COUNT([filename]) as totalRecords,
COUNT(closedleads.id) as closedRecords,
ROUND(--calcs percent of records closed in a file)
FROM table
LEFT OUTER JOIN closedleads ON closedleads.leadid = table.id
GROUP BY [filename]
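What I want to add is the ability to see possible duplicates, defined as records with a matching SSN. I can't work out how to run a count over a subquery or a join and include it in the result set. Can anyone give me some pointers?
What I am trying to do is add something like this to the procedure above: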
SELECT COUNT(
SELECT COUNT(*) FROM Table T1
INNER JOIN Table T2 on T1.SSN = T2.SSN
WHERE T1.id != T2.id
) as PossibleDuplicates
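What I am looking for is a way to merge this code with the procedure above so that all of the data comes back in one result set. Duplicates like this can exist within every filename, so for each filename I want the number of records, the number of closed records, and the number of possible duplicates.
Edit:
I am very close to what I want, but I am failing on the last bit: getting the number of possible duplicates per file. Here is my query: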
select [q1].[filename], [q1].leads, [q1].closed, [q2].dups
FROM (
SELECT [filename], count([filename]) as leads,
count(closedleads.id) as closed
FROM Table
left join closedleads on closedleads.leadid = Table.id
group by [filename]
) as [q1]
INNER JOIN (
select count([ssn]) as dups, [filename] from Table
group by [ssn], [filename]
having count([ssn]) > 1
) as [q2] on [q1].[filename] = [q2].[filename]
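This works, but it returns multiple rows per filename, with values from 2 to 5, instead of adding up the total number of possible duplicates.
Hi all, thanks for all the help. This is what I ended up with, and it does exactly what I wanted: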
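To illustrate why the extra rows show up: grouping by both [ssn] and [filename] yields one row per repeated SSN per file, so a second aggregation by filename is needed to collapse them. Below is a minimal sketch that runs the same shape of query on SQLite through Python's sqlite3, as a stand-in for SQL Server. The table name `people` and the sample rows are made up for illustration.

```python
import sqlite3

# Hypothetical sample data: fileA holds SSN '10' three times and '11' twice.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE people (id INTEGER PRIMARY KEY, ssn TEXT, filename TEXT);
    INSERT INTO people (id, ssn, filename) VALUES
        (1, '10', 'fileA'), (2, '10', 'fileA'), (3, '10', 'fileA'),
        (4, '11', 'fileA'), (5, '11', 'fileA'),
        (6, '12', 'fileA'),
        (7, '10', 'fileB'), (8, '13', 'fileB');
""")

# Same shape as the [q2] subquery: one row per repeated SSN per file.
per_ssn = conn.execute("""
    SELECT filename, COUNT(ssn) AS dups
    FROM people
    GROUP BY ssn, filename
    HAVING COUNT(ssn) > 1
    ORDER BY filename, dups
""").fetchall()

# Wrapping it in an outer GROUP BY filename collapses those rows into one.
per_file = conn.execute("""
    SELECT filename,
           SUM(cnt)     AS records_in_dup_groups,
           SUM(cnt - 1) AS extra_copies
    FROM (
        SELECT filename, COUNT(ssn) AS cnt
        FROM people
        GROUP BY ssn, filename
        HAVING COUNT(ssn) > 1
    ) AS grouped
    GROUP BY filename
    ORDER BY filename
""").fetchall()

print(per_ssn)   # two rows for fileA
print(per_file)  # one row for fileA
```

Note that fileB has no repeated SSN, so it does not appear in either result. With an INNER JOIN on this subquery, such files would drop out of the final report entirely.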
select [q1].[filename], [q1].leads, [q1].closed, [q2].dups,
round(([q1].closed / [q1].leads), 3) as percentClosed
FROM (
SELECT [filename], count([filename]) as leads,
count(closedleads.id) as closed
FROM Table
left join closedleads on closedleads.leadid = Table.id
and [filename] is not null
group by [filename]
) as [q1]
INNER JOIN (
select [filename], count(*) - count(distinct [ssn]) as dups
from Table
group by [filename]
) as [q2] on [q1].[filename] = [q2].[filename]
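For anyone who wants to check the numbers, here is a small sketch of the same query shape on SQLite through Python's sqlite3 (a stand-in for SQL Server; the table name `people` and the rows are invented). It also shows one thing to watch for: both counts are integers, and dividing one integer by another truncates in SQL Server as well as in SQLite, so multiplying by 1.0 first may be needed to get a fractional result.

```python
import sqlite3

# Hypothetical sample data: 4 records in fileA (1 closed), 2 in fileB (1 closed).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE people (id INTEGER PRIMARY KEY, ssn TEXT, filename TEXT);
    CREATE TABLE closedleads (id INTEGER PRIMARY KEY, leadid INTEGER);
    INSERT INTO people (id, ssn, filename) VALUES
        (1, '10', 'fileA'), (2, '10', 'fileA'), (3, '11', 'fileA'), (4, '12', 'fileA'),
        (5, '10', 'fileB'), (6, '13', 'fileB');
    INSERT INTO closedleads (id, leadid) VALUES (1, 1), (2, 5);
""")

rows = conn.execute("""
    SELECT q1.filename, q1.leads, q1.closed, q2.dups,
           q1.closed / q1.leads                 AS int_ratio,      -- integer division
           ROUND(q1.closed * 1.0 / q1.leads, 3) AS percent_closed  -- forced to decimal
    FROM (
        SELECT p.filename, COUNT(p.filename) AS leads, COUNT(c.id) AS closed
        FROM people AS p
        LEFT JOIN closedleads AS c ON c.leadid = p.id
        GROUP BY p.filename
    ) AS q1
    INNER JOIN (
        -- rows minus distinct SSNs = number of extra copies in the file
        SELECT filename, COUNT(*) - COUNT(DISTINCT ssn) AS dups
        FROM people
        GROUP BY filename
    ) AS q2 ON q1.filename = q2.filename
    ORDER BY q1.filename
""").fetchall()

for row in rows:
    print(row)
```

Because the second subquery groups by filename only and has no HAVING clause, every file gets exactly one row, including files with zero duplicates.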
Answer 0: (score: 3)
You probably want to use a HAVING clause somewhere, for example:
LEFT JOIN (
SELECT SSN, COUNT(SSN) - 1 DupeCount FROM Table T1
GROUP BY SSN
HAVING COUNT(SSN) > 1 ) AS PossibleDuplicates
ON table.ssn = PossibleDuplicates.SSN
Actually, if you want to include 0 possible duplicates (rather than null), you don't need the HAVING clause at all, just the left join.
Answer 1: (score: 1)
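A quick way to see the NULL versus 0 difference is to run the join both ways. This is only a sketch on SQLite through Python's sqlite3, standing in for SQL Server, with an invented `people` table and three invented rows.

```python
import sqlite3

# Hypothetical rows: SSN '10' appears twice, SSN '11' once.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE people (id INTEGER PRIMARY KEY, ssn TEXT);
    INSERT INTO people (id, ssn) VALUES (1, '10'), (2, '10'), (3, '11');
""")

# With HAVING: unique SSNs have no match in the subquery, so DupeCount is NULL.
with_having = conn.execute("""
    SELECT p.id, d.DupeCount
    FROM people AS p
    LEFT JOIN (
        SELECT ssn, COUNT(ssn) - 1 AS DupeCount
        FROM people
        GROUP BY ssn
        HAVING COUNT(ssn) > 1
    ) AS d ON p.ssn = d.ssn
    ORDER BY p.id
""").fetchall()

# Without HAVING: every SSN has a row in the subquery, so unique SSNs get 0.
without_having = conn.execute("""
    SELECT p.id, d.DupeCount
    FROM people AS p
    LEFT JOIN (
        SELECT ssn, COUNT(ssn) - 1 AS DupeCount
        FROM people
        GROUP BY ssn
    ) AS d ON p.ssn = d.ssn
    ORDER BY p.id
""").fetchall()

print(with_having)     # NULL arrives in Python as None
print(without_having)
```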
Edit: updated with a better example that matches your question more closely.
Here is an example, if I understand you correctly.
create table #table (id int,ssn varchar(10))
insert into #table values(1,'10')
insert into #table values(2,'10')
insert into #table values(3,'11')
insert into #table values(4,'12')
insert into #table values(5,'11')
insert into #table values(6,'13')
select sum(cnt)
from (
select count(distinct ssn) as cnt
from #table
group by ssn
having count(*)>1
) dups
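If you are grouping by ssn, you shouldn't have to join the table to itself and then go back to just the ssn; you can pull the count back right there in one pass.
Answer 2: (score: 0)
You don't need the outer COUNT. Your inner SELECT COUNT(*)... will return just one number: the number of records that share an SSN but have a different id.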
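It may help to see what that query actually counts. The sketch below replays the example on SQLite through Python's sqlite3 (a stand-in for the temp table above; the table is called `t` here, and row 7 is an extra row added purely for illustration). The grouped query counts how many SSNs are repeated, which is not the same as how many surplus rows exist.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, ssn TEXT)")
conn.executemany(
    "INSERT INTO t (id, ssn) VALUES (?, ?)",
    [(1, "10"), (2, "10"), (3, "11"), (4, "12"), (5, "11"), (6, "13"),
     (7, "10")],  # row 7 is NOT in the example above; added to separate the two counts
)

# The query from the example: each repeated SSN contributes exactly 1.
repeated_ssns = conn.execute("""
    SELECT SUM(cnt)
    FROM (
        SELECT COUNT(DISTINCT ssn) AS cnt
        FROM t
        GROUP BY ssn
        HAVING COUNT(*) > 1
    ) AS dups
""").fetchone()[0]

# Alternative measure: rows beyond the first occurrence of each SSN.
extra_rows = conn.execute(
    "SELECT COUNT(*) - COUNT(DISTINCT ssn) FROM t"
).fetchone()[0]

print(repeated_ssns)  # SSNs '10' and '11' are repeated
print(extra_rows)     # 7 rows minus 4 distinct SSNs
```

Which of the two numbers is the right "possible duplicates" figure depends on what the report is meant to show.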
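Answer 3: (score: 0)
I don't think the existing answers quite understand your question. I think I do, but it isn't fully specified yet. Is it a duplicate when the same SSN appears in two different files, or only when it appears twice within the same file? Because you are grouping by filename, that becomes the grain of the result.
Your query output looks like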
StateFarm1, 500, 50, 10%, <your new value goes here>
AllState2, 100, 90, 90%, <your new value goes here>
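So if the same SSN is in both of those files, that is 1 duplicate. Which row shows the 1: the AllState row or the StateFarm row? If you say both, then sooner or later someone will sum that column and double the result.
Now suppose there is also a Geico row with the same SSN. Is that 1 duplicate or 2? And again, on which row?
I know this isn't a final answer, but these questions do highlight the problem: as stated, it can't be answered. Once you settle that, I'll change my answer, and please don't upvote in the meantime.
I believe the only thing you are missing is DISTINCT.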
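To make the two readings concrete, here is a sketch on SQLite through Python's sqlite3 (standing in for SQL Server). The table name `people`, the file name `Geico3`, and all of the rows are invented: one SSN appears once in each of three files.

```python
import sqlite3

# Hypothetical data: SSN '111' appears once in each of three files.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE people (id INTEGER PRIMARY KEY, ssn TEXT, filename TEXT);
    INSERT INTO people (id, ssn, filename) VALUES
        (1, '111', 'StateFarm1'), (2, '222', 'StateFarm1'),
        (3, '111', 'AllState2'),  (4, '333', 'AllState2'),
        (5, '111', 'Geico3');
""")

# Reading 1: a duplicate only counts inside a single file (grain = filename).
within_file = conn.execute("""
    SELECT filename, COUNT(*) - COUNT(DISTINCT ssn) AS dups
    FROM people
    GROUP BY filename
    ORDER BY filename
""").fetchall()

# Reading 2: a duplicate is an SSN seen in more than one file (grain = ssn).
across_files = conn.execute("""
    SELECT ssn, COUNT(DISTINCT filename) AS files
    FROM people
    GROUP BY ssn
    HAVING COUNT(DISTINCT filename) > 1
""").fetchall()

print(within_file)   # every file reports 0 duplicates
print(across_files)  # yet one SSN spans three files
```

Under the first reading the per-file report shows no duplicates at all, while the second reading has no natural home on a per-file row, which is exactly the ambiguity raised above.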
select [q1].[filename], [q1].leads, [q1].closed, [q2].dups
FROM (
SELECT [filename], count([filename]) as leads,
count(closedleads.id) as closed
FROM Table
left join closedleads on closedleads.leadid = Table.id
group by [filename]
) as [q1]
INNER JOIN (
select count( DISTINCT [ssn]) as dups, [filename] from Table -- <---- here
group by [ssn], [filename]
having count([ssn]) > 1
) as [q2] on [q1].[filename] = [q2].[filename]