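I have a view in SQL that pulls together some data from which I need to eliminate duplicate values. I have tried DISTINCT and GROUP BY without success. Basically, we have a series of uploaded files that are attached to a Provider according to document type. As the documents go through the different signature stages, multiple versions of each document get uploaded.
Each time a new stage of a document is uploaded, a new row is added to the UploadedDocuments table; the RequiredDocumentsID stays the same, but the FileName in the UploadedFiles table (and the ID field in that table) is new.
Historically this was not a problem, because we generally looked up one provider at a time, and in that case we just grabbed the latest row for each document type. Now, however, we have a new page in the works that needs to show all providers at once, but it must list each one only ONCE, with only the latest FileName/FilePath columns.
Below is my current view. As mentioned, I tried making the first value 'DISTINCT dbo.ReqDocuments.ID' and also tried a GROUP BY. Neither eliminated any of the duplicates. I was thinking about using an embedded select or an OUTER, but my T-SQL skills are not quite at that level yet.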
SELECT dbo.UploadedFiles.FileName, dbo.UploadedFiles.FilePath,
dbo.ReqDocuments.ProviderID, dbo.Providers.CompanyName,
dbo.ReqDocuments.ID AS RequiredDocumentID, dbo.UploadedFiles.aDate,
dbo.UploadedFiles.aUser
FROM dbo.Providers
INNER JOIN dbo.ReqDocuments ON dbo.Providers.ID = dbo.ReqDocuments.ProviderID
INNER JOIN dbo.UploadedFiles ON dbo.ReqDocuments.ID = dbo.UploadedFiles.ReqDocumentsID
WHERE (dbo.ReqDocuments.DocumentID = 50)
Answer 0 (score: 1)
You can use ROW_NUMBER() to solve this:
SELECT *
FROM (SELECT UploadedFiles.FileName, UploadedFiles.FilePath,
ReqDocuments.ProviderID, Providers.CompanyName,
dbo.ReqDocuments.ID AS RequiredDocumentID, dbo.UploadedFiles.aDate,
dbo.UploadedFiles.aUser
, ROW_NUMBER () OVER (PARTITION BY ReqDocuments.ProviderID, Providers.CompanyName, ReqDocuments.ID ORDER BY UploadedFiles.aDate DESC) as RowRank
FROM dbo.Providers
INNER JOIN dbo.ReqDocuments ON dbo.Providers.ID = dbo.ReqDocuments.ProviderID
INNER JOIN dbo.UploadedFiles ON dbo.ReqDocuments.ID = dbo.UploadedFiles.ReqDocumentsID
WHERE (dbo.ReqDocuments.DocumentID = 50)
)sub
WHERE RowRank = 1
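PARTITION BY the fields that do not change with each upload, and ORDER BY the date descending so that the most recent upload gets RowRank = 1. You can run the inner query on its own to see how ROW_NUMBER() works.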
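To see the numbering in isolation, here is a minimal T-SQL sketch. The three sample rows and their file names are made up for illustration; only the column names ReqDocumentsID, FileName and aDate come from the question.

```sql
-- Hypothetical sample data standing in for dbo.UploadedFiles.
WITH Uploads (ReqDocumentsID, FileName, aDate) AS (
    SELECT v.ReqDocumentsID, v.FileName, CAST(v.aDate AS date)
    FROM (VALUES
        (1, 'contract_draft.pdf',  '20140105'),
        (1, 'contract_signed.pdf', '20140210'),
        (2, 'license_draft.pdf',   '20140120')
    ) AS v (ReqDocumentsID, FileName, aDate)
)
SELECT ReqDocumentsID,
       FileName,
       aDate,
       -- Numbering restarts at 1 for each ReqDocumentsID; newest upload first.
       ROW_NUMBER() OVER (PARTITION BY ReqDocumentsID
                          ORDER BY aDate DESC) AS RowRank
FROM Uploads
ORDER BY ReqDocumentsID, RowRank;
-- ReqDocumentsID  FileName             aDate       RowRank
-- 1               contract_signed.pdf  2014-02-10  1
-- 1               contract_draft.pdf   2014-01-05  2
-- 2               license_draft.pdf    2014-01-20  1
```

Filtering on RowRank = 1 then keeps exactly one row per ReqDocumentsID. If two uploads could share the same aDate, which of them gets RowRank = 1 is not defined unless you add a tie-breaker column to the ORDER BY.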
Also, I like aliases, so here is the same query with them:
SELECT *
FROM (SELECT upl.FILENAME
, upl.FILEPATH
, Req.ProviderID
, prv.CompanyName
, Req.ID AS RequiredDocumentID
, upl.aDate
, upl.aUser
, ROW_NUMBER () OVER (PARTITION BY Req.ProviderID, prv.CompanyName, Req.ID ORDER BY upl.aDate DESC) as RowRank
FROM Providers prv
INNER JOIN ReqDocuments Req
ON prv.ID = Req.ProviderID
INNER JOIN UploadedFiles upl
ON Req.ID = upl.ReqDocumentsID
WHERE (Req.DocumentID = 50)
)sub
WHERE RowRank = 1
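Answer 1 (score: 1)
Put simply: given a DocumentID, you want a list of (ProviderID, FilePath) pairs where FilePath is the latest version for that DocumentID and ProviderID combination.
I would rank all the FilePaths per ProviderID, ordered by date, and keep only the most recent one: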
SELECT outerF.FileName, outerF.FilePath,
outerD.ProviderID, outerP.CompanyName,
outerD.ID AS RequiredDocumentID, outerF.aDate,
outerF.aUser
FROM dbo.Providers outerP
INNER JOIN dbo.ReqDocuments outerD ON outerP.ID = outerD.ProviderID
INNER JOIN dbo.UploadedFiles outerF ON outerD.ID = outerF.ReqDocumentsID
WHERE (outerD.DocumentID = 50)
AND outerF.aDate = (
SELECT top 1 innerF.aDate
FROM dbo.ReqDocuments innerD
INNER JOIN dbo.UploadedFiles innerF ON innerD.ID = innerF.ReqDocumentsID
WHERE innerD.ProviderID = outerP.id
AND innerD.DocumentID = outerD.DocumentID
ORDER BY innerF.aDate DESC)
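Since the question mentions an embedded select or an OUTER, the same "latest upload" idea can also be sketched with CROSS APPLY. This is an alternative formulation, not the query above: it assumes the same tables and columns as the question, and it picks the latest upload per ReqDocuments row rather than per provider, which is the same thing only if each provider has a single ReqDocuments row for the given DocumentID.

```sql
SELECT f.FileName, f.FilePath,
       d.ProviderID, p.CompanyName,
       d.ID AS RequiredDocumentID, f.aDate,
       f.aUser
FROM dbo.Providers p
INNER JOIN dbo.ReqDocuments d ON p.ID = d.ProviderID
-- For each required document, fetch only its most recent uploaded file.
CROSS APPLY (
    SELECT TOP (1) uf.FileName, uf.FilePath, uf.aDate, uf.aUser
    FROM dbo.UploadedFiles uf
    WHERE uf.ReqDocumentsID = d.ID
    ORDER BY uf.aDate DESC
) f
WHERE d.DocumentID = 50;
```

Because TOP (1) returns at most one row, two uploads with the same aDate cannot produce duplicate output rows here, whereas an equality test on the date would return both. Swapping CROSS APPLY for OUTER APPLY would also list required documents that have no upload yet, with NULLs in the file columns.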
Answer 2 (score: 0)
This query finds the duplicates:
SELECT t1.ID FROM [Table] t1, [Table] t2 WHERE t1.Name = t2.Name AND t1.ID > t2.ID
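A small T-SQL sketch of what this self-join returns, using a made-up four-row table (the name Items and its values are hypothetical):

```sql
-- Hypothetical sample data: 'a' appears three times, 'b' once.
WITH Items (ID, Name) AS (
    SELECT v.ID, v.Name
    FROM (VALUES (1, 'a'), (2, 'b'), (3, 'a'), (4, 'a')) AS v (ID, Name)
)
SELECT t1.ID
FROM Items t1
INNER JOIN Items t2
        ON t1.Name = t2.Name   -- same Name ...
       AND t1.ID > t2.ID       -- ... paired with a row that has a lower ID
ORDER BY t1.ID;
-- returns 3, 4, 4
```

ID 4 is listed twice because it pairs with both ID 1 and ID 3, so add DISTINCT if you want each duplicate reported once. Note also that the query reports every row except the one with the lowest ID per Name, so keeping the latest row instead would need the comparison reversed.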