I can count how many rows share the same value in a column, for example:
SELECT COUNT(*) AS Count, ProjectID
FROM Projects
GROUP BY ProjectID
ORDER BY Count DESC
Now I have a table like this:
ProjectID ProjectUrl
1 http://www.CompanyA.com/Projects/123
2 http://www.CompanyB.com/Projects/124
3 http://www.CompanyA.com/Projects/125
4 http://www.CompanyB.com/Projects/126
5 http://www.CompanyA.com/Projects/127
and I want a result like this:
ProjectUrl = http://www.CompanyA.com Count = 3
ProjectUrl = http://www.CompanyB.com Count = 2
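Edit
Sorry, I forgot to mention what kinds of URLs are in the table. The URLs are fairly random, but some parts of them are common. Since we are creating project categories, a project category URL can be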
https://spanish.CompanyAa2342.com/portal/projectA/projectTeamA/ProjectPersonA/Task/124
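But some projects have no project team and so on, so the structure is somewhat random.
I need a query that is more generic.
What the URLs have in common: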
http://ramdomLanguage.CompanyName.com/portal/RandomName .....
Answer 0 (score: 2)
Please try:
select
Col,
COUNT(Col) Cnt
from(
select
SUBSTRING(ProjectUrl, 0, PATINDEX('%.com/%', ProjectUrl)+4) Col
from tbl
)x group by Col
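Since the edited question says the URLs are not guaranteed to follow one fixed shape, here is a minimal sketch of a variant that does not depend on the '.com/' pattern. It assumes SQL Server and the Projects table with a ProjectUrl column from the question; the alias UrlRoot is only an illustrative name. It keeps everything up to, but not including, the first '/' that follows the '//' of the scheme.

```sql
-- Sketch only: assumes SQL Server and the Projects(ProjectID, ProjectUrl) table from the question.
-- UrlRoot is a hypothetical alias, not a name from the original post.
SELECT UrlRoot,
       COUNT(*) AS Cnt
FROM (
    SELECT LEFT(ProjectUrl,
                -- Find the first '/' after the '//' of the scheme.
                -- Appending '/' guarantees a match even when the URL has no path.
                CHARINDEX('/', ProjectUrl + '/', CHARINDEX('//', ProjectUrl) + 2) - 1) AS UrlRoot
    FROM Projects
) AS x
GROUP BY UrlRoot
ORDER BY Cnt DESC;
```

For the five sample rows in the question this groups into http://www.CompanyA.com with 3 and http://www.CompanyB.com with 2. It groups by scheme plus host only, so different subdomains of the same company are counted separately.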
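Answer 1 (score: 0)
I am not sure how this performs on huge datasets, but here is one solution. I produce one row for each part of each URL, splitting on /, and then do a quick aggregation at the end to show the count for each individual part. Fiddle here: http://www.sqlfiddle.com/#!3/742c4/12 (I added one extra row for demonstration. Thanks, TechT.)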
WITH cteFSPositions
AS
(
SELECT ProjectID,
ProjectURL,
1 AS CharPos,
MAX(LEN(ProjectURL)) AS MaxLen,
CHARINDEX('/', ProjectURL) AS FSPos
FROM Projects
GROUP BY ProjectID,
ProjectURL
UNION ALL
SELECT ProjectID,
ProjectURL,
CharPos + 1,
MaxLen,
CHARINDEX('/', ProjectURL, CharPos + 1) AS FSPos
FROM cteFSPositions
WHERE CharPos <= MaxLen
),
cteProjectURLParts
AS
(
SELECT DISTINCT ProjectID,
LEFT(ProjectURL, FSPos) AS ProjectURLPart,
FSPos
FROM cteFSPositions
WHERE FSPos > 0
UNION ALL
SELECT ProjectID,
ProjectURL,
LEN(ProjectURL)
FROM Projects
),
cteFilteredProjectURLParts
AS
(
SELECT ProjectID,
ProjectURLPart
FROM cteProjectURLParts
WHERE ProjectURLPart NOT IN ('http:', 'http:/', 'http://', 'https:', 'https:/', 'https://')
)
SELECT ProjectURLPart,
COUNT(*) AS Instances
FROM cteFilteredProjectURLParts
GROUP BY ProjectURLPart
ORDER BY Instances DESC,
ProjectURLPart;
This produces (I added an extra row):
ProjectURLPart Instances
http://www.CompanyA.com/ 4
http://www.CompanyA.com/Projects/ 3
http://www.CompanyB.com/ 2
http://www.CompanyB.com/Projects/ 2
http://www.CompanyA.com/BlahblahBlah/ 1
http://www.CompanyA.com/BlahblahBlah/More1/ 1
http://www.CompanyA.com/BlahblahBlah/More1/More2 1
http://www.CompanyA.com/Projects/123 1
http://www.CompanyA.com/Projects/125 1
http://www.CompanyA.com/Projects/127 1
http://www.CompanyB.com/Projects/124 1
http://www.CompanyB.com/Projects/126 1
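Edit: Oops, the original post contained the in-progress fiddle code. I have now provided the final code and an updated fiddle link.
Edit 2: I realized that, because of the way I was cutting up the URLs, I was cutting off the final part of each URL. For completeness' sake, I have added those back into the final dataset. Updated the fiddle.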
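One thing to watch with the query above: the recursive CTE advances one character per recursion level, and SQL Server stops a recursive CTE after 100 levels by default, so URLs longer than roughly 100 characters would raise an error unless an OPTION (MAXRECURSION ...) hint is added. As a rough sketch of an alternative, the recursion can jump from one '/' to the next, so the depth equals the number of slashes rather than the number of characters. This assumes SQL Server, the Projects table from the question, and a unique ProjectID; the CTE name SlashPositions is made up for illustration.

```sql
-- Sketch only: assumes SQL Server, Projects(ProjectID, ProjectUrl), and unique ProjectID values.
-- SlashPositions is a hypothetical name, not part of the original answer.
WITH SlashPositions AS
(
    -- Anchor: position of the first '/' in each URL.
    SELECT ProjectID,
           ProjectUrl,
           CHARINDEX('/', ProjectUrl) AS SlashPos
    FROM Projects
    WHERE CHARINDEX('/', ProjectUrl) > 0
    UNION ALL
    -- Recursive step: jump straight to the next '/', if there is one.
    SELECT ProjectID,
           ProjectUrl,
           CHARINDEX('/', ProjectUrl, SlashPos + 1)
    FROM SlashPositions
    WHERE CHARINDEX('/', ProjectUrl, SlashPos + 1) > 0
)
SELECT LEFT(ProjectUrl, SlashPos) AS ProjectUrlPart,
       COUNT(*) AS Instances
FROM SlashPositions
-- Skip the two slashes that belong to the 'http://' or 'https://' scheme.
WHERE SlashPos > CHARINDEX('//', ProjectUrl) + 1
GROUP BY LEFT(ProjectUrl, SlashPos)
ORDER BY Instances DESC,
         ProjectUrlPart;
```

Unlike the final query in this answer, the sketch only returns prefixes that end in '/'; it does not add the full URL back in as a last part, so that step from Edit 2 would still need a UNION ALL if it is wanted.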