将表分成组时,我遇到了一个有趣的问题。我有一群游客 - 每个人都说一种语言和/或是一个家庭的一部分。 我需要将表格分组,但我希望将家庭和类似语言的人保持在一起。
假设我想将游客分成最多3个小组(如果小组必须更大,那是可以接受的)。解决方案不一定非常聪明,它完全填满所有组,但我正在尽最大努力。
输入:
TouristID | LanguageID | FamilyID
---------------------------------
1 | 1 | 1
2 | 1 | 1
3 | 1 | 1
4 | 2 | 1
5 | 3 | 2
6 | 4 | 2
7 | 5 | 3
8 | 5 | 4
9 | 7 | 5
期望的结果:
TouristID | GroupID
-------------------
1 | 1
2 | 1
3 | 1
4 | 1
5 | 2
6 | 2
7 | 3
8 | 3
9 | 2
第1组由所有语言1发言人组成,包括一个不能被遗漏的家庭成员。
第2组由两个家庭成员(5,6)和一个随机成员(9)组成,成为3人组。
第3组由两个相同的语言发言者(7,8)组成
我做了什么:
INSERT TouristGroup
SELECT
t.TouristID,
DENSE_RANK() OVER (ORDER BY GroupID) AS [GroupID]
FROM Tourists t
CROSS APPLY (
SELECT MIN(TouristID) AS [GroupID]
FROM Tourists t2
WHERE
( t2.LanguageID = t.LanguageID
OR t2.FamilyID = t.FamilyID )
) x;
INSERT Groups
SELECT GroupID, COUNT(*)
FROM TouristGroup
GROUP BY GroupID;
declare
@matchID int = 0,
@currentCount int,
@desiredCount int = 0,
@candidateGroupID int = null,
@chunk int = 1
while exists (
select null
from Groups g
left join Matches m
on m.GroupID = g.GroupID
where m.GroupID is null
)
begin
set @currentCount = null
set @candidateGroupID = null
select
@currentCount = isnull(SUM([Count]), 0)
from Matches m
join Groups g
on g.GroupID = m.GroupID
where m.MatchID = @matchID
if @CurrentCount is not null
begin
set @desiredCount = @chunk - @desiredCount
select top 1
@candidateGroupID = g.GroupID
from Groups g
left join Matches m
on m.GroupID = g.GroupID
where g.[Count] <= @desiredCount
and m.GroupID is null
order by [Count] DESC
if @candidateGroupID is not null
begin
insert Matches
select @matchID, @candidateGroupID
end
else begin
set @matchID = @matchID + 1
end
end
else begin
set @matchid = @matchID + 1
end
end
问题
是否有更好的方法来分区表,但是根据多列将行分组在一起?
答案 0 :(得分:1)
这将产生你的“第1步”。也许它比你现在拥有的更好(没有循环)。
SELECT t.TouristID, DENSE_RANK() OVER (ORDER BY x.GroupNum) as GroupId
FROM Tourists t
CROSS APPLY (SELECT MIN(TouristId) AS GroupNum
FROM @Tourist t2
WHERE t2.LanguageId = t.LanguageId OR t2.FamilyId = t.FamilyId
) x
至于你至少得到至少三个小组成员的其他要求,如果可能的话,你可能需要做一个类似于你正在做的循环(我不确定它是否可以改进,因为你没有分享它。)
[更新]以下是我对“第2步”的建议:
DECLARE @MinGroupSize int = 3, @rc int = 1
WHILE @rc>0
BEGIN
WITH GroupCount AS (
SELECT GroupID, COUNT(*) AS GroupCount
FROM TouristGroup
GROUP BY GroupID
), CandidateGroups AS (
SELECT TOP 1 gc1.GroupID AS ShortGroupId, singleton.GroupID as SingletonGroupID
FROM GroupCount gc1
CROSS APPLY (SELECT TOP 1 GroupID
FROM GroupCount AS gc2
WHERE gc2.GroupCount = 1 AND gc2.GroupID != gc1.GroupID
ORDER BY gc2.GroupID
) AS singleton
WHERE gc1.GroupCount < @MinGroupSize
ORDER BY GroupCount DESC, gc1.GroupID ASC
)
UPDATE tg
SET GroupID = cg.ShortGroupID
FROM TouristGroup tg
JOIN CandidateGroups cg ON cg.SingletonGroupID = tg.GroupID;
SET @rc = @@ROWCOUNT;
END
--
-- If you're anal like me and want to eliminate gaps in GroupID values
--
UPDATE tg
SET GroupID = tg2.GroupID
FROM TouristGroup tg
JOIN (SELECT TouristID, DENSE_RANK() OVER (ORDER BY GroupID) AS [GroupID]
FROM TouristGroup) AS tg2 ON tg2.TouristID = tg.TouristID
WHERE tg.GroupID != tg2.GroupID;
这将找到小于所需最小组大小的组并找到单个组(仅1个成员)并使用另一个GroupID更新单例,逐个执行此操作,直到没有其他候选者为止。按顺序选择较小的组(按GroupCount降序,然后按GroupID升序),以便首先填充较大的组。只选择单身人士进行更新,以便不会破坏任何自然群体。