我需要在一个大表中组合大约15个表的列。以下内容有效。但是,当CPU使用率达到100%时,运行需要很长时间,这会引起担忧。任何建议都将受到高度赞赏。
declare @t1 table (empid int)
declare @t2 table (empid int, phone varchar(50))
declare @t3 table (empid int, license varchar(50))
declare @t4 table (empid int, email varchar(100))
insert into @t1 values (1)
insert into @t1 values (2)
insert into @t1 values (3)
insert into @t2 values (1, '5551234')
insert into @t2 values (2, '5553333')
insert into @t2 values (2, 'ttt2222')
insert into @t3 values (2, 'L4455')
insert into @t3 values (3, 'L7890')
insert into @t4 values (2, 'xxx@abc')
SELECT t1.empid, t2.phone, t3.license, t4.email
FROM
@t1 t1
LEFT OUTER JOIN
(SELECT empid, phone, row_number() over (partition by empid order by phone) as rn
FROM @t2) t2
ON t2.empid = t1.empid
FULL OUTER JOIN
(SELECT empid, license, row_number() over (partition by empid order by license) as rn
FROM @t3) t3
ON t3.empid=t1.empid and (t2.rn is null or t3.rn = t2.rn)
FULL OUTER JOIN
(SELECT empid, email, row_number() over (partition by empid order by email) as rn
FROM @t4) t4
ON t4.empid=t1.empid and t4.rn=coalesce(t2.rn, t3.rn) --image how long this coalesce clause is going to be for the 15th table?
order by t1.empid, t2.rn
答案 0 :(得分:2)
您的问题不是很清楚,如果您包含预期结果会有所帮助。让我猜猜你想要什么...
我将为您的示例中的表提供更有意义的名称,并添加更多行以突出显示该问题。在现实生活中,这些表将是真正的表,当然,不是变量,但我会坚持使用变量来使这个示例脚本易于运行和尝试。我在这个例子中使用SQL Server 2008。
declare @TMain table (empid int);
declare @TPhones table (empid int, phone varchar(50));
declare @TLicenses table (empid int, license varchar(50));
declare @TEmails table (empid int, email varchar(100));
insert into @TMain values (1);
insert into @TMain values (2);
insert into @TMain values (3);
insert into @TMain values (4);
insert into @TPhones values (1, '5551234');
insert into @TPhones values (2, '5551111');
insert into @TPhones values (2, '5552222');
insert into @TPhones values (2, '5553333');
insert into @TPhones values (2, '5554444');
insert into @TLicenses values (2, 'L4455');
insert into @TLicenses values (3, 'L7890');
insert into @TEmails values (2, 'xxx@abc');
insert into @TEmails values (2, 'yyy@abc');
insert into @TEmails values (2, 'zzz@abc');
有一种快速,有效和错误的天真方法:
SELECT
Main.empid
,Phones.phone
,Licenses.license
,Emails.email
FROM
@TMain AS Main
LEFT JOIN @TPhones AS Phones ON Phones.empid = Main.empid
LEFT JOIN @TLicenses AS Licenses ON Licenses.empid = Main.empid
LEFT JOIN @TEmails AS Emails ON Emails.empid = Main.empid
ORDER BY Main.empid, phone, license, email;
它生成所有行的笛卡尔积并复制行。这是上面查询的结果集。您可以看到empid = 2
返回了12行,即4个电话乘以3封电子邮件和1个许可证。我的猜测是你只想看到empid = 2
的4行。换句话说,对于每个empid,结果应该具有最小可能的行数(我将在最后显示正确的结果集)。
empid phone license email
1 5551234 NULL NULL
2 5551111 L4455 xxx@abc
2 5551111 L4455 yyy@abc
2 5551111 L4455 zzz@abc
2 5552222 L4455 xxx@abc
2 5552222 L4455 yyy@abc
2 5552222 L4455 zzz@abc
2 5553333 L4455 xxx@abc
2 5553333 L4455 yyy@abc
2 5553333 L4455 zzz@abc
2 5554444 L4455 xxx@abc
2 5554444 L4455 yyy@abc
2 5554444 L4455 zzz@abc
3 NULL L7890 NULL
4 NULL NULL NULL
我不确定下面提出的方法是否比你的更有效。您必须同时尝试并比较数据的性能。
我们需要一张数字表。 SQL, Auxiliary table of numbers http://web.archive.org/web/20150411042510/http://sqlserver2000.databases.aspfaq.com/why-should-i-consider-using-an-auxiliary-numbers-table.html http://dataeducation.com/you-require-a-numbers-table/
同样,在现实生活中,你会有一个合适的数字表,但是对于这个例子,我将使用以下内容:
declare @TNumbers table (Number int);
insert into @TNumbers values (1);
insert into @TNumbers values (2);
insert into @TNumbers values (3);
insert into @TNumbers values (4);
insert into @TNumbers values (5);
我的方法背后的主要思想是首先制作一个包含每个empid的正确行数的辅助表,然后使用该表有效地获得结果。
我们将从计算每个empid的电话号码,许可证和电子邮件开始:
WITH
CTE_Rows
AS
(
SELECT Phones.empid, COUNT(*) AS EmpRows
FROM @TPhones AS Phones
GROUP BY Phones.empid
UNION ALL
SELECT Licenses.empid, COUNT(*) AS EmpRows
FROM @TLicenses AS Licenses
GROUP BY Licenses.empid
UNION ALL
SELECT Emails.empid, COUNT(*) AS EmpRows
FROM @TEmails AS Emails
GROUP BY Emails.empid
)
然后我们计算每个empid的最大行数:
,CTE_MaxRows
AS
(
SELECT
CTE_Rows.empid
,MAX(CTE_Rows.EmpRows) AS MaxEmpRows
FROM CTE_Rows
GROUP BY CTE_Rows.empid
)
上面的CTE对每个empid都有一行:empid本身以及这个empid的最大数量的电话,许可证和电子邮件。现在我们需要扩展此表并为每个empid生成给定的行数。我在这里使用Numbers表:
,CTE_RowNumbers
AS
(
SELECT
CTE_MaxRows.empid
,Numbers.Number AS rn
FROM
CTE_MaxRows
CROSS JOIN @TNumbers AS Numbers
WHERE
Numbers.Number <= CTE_MaxRows.MaxEmpRows
)
然后我们需要向所有带有数据的表中添加行号,我们稍后会用它们加入:
,CTE_Phones
AS
(
SELECT
Phones.empid
,ROW_NUMBER() OVER (PARTITION BY Phones.empid ORDER BY phone) AS rn
,Phones.phone
FROM @TPhones AS Phones
)
,CTE_Licenses
AS
(
SELECT
Licenses.empid
,ROW_NUMBER() OVER (PARTITION BY Licenses.empid ORDER BY license) AS rn
,Licenses.license
FROM @TLicenses AS Licenses
)
,CTE_Emails
AS
(
SELECT
Emails.empid
,ROW_NUMBER() OVER (PARTITION BY Emails.empid ORDER BY email) AS rn
,Emails.email
FROM @TEmails AS Emails
)
现在我们已准备好加入所有这些。 CTE_RowNumbers
具有我们需要的确切行数,因此此处不需要复杂的FULL JOINs
,简单LEFT JOIN
就足够了:
,CTE_Data
AS
(
SELECT
CTE_RowNumbers.empid
,CTE_Phones.phone
,CTE_Licenses.license
,CTE_Emails.email
FROM
CTE_RowNumbers
LEFT JOIN CTE_Phones ON CTE_Phones.empid = CTE_RowNumbers.empid AND CTE_Phones.rn = CTE_RowNumbers.rn
LEFT JOIN CTE_Licenses ON CTE_Licenses.empid = CTE_RowNumbers.empid AND CTE_Licenses.rn = CTE_RowNumbers.rn
LEFT JOIN CTE_Emails ON CTE_Emails.empid = CTE_RowNumbers.empid AND CTE_Emails.rn = CTE_RowNumbers.rn
)
我们差不多完成了。我猜,主表有可能没有任何相关数据(没有电话,没有虱子,没有电子邮件),比如样本数据中的empid = 4
。为了在结果集中获得这些empid,我将把CTE_Data连接到主表:
SELECT
Main.empid
,CTE_Data.phone
,CTE_Data.license
,CTE_Data.email
FROM
@TMain AS Main
LEFT JOIN CTE_Data ON CTE_Data.empid = Main.empid
ORDER BY Main.empid, phone, license, email;
要获取完整脚本,只需将此帖子中的所有代码块按照此处显示的顺序放在一起。
这是结果集:
empid phone license email
1 5551234 NULL NULL
2 5551111 L4455 xxx@abc
2 5552222 NULL yyy@abc
2 5553333 NULL zzz@abc
2 5554444 NULL NULL
3 NULL L7890 NULL
4 NULL NULL NULL
答案 1 :(得分:-1)
您是否有理由在
上使用row_number()看起来你想要这样的东西......
SELECT t1.empid, t2.phone, t3.license, t4.email
FROM @t1 t1
LEFT JOIN @t2 t2 ON t1.EmpId = t2.EmpId
LEFT JOIN @t3 t3 ON t1.EmpId = t3.EmpId
LEFT JOIN @t4 t4 ON t1.EmpId = t4.EmpId