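I'm looking to optimise string concatenation across multiple rows, and have read a number of similar questions to familiarise myself with STUFF + FOR XML PATH and so on. However, when I apply these techniques to my query, it usually times out when run across the 9 million or so rows I have.
What I'm looking for is a more efficient way of translating this: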
create table #fruit
(
Contact_id NVARCHAR(50)
,fruit_type NVARCHAR(50)
,[2005_orders] int
,[2006_orders] int
,[2007_orders] int
,[2008_orders] int
,[2009_orders] int
)
INSERT INTO #fruit VALUES ('id001','banana',1,3,0,25,4)
INSERT INTO #fruit VALUES ('id001','apple',0,7,19,1,0)
INSERT INTO #fruit VALUES ('id001','orange',0,0,0,9,0)
INSERT INTO #fruit VALUES ('id001','strawberry',1,1,1,1,4)
INSERT INTO #fruit VALUES ('id001','grapes',0,3,0,0,0)
INSERT INTO #fruit VALUES ('id001','lemon',1,1,1,0,0)
into this:
CREATE TABLE #results
(
contact_id NVARCHAR(255)
,fruit_type NVARCHAR(50)
,[2005_orders] int
,[2006_orders] int
,[2007_orders] int
,[2008_orders] int
,[2009_orders] int
,combination2005 NVARCHAR(500)
,combination2006 NVARCHAR(500)
,combination2007 NVARCHAR(500)
,combination2008 NVARCHAR(500)
,combination2009 NVARCHAR(500)
)
INSERT INTO #results VALUES ('id001','banana',1,3,0,25,4,'banana + strawberry + lemon','banana + apple + strawberry + grapes + lemon','apple + strawberry + lemon','banana + apple + orange + strawberry','banana + strawberry')
INSERT INTO #results VALUES ('id001','apple',0,7,19,1,0,'banana + strawberry + lemon','banana + apple + strawberry + grapes + lemon','apple + strawberry + lemon','banana + apple + orange + strawberry','banana + strawberry')
INSERT INTO #results VALUES ('id001','orange',0,0,0,9,0,'banana + strawberry + lemon','banana + apple + strawberry + grapes + lemon','apple + strawberry + lemon','banana + apple + orange + strawberry','banana + strawberry')
INSERT INTO #results VALUES ('id001','strawberry',1,1,1,1,4,'banana + strawberry + lemon','banana + apple + strawberry + grapes + lemon','apple + strawberry + lemon','banana + apple + orange + strawberry','banana + strawberry')
INSERT INTO #results VALUES ('id001','grapes',0,3,0,0,0,'banana + strawberry + lemon','banana + apple + strawberry + grapes + lemon','apple + strawberry + lemon','banana + apple + orange + strawberry','banana + strawberry')
INSERT INTO #results VALUES ('id001','lemon',1,1,1,0,0,'banana + strawberry + lemon','banana + apple + strawberry + grapes + lemon','apple + strawberry + lemon','banana + apple + orange + strawberry','banana + strawberry')
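The key factors to consider are that I need one row per contact per fruit type (because this table will be used elsewhere), and that a fruit should only make it into the combination for a given year if its count for that year is greater than 0.
Given the number of rows I'm working with, this may never be very efficient, but if there is any chance I can append this information to my table, that would be great :)
Methods tried
Method 1)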
SELECT *
,STUFF(
(SELECT ' ' + fruit_type
FROM #fruit fr2
WHERE fr.contact_id = fr2.contact_id
AND fr2.[2005_orders] > 0 -- a column name that starts with a digit must be bracketed
order by contact_id,fruit_type
FOR XML path ('')
)
,1,1,''
) AS combination
FROM #fruit fr
Method 2)
SELECT *
,ISNULL((MAX(CASE WHEN fruit_type = 'banana' AND [2005_orders] > 0 THEN 'banana ' END) OVER (PARTITION BY contact_id)),'')+
ISNULL((MAX(CASE WHEN fruit_type = 'apple' AND [2005_orders] > 0 THEN 'apple ' END) OVER (PARTITION BY contact_id)),'')+
ISNULL((MAX(CASE WHEN fruit_type = 'orange' AND [2005_orders] > 0 THEN 'orange ' END) OVER (PARTITION BY contact_id)),'')+
ISNULL((MAX(CASE WHEN fruit_type = 'strawberry' AND [2005_orders] > 0 THEN 'strawberry ' END) OVER (PARTITION BY contact_id)),'')+
ISNULL((MAX(CASE WHEN fruit_type = 'grapes' AND [2005_orders] > 0 THEN 'grapes ' END) OVER (PARTITION BY contact_id)),'')+
ISNULL((MAX(CASE WHEN fruit_type = 'lemon' AND [2005_orders] > 0 THEN 'lemon ' END) OVER (PARTITION BY contact_id)),'')
AS combination05
FROM #fruit fr
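- and then repeated for 2006 - 2009 (I know this is very inefficient!)
Answer 0: (score: 1)
The performance problem with both approaches is going to be the subquery. Try this strategy to split the work up and avoid the subquery.
If you are guaranteed a record for every contact_id / fruit_type combination, you don't need the outer joins.
An index on contact_id should improve performance considerably.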
SELECT
f.*
, combination2005 =
CASE WHEN b.[2005_orders] = 0 OR b.[2005_orders] IS NULL THEN '' ELSE 'banana + ' END
+ CASE WHEN a.[2005_orders] = 0 OR a.[2005_orders] IS NULL THEN '' ELSE 'apple + ' END
+ CASE WHEN o.[2005_orders] = 0 OR o.[2005_orders] IS NULL THEN '' ELSE 'orange + ' END
+ CASE WHEN s.[2005_orders] = 0 OR s.[2005_orders] IS NULL THEN '' ELSE 'strawberry + ' END
, combination2006 =
CASE WHEN b.[2006_orders] = 0 OR b.[2006_orders] IS NULL THEN '' ELSE 'banana + ' END
+ CASE WHEN a.[2006_orders] = 0 OR a.[2006_orders] IS NULL THEN '' ELSE 'apple + ' END
+ CASE WHEN o.[2006_orders] = 0 OR o.[2006_orders] IS NULL THEN '' ELSE 'orange + ' END
+ CASE WHEN s.[2006_orders] = 0 OR s.[2006_orders] IS NULL THEN '' ELSE 'strawberry + ' END
FROM
#fruit f
LEFT OUTER JOIN
( SELECT * FROM #fruit WHERE fruit_type = 'banana' ) b
ON
f.contact_id = b.contact_id
LEFT OUTER JOIN
( SELECT * FROM #fruit WHERE fruit_type = 'apple' ) a
ON
f.contact_id = a.contact_id
LEFT OUTER JOIN
( SELECT * FROM #fruit WHERE fruit_type = 'orange' ) o
ON
f.contact_id = o.contact_id
LEFT OUTER JOIN
( SELECT * FROM #fruit WHERE fruit_type = 'strawberry' ) s
ON
f.contact_id = s.contact_id
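And don't use "SELECT *"; I was just being lazy.
I should add that if you don't expect a record for every fruit_type for each contact_id (so that you do need the outer joins here), then the CASE expressions should test for NULL as well as zero. (Added above.)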
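The index suggestion above could look something like the following. This is only a sketch: the index name IX_fruit_contact is made up for illustration, and including fruit_type as a second key column is an assumption that seems reasonable because the joins filter on fruit_type as well.

```sql
-- Hypothetical index name; the key columns are an assumption based on the joins above.
-- Clustering on (Contact_id, fruit_type) lets each per-fruit join seek by contact.
CREATE CLUSTERED INDEX IX_fruit_contact
    ON #fruit (Contact_id, fruit_type);
```

Building the index on 9 million rows has its own cost, so it is worth comparing timings with and without it.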
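Answer 1: (score: 0)
This may not be appropriate or possible in your environment, but given that these numbers relate to the past, have you considered pre-generating the concatenated strings per contact per year into a table, using whatever method you currently have, and joining that to your results?
You might need a scheduled job to maintain the string for the current year, but the data for 2005-9 seems unlikely to change.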
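A minimal sketch of that idea, assuming the #fruit table from the question. The table name #combos and the index name are hypothetical, and only two of the five years are shown to keep it short. The point is that the concatenation runs once per contact rather than once per row, and the result is then joined back.

```sql
-- Build one row per contact (hypothetical table #combos); repeat the STUFF block for 2007-2009.
SELECT c.Contact_id
      ,STUFF((SELECT ' + ' + f.fruit_type
              FROM #fruit f
              WHERE f.Contact_id = c.Contact_id
                AND f.[2005_orders] > 0
              ORDER BY f.fruit_type
              FOR XML PATH('')), 1, 3, '') AS combination2005   -- strip the leading ' + '
      ,STUFF((SELECT ' + ' + f.fruit_type
              FROM #fruit f
              WHERE f.Contact_id = c.Contact_id
                AND f.[2006_orders] > 0
              ORDER BY f.fruit_type
              FOR XML PATH('')), 1, 3, '') AS combination2006
INTO #combos
FROM (SELECT DISTINCT Contact_id FROM #fruit) c;

-- One row per contact, so a unique index supports the join back.
CREATE UNIQUE CLUSTERED INDEX IX_combos_contact ON #combos (Contact_id);

SELECT fr.Contact_id, fr.fruit_type, fr.[2005_orders], fr.[2006_orders]
      ,cb.combination2005, cb.combination2006
FROM #fruit fr
JOIN #combos cb
  ON cb.Contact_id = fr.Contact_id;
```

Two caveats: the fruits come out in alphabetical order rather than the order used in the sample results, and FOR XML PATH will entitize characters such as & or < if they ever appear in fruit_type.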
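Answer 2: (score: 0)
I think the timeouts can be traced back to the amount of redundant data you are computing. Looking at your sample, the list of fruits appears to be recomputed for every row in the table. I took a different approach that might help.
I used the select below to put the fruit name in place of the order count wherever the count is non-zero.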
SELECT
contact_id
,substring([fruit_type],1,cast(CAST([2005_orders] as bit)as int)*50) AS [2005_fruit]
,substring([fruit_type],1,cast(CAST([2006_orders] as bit)as int)*50) AS [2006_fruit]
,substring([fruit_type],1,cast(CAST([2007_orders] as bit)as int)*50) AS [2007_fruit]
,substring([fruit_type],1,cast(CAST([2008_orders] as bit)as int)*50) AS [2008_fruit]
,substring([fruit_type],1,cast(CAST([2009_orders] as bit)as int)*50) AS [2009_fruit]
from #fruit
contact_id  2005_fruit  2006_fruit  2007_fruit  2008_fruit  2009_fruit
----------  ----------  ----------  ----------  ----------  ----------
id001       banana      banana                  banana      banana
id001                   apple       apple       apple
id001                                           orange
id001       strawberry  strawberry  strawberry  strawberry  strawberry
id001                   grapes
id001       lemon       lemon       lemon
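I believe the next step is to turn the output of the select shown above into the concatenated fruit lists. The target should be a table that can be joined back to the original fruit table. If needed, that table could be materialised into the results table, unless the amount of redundant data would kill the IO.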
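One way that next step might be sketched, under the assumption that SQL Server 2017 or later is available: STRING_AGG (not part of the original answer) skips NULL inputs, so a CASE expression that yields NULL for a zero count keeps that fruit out of the list. The CTE name combos is hypothetical, and only two years are shown.

```sql
-- Assumes SQL Server 2017+ for STRING_AGG; "combos" is a hypothetical name.
WITH combos AS (
    SELECT Contact_id
          -- CASE returns NULL when the count is 0, and STRING_AGG ignores NULLs
          ,STRING_AGG(CASE WHEN [2005_orders] > 0 THEN fruit_type END, ' + ')
               WITHIN GROUP (ORDER BY fruit_type) AS combination2005
          ,STRING_AGG(CASE WHEN [2006_orders] > 0 THEN fruit_type END, ' + ')
               WITHIN GROUP (ORDER BY fruit_type) AS combination2006
    FROM #fruit
    GROUP BY Contact_id          -- one aggregated row per contact
)
SELECT fr.Contact_id, fr.fruit_type, fr.[2005_orders], fr.[2006_orders]
      ,c.combination2005, c.combination2006
FROM #fruit fr
JOIN combos c
  ON c.Contact_id = fr.Contact_id;
```

The lists are ordered alphabetically here, so for id001 the 2005 list would read 'banana + lemon + strawberry' rather than the order in the sample results; reproducing that order would need an explicit ordering column, which the sample table does not have.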