Optimal concatenation across rows

Time: 2010-12-06 12:47:42

Tags: sql sql-server sql-server-2005 tsql

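I'm looking to optimise concatenation across multiple rows, and I have read a few similar questions to get familiar with using STUFF + XML PATH etc. However, when I apply these to my query, it usually times out when applied to 9 million rows, or I have rows...

What I'm looking for is a more efficient way of translating this: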

create table #fruit
(
Contact_id NVARCHAR(50)
,fruit_type NVARCHAR(50)
,[2005_orders] int
,[2006_orders] int
,[2007_orders] int
,[2008_orders] int
,[2009_orders] int
)
INSERT INTO #fruit VALUES ('id001','banana',1,3,0,25,4)
INSERT INTO #fruit VALUES ('id001','apple',0,7,19,1,0)
INSERT INTO #fruit VALUES ('id001','orange',0,0,0,9,0)
INSERT INTO #fruit VALUES ('id001','strawberry',1,1,1,1,4)
INSERT INTO #fruit VALUES ('id001','grapes',0,3,0,0,0)
INSERT INTO #fruit VALUES ('id001','lemon',1,1,1,0,0)

into this:

CREATE TABLE #results
(
contact_id NVARCHAR(255)
,fruit_type NVARCHAR(50)
,[2005_orders] int
,[2006_orders] int
,[2007_orders] int
,[2008_orders] int
,[2009_orders] int
,combination2005 NVARCHAR(500)
,combination2006 NVARCHAR(500)
,combination2007 NVARCHAR(500)
,combination2008 NVARCHAR(500)
,combination2009 NVARCHAR(500)
)
INSERT INTO #results VALUES ('id001','banana',1,3,0,25,4,'banana + strawberry + lemon','banana + apple + strawberry + grapes + lemon','apple + strawberry + lemon','banana + apple + orange + strawberry','banana + strawberry')
INSERT INTO #results VALUES ('id001','apple',0,7,19,1,0,'banana + strawberry + lemon','banana + apple + strawberry + grapes + lemon','apple + strawberry + lemon','banana + apple + orange + strawberry','banana + strawberry')
INSERT INTO #results VALUES ('id001','orange',0,0,0,9,0,'banana + strawberry + lemon','banana + apple + strawberry + grapes + lemon','apple + strawberry + lemon','banana + apple + orange + strawberry','banana + strawberry')
INSERT INTO #results VALUES ('id001','strawberry',1,1,1,1,4,'banana + strawberry + lemon','banana + apple + strawberry + grapes + lemon','apple + strawberry + lemon','banana + apple + orange + strawberry','banana + strawberry')
INSERT INTO #results VALUES ('id001','grapes',0,3,0,0,0,'banana + strawberry + lemon','banana + apple + strawberry + grapes + lemon','apple + strawberry + lemon','banana + apple + orange + strawberry','banana + strawberry')
INSERT INTO #results VALUES ('id001','lemon',1,1,1,0,0,'banana + strawberry + lemon','banana + apple + strawberry + grapes + lemon','apple + strawberry + lemon','banana + apple + orange + strawberry','banana + strawberry')

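The key factors to consider are that I need one row per fruit type per contact (because this table will be used elsewhere), and that I only want a fruit to be part of a combination if its count is greater than 0.

Given the number of rows I'm dealing with, this may never be very efficient, but if there is any chance I can append this information to my table that would be great :)

Approaches tried

Approach 1)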

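SELECT *
,STUFF(
(SELECT ' ' + fr2.fruit_type
FROM #fruit fr2
WHERE fr.contact_id = fr2.contact_id
AND fr2.[2005_orders] > 0 -- a column name starting with a digit must be bracketed
ORDER BY fr2.contact_id, fr2.fruit_type
FOR XML PATH ('')
)
,1,1,'' -- strip the leading space
) AS combination
FROM #fruit fr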

Approach 2)

SELECT *
,ISNULL((MAX(CASE WHEN fruit_type = 'banana' AND [2005_orders] > 0 THEN 'banana ' END) OVER (PARTITION BY contact_id)),'')+
ISNULL((MAX(CASE WHEN fruit_type = 'apple' AND [2005_orders] > 0 THEN 'apple ' END) OVER (PARTITION BY contact_id)),'')+
ISNULL((MAX(CASE WHEN fruit_type = 'orange' AND [2005_orders] > 0 THEN 'orange ' END) OVER (PARTITION BY contact_id)),'')+
ISNULL((MAX(CASE WHEN fruit_type = 'strawberry' AND [2005_orders] > 0 THEN 'strawberry ' END) OVER (PARTITION BY contact_id)),'')+
ISNULL((MAX(CASE WHEN fruit_type = 'grapes' AND [2005_orders] > 0 THEN 'grapes ' END) OVER (PARTITION BY contact_id)),'')+
ISNULL((MAX(CASE WHEN fruit_type = 'lemon' AND [2005_orders] > 0 THEN 'lemon ' END) OVER (PARTITION BY contact_id)),'')
AS combination05
FROM #fruit fr

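- then repeated for 2006 - 2009 (I know this is very inefficient!)

3 Answers:

Answer 0: (score: 1)

The performance problem with both approaches will be the subqueries. Try this strategy to break the work apart and avoid the subqueries.

If you are guaranteed a record for every contact_id / fruit_type combination, you don't need to use outer joins.

An index on contact_id should improve performance considerably.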

SELECT
 f.*
 , combination2005 = 
     CASE WHEN b.[2005_orders] = 0 OR b.[2005_orders] IS NULL THEN '' ELSE 'banana + ' END
     + CASE WHEN a.[2005_orders] = 0 OR a.[2005_orders] IS NULL THEN '' ELSE 'apple + ' END
     + CASE WHEN o.[2005_orders] = 0 OR o.[2005_orders] IS NULL THEN '' ELSE 'orange + ' END
     + CASE WHEN s.[2005_orders] = 0 OR s.[2005_orders] IS NULL THEN '' ELSE 'strawberry + ' END
  , combination2006 = 
     CASE WHEN b.[2006_orders] = 0 OR b.[2006_orders] IS NULL THEN '' ELSE 'banana + ' END
     + CASE WHEN a.[2006_orders] = 0 OR a.[2006_orders] IS NULL THEN '' ELSE 'apple + ' END
     + CASE WHEN o.[2006_orders] = 0 OR o.[2006_orders] IS NULL THEN '' ELSE 'orange + ' END
     + CASE WHEN s.[2006_orders] = 0 OR s.[2006_orders] IS NULL THEN '' ELSE 'strawberry + ' END
FROM
 #fruit f
LEFT OUTER JOIN
 ( SELECT * FROM #fruit WHERE fruit_type = 'banana' ) b
ON
  f.contact_id = b.contact_id
LEFT OUTER JOIN
 ( SELECT * FROM #fruit WHERE fruit_type = 'apple' ) a
ON
  f.contact_id = a.contact_id
LEFT OUTER JOIN
 ( SELECT * FROM #fruit WHERE fruit_type = 'orange' ) o
ON
  f.contact_id = o.contact_id
LEFT OUTER JOIN
 ( SELECT * FROM #fruit WHERE fruit_type = 'strawberry' ) s
ON
  f.contact_id = s.contact_id

And don't use "SELECT *"; I was just being lazy.
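
One side effect of the query above is that every combination ends with a dangling ' + '. A minimal sketch of one way around that, with an explicit column list: put the separator in front of each name and strip the first one with STUFF. It assumes the #fruit table from the question, at most one row per contact_id / fruit_type, and it only covers 2005 and the same four fruits. The > 0 test follows the question's "count greater than 0" rule and also treats NULL (no matching row) as not ordered.

```sql
-- Sketch only: assumes #fruit from the question exists in this session
-- and holds at most one row per contact_id / fruit_type.
SELECT
    f.contact_id
  , f.fruit_type
  , f.[2005_orders]
  , combination2005 = STUFF(
          CASE WHEN b.[2005_orders] > 0 THEN ' + banana'     ELSE '' END
        + CASE WHEN a.[2005_orders] > 0 THEN ' + apple'      ELSE '' END
        + CASE WHEN o.[2005_orders] > 0 THEN ' + orange'     ELSE '' END
        + CASE WHEN s.[2005_orders] > 0 THEN ' + strawberry' ELSE '' END
        , 1, 3, '')  -- remove the leading ' + '; yields NULL when nothing qualified
FROM #fruit f
LEFT OUTER JOIN #fruit b ON b.contact_id = f.contact_id AND b.fruit_type = 'banana'
LEFT OUTER JOIN #fruit a ON a.contact_id = f.contact_id AND a.fruit_type = 'apple'
LEFT OUTER JOIN #fruit o ON o.contact_id = f.contact_id AND o.fruit_type = 'orange'
LEFT OUTER JOIN #fruit s ON s.contact_id = f.contact_id AND s.fruit_type = 'strawberry';
```

Wrap the STUFF call in ISNULL(..., '') if an empty string is preferred over NULL for contacts with no qualifying fruit.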

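I should add that if you don't expect a record for every fruit_type for every contact_id (and therefore need the outer joins here), then the case expressions should test for null in addition to zero. (Added above.)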
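
The contact_id index mentioned above could look something like the following. The index name is made up, and adding fruit_type as a second key column plus the INCLUDE list (so the self-joins can be answered from the index alone) is an assumption, not something the answer specifies. Whether it pays off on 9 million rows would need checking against the actual execution plan.

```sql
-- Hypothetical index name. INCLUDE columns are supported from SQL Server 2005 onwards.
CREATE NONCLUSTERED INDEX IX_fruit_contact_fruit
    ON #fruit (contact_id, fruit_type)
    INCLUDE ([2005_orders], [2006_orders], [2007_orders], [2008_orders], [2009_orders]);
```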

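Answer 1: (score: 0)

It may not be appropriate or possible in your environment, but given that these numbers relate to the past, have you considered pre-generating the concatenated strings per contact per year into a table, using whatever method you currently have, and joining that to your results?

You would probably need a scheduled job to maintain the strings for the current year, but the data for 2005-9 seems unlikely to change.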
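
A rough sketch of what that could look like. The table name dbo.contact_fruit_combination and its layout are invented for illustration, #fruit stands in for the real (permanent) source table, and the FOR XML PATH subquery is just one possible way to build the strings. The point is that each string is built once per contact instead of once per fruit row, and the wide result then only costs a join.

```sql
-- Hypothetical lookup table: one row per contact, one column per closed year.
CREATE TABLE dbo.contact_fruit_combination
(
    contact_id      NVARCHAR(50)  NOT NULL PRIMARY KEY
  , combination2005 NVARCHAR(500) NULL
  , combination2006 NVARCHAR(500) NULL
  , combination2007 NVARCHAR(500) NULL
  , combination2008 NVARCHAR(500) NULL
  , combination2009 NVARCHAR(500) NULL
);

-- Run once for the closed years. Names come out in alphabetical order.
-- Note: FOR XML PATH('') entitizes characters such as & or < in a fruit name.
INSERT INTO dbo.contact_fruit_combination
    (contact_id, combination2005, combination2006, combination2007, combination2008, combination2009)
SELECT
    c.contact_id
  , STUFF((SELECT ' + ' + f.fruit_type FROM #fruit f
           WHERE f.contact_id = c.contact_id AND f.[2005_orders] > 0
           ORDER BY f.fruit_type FOR XML PATH('')), 1, 3, '')
  , STUFF((SELECT ' + ' + f.fruit_type FROM #fruit f
           WHERE f.contact_id = c.contact_id AND f.[2006_orders] > 0
           ORDER BY f.fruit_type FOR XML PATH('')), 1, 3, '')
  , STUFF((SELECT ' + ' + f.fruit_type FROM #fruit f
           WHERE f.contact_id = c.contact_id AND f.[2007_orders] > 0
           ORDER BY f.fruit_type FOR XML PATH('')), 1, 3, '')
  , STUFF((SELECT ' + ' + f.fruit_type FROM #fruit f
           WHERE f.contact_id = c.contact_id AND f.[2008_orders] > 0
           ORDER BY f.fruit_type FOR XML PATH('')), 1, 3, '')
  , STUFF((SELECT ' + ' + f.fruit_type FROM #fruit f
           WHERE f.contact_id = c.contact_id AND f.[2009_orders] > 0
           ORDER BY f.fruit_type FOR XML PATH('')), 1, 3, '')
FROM (SELECT DISTINCT contact_id FROM #fruit WHERE contact_id IS NOT NULL) c;

-- The wide result is then a plain join against the precomputed strings.
SELECT f.contact_id, f.fruit_type
     , f.[2005_orders], f.[2006_orders], f.[2007_orders], f.[2008_orders], f.[2009_orders]
     , k.combination2005, k.combination2006, k.combination2007, k.combination2008, k.combination2009
FROM #fruit f
LEFT OUTER JOIN dbo.contact_fruit_combination k ON k.contact_id = f.contact_id;
```

A scheduled job for the current year would only need to refresh that year's column with the same STUFF expression.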

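Answer 2: (score: 0)

I think the timeout can be traced back to the amount of redundant data you are computing. When I look at your sample, it appears that the fruit list is computed for every row in the table. I took a different approach that may help.

I used the select below to put the fruit name in place of the sales value.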

SELECT 
contact_id
,substring([fruit_type],1,cast(CAST([2005_orders] as bit)as int)*50) AS [2005_fruit]
,substring([fruit_type],1,cast(CAST([2006_orders] as bit)as int)*50) AS [2006_fruit]
,substring([fruit_type],1,cast(CAST([2007_orders] as bit)as int)*50) AS [2007_fruit]
,substring([fruit_type],1,cast(CAST([2008_orders] as bit)as int)*50) AS [2008_fruit]
,substring([fruit_type],1,cast(CAST([2009_orders] as bit)as int)*50) AS [2009_fruit]
from #fruit

contact_id  2005_fruit  2006_fruit  2007_fruit  2008_fruit  2009_fruit
----------  ----------  ----------  ----------  ----------  ----------
id001       banana      banana                  banana      banana
id001                   apple       apple       apple
id001                                           orange
id001       strawberry  strawberry  strawberry  strawberry  strawberry
id001                   grapes
id001       lemon       lemon       lemon

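I believe the next step is to turn the output of the select shown above into the concatenated fruit. The goal should be a table that can be joined to the original fruit table. If needed, that table could be materialized into the results table, unless the amount of redundant data would kill IO.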
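
One way that next step might look, as a sketch and not a tested solution: wrap the select above in a CTE, collapse it to one row per contact with conditional aggregation, and join the result back to #fruit. It assumes the list of fruit types is fixed and known in advance (the six from the sample data), the #combos name is made up, and only 2005 and 2006 are spelled out; the other years would follow the same pattern.

```sql
-- Assumes #fruit from the question and a fixed, known list of fruit types.
;WITH named AS
(
    -- The select from this answer: fruit name where the year has orders, '' otherwise.
    SELECT contact_id
         , SUBSTRING(fruit_type, 1, CAST(CAST([2005_orders] AS bit) AS int) * 50) AS [2005_fruit]
         , SUBSTRING(fruit_type, 1, CAST(CAST([2006_orders] AS bit) AS int) * 50) AS [2006_fruit]
    FROM #fruit
)
SELECT contact_id
     -- Each MAX picks up ' + name' if any row of the contact has that fruit for the year;
     -- STUFF then removes the leading ' + ' (and yields NULL if nothing qualified).
     , STUFF( ISNULL(MAX(CASE WHEN [2005_fruit] = 'banana'     THEN ' + banana'     END), '')
            + ISNULL(MAX(CASE WHEN [2005_fruit] = 'apple'      THEN ' + apple'      END), '')
            + ISNULL(MAX(CASE WHEN [2005_fruit] = 'orange'     THEN ' + orange'     END), '')
            + ISNULL(MAX(CASE WHEN [2005_fruit] = 'strawberry' THEN ' + strawberry' END), '')
            + ISNULL(MAX(CASE WHEN [2005_fruit] = 'grapes'     THEN ' + grapes'     END), '')
            + ISNULL(MAX(CASE WHEN [2005_fruit] = 'lemon'      THEN ' + lemon'      END), '')
            , 1, 3, '') AS combination2005
     , STUFF( ISNULL(MAX(CASE WHEN [2006_fruit] = 'banana'     THEN ' + banana'     END), '')
            + ISNULL(MAX(CASE WHEN [2006_fruit] = 'apple'      THEN ' + apple'      END), '')
            + ISNULL(MAX(CASE WHEN [2006_fruit] = 'orange'     THEN ' + orange'     END), '')
            + ISNULL(MAX(CASE WHEN [2006_fruit] = 'strawberry' THEN ' + strawberry' END), '')
            + ISNULL(MAX(CASE WHEN [2006_fruit] = 'grapes'     THEN ' + grapes'     END), '')
            + ISNULL(MAX(CASE WHEN [2006_fruit] = 'lemon'      THEN ' + lemon'      END), '')
            , 1, 3, '') AS combination2006
INTO #combos            -- one row per contact, computed in a single pass
FROM named
GROUP BY contact_id;

-- Join the per-contact strings back to the original rows.
SELECT f.contact_id, f.fruit_type, f.[2005_orders], f.[2006_orders]
     , c.combination2005, c.combination2006
FROM #fruit f
JOIN #combos c ON c.contact_id = f.contact_id;
```

Against the sample rows this should give 'banana + strawberry + lemon' for 2005 and 'banana + apple + strawberry + grapes + lemon' for 2006, which matches the #results rows in the question.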