I have a table with a column that contains XML like this:
<block>
<blockIn>
<G>1</G>
</blockIn>
.....
<blockIn>
<G>12</G>
</blockIn>
......
</block>
.....
<block>
......
</block>
For each <block>, I need to find the MAX of its <blockIn><G> values, and then sum all of those MAX values:
(sum (Max (<block> …<blockIn> ...<G></G>); Max (<block> …<blockIn> ...<G></G>) ...))
So I did this:
WITH ds AS
(
SELECT
fieldXML
FROM
table
WHERE
ID = 1
)
SELECT
(SELECT SUM(node_a.value('max(blockIn/G)' , 'int' ))
FROM ds.fieldXML.nodes('/block') AS node_refs(node_a)) AS [ArticulNum]
FROM
ds
But it runs very slowly.
If I use a variable instead, it runs very fast:
DECLARE @xml AS [XML];
SELECT
@xml = fieldXML
FROM
table
WHERE
ID = 1;
SELECT SUM(node_a.value('max(blockIn/G)' , 'INT' ))
FROM @xml.nodes('/block') AS node_refs(node_a)
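What do I need to do to make the first solution run fast as well?
Answer 0: (score: 0)
A user-defined function (UDF) will help, but it has to be the right type of UDF and, if performance matters, it must be an inline function. Here is a cleaned-up version of your original (note that the final SUM is not needed when each XML value holds a single <block>, as in the test data further down):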
-- Original
CREATE OR ALTER FUNCTION [dbo].[ArticulNumFromXML_original](@xml XML)
RETURNS INT
AS
BEGIN
RETURN
(
SELECT node_a.value('max(blockIn/G)' , 'int' )
FROM @xml.nodes('/block') AS node_refs(node_a)
);
END;
GO
Here is an improved scalar UDF that performs better. Note the different context, block/blockIn, and the use of the text() node.
-- Improved scalar UDF:
CREATE OR ALTER FUNCTION [dbo].[ArticulNumFromXML_V2](@xml XML)
RETURNS INT
AS
BEGIN
RETURN
(
SELECT Mx = MAX(node_a.value('(G/text())[1]','int'))
FROM @xml.nodes('/block/blockIn') AS node_refs(node_a)
);
END;
GO
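This performs better, but a fundamental problem remains: the function is not inline. Let's take the logic above and create an inline table-valued function (iTVF):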
-- INLINE UDF
CREATE OR ALTER FUNCTION [dbo].[ArticulNumFromXML_itvf](@xml XML)
RETURNS TABLE AS RETURN
SELECT Mx = MAX(node_a.value('(G/text())[1]','int'))
FROM @xml.nodes('/block/blockIn') AS node_refs(node_a);
GO
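The functions above return a single MAX over every blockIn in the value, which matches the question only when there is one <block> per XML value. If a value can hold several <block> elements, as in the question's sample, a per-block MAX followed by a SUM is needed. The following is a minimal sketch of that idea, not part of the original answer; the function name ArticulNumFromXML_itvf_sum and its Total column are hypothetical.

```sql
-- Hypothetical variant (name and shape are assumptions, not from the original answer):
-- take the MAX of G inside each <block>, then sum those per-block maxima.
CREATE OR ALTER FUNCTION [dbo].[ArticulNumFromXML_itvf_sum](@xml XML)
RETURNS TABLE AS RETURN
SELECT Total = SUM(b.Mx)
FROM @xml.nodes('/block') AS blk(node_b)          -- one row per <block>
CROSS APPLY
(
    -- MAX over the G values that belong to the current <block> only
    SELECT Mx = MAX(g.node_g.value('(./text())[1]', 'int'))
    FROM blk.node_b.nodes('blockIn/G') AS g(node_g)
) AS b;
GO

-- Usage: two blocks with maxima 12 and 5, so Total should be 17.
DECLARE @x XML = N'<block><blockIn><G>1</G></blockIn><blockIn><G>12</G></blockIn></block>
                   <block><blockIn><G>5</G></blockIn></block>';
SELECT f.Total
FROM dbo.ArticulNumFromXML_itvf_sum(@x) AS f;
```

It is still a single-statement inline function, so it should be inlined into the calling query the same way as the iTVF above, although it has not been run through the timing tests in this answer.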
Next is a sample XML data generator for performance testing. This code creates a table with 20K random XML values:
IF OBJECT_ID('tempdb..#yourtable') IS NOT NULL DROP TABLE #yourtable;
SELECT TOP (20000)
SomeId = IDENTITY(INT,1,1),
xmldata = CAST(f.X AS XML),
blob = CAST(CAST(f.X AS VARBINARY(MAX)) AS image)
INTO #yourtable
FROM (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) AS a(X) -- 10
CROSS JOIN (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) AS b(X) -- 100
CROSS JOIN (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) AS c(X) -- 1K
CROSS JOIN (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) AS d(X) -- 10K
CROSS JOIN (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) AS e(X) -- 100K
CROSS JOIN (VALUES (NEWID())) AS n(Id)
CROSS APPLY
(
SELECT TOP(ABS(CHECKSUM(NEWID())%5)+b.X)
G = ABS(CHECKSUM(n.Id)%30)+c.X+ROW_NUMBER() OVER (ORDER BY (SELECT 1))
FROM (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) AS a(x)
ORDER BY NEWID()
FOR XML PATH('blockIn'), ROOT('block')
) AS f(x);
Next, a quick sanity check. The queries below all return the same results:
-- Sanity Check (all 3 return the same results)
SELECT TOP (10) t.SomeId, Mx = dbo.ArticulNumFromXML_original(xmldata)
FROM #yourtable AS t
ORDER BY t.SomeId;
SELECT TOP (10) t.SomeId, Mx = dbo.ArticulNumFromXML_V2(xmldata)
FROM #yourtable AS t
ORDER BY t.SomeId;
SELECT TOP (10) t.SomeId, f.Mx
FROM #yourtable AS t
CROSS APPLY dbo.ArticulNumFromXML_itvf(xmldata) AS f
ORDER BY t.SomeId;
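Now that we know we are getting the correct result set, let's do some performance testing. I noticed that in your answer you convert the data to XML first. That is expensive. In the first test I perform the same type of conversion: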
-- Test #1: Blob data
PRINT CHAR(13)+'Scalar Version (original):'+CHAR(13)+REPLICATE('-',90);
GO
DECLARE @st DATETIME = getdate(), @Mx INT;
SELECT @Mx = dbo.ArticulNumFromXML_original(CAST(CAST(t.blob AS VARBINARY(MAX)) AS XML))
FROM #yourtable AS t;
PRINT DATEDIFF(MS,@st,getdate());
GO 3
PRINT CHAR(13)+'Scalar Version (V2 - leveraging the text() node):'+CHAR(13)+REPLICATE('-',90);
GO
DECLARE @st DATETIME = getdate(), @Mx INT;
SELECT @Mx = dbo.ArticulNumFromXML_V2(CAST(CAST(t.blob AS VARBINARY(MAX)) AS XML))
FROM #yourtable AS t;
PRINT DATEDIFF(MS,@st,getdate());
GO 3
PRINT CHAR(13)+'Inline Version:'+CHAR(13)+REPLICATE('-',90);
GO
DECLARE @st DATETIME = getdate(), @Mx INT;
SELECT @Mx = f.Mx
FROM #yourtable AS t
CROSS APPLY dbo.ArticulNumFromXML_itvf(CAST(CAST(t.blob AS VARBINARY(MAX)) AS XML)) AS f;
PRINT DATEDIFF(MS,@st,getdate());
GO 3
Results:
Scalar Version (original):
------------------------------------------------------------------------------------------
Beginning execution loop
4560
4000
4346
Batch execution completed 3 times.
Scalar Version (V2 - leveraging the text() node):
------------------------------------------------------------------------------------------
Beginning execution loop
2503
2840
2796
Batch execution completed 3 times.
Inline Version:
------------------------------------------------------------------------------------------
Beginning execution loop
586
670
630
Batch execution completed 3 times.
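As you can see, the first improvement sped things up by more than 50% (about 1.6 times faster), but changing the function to an inline table-valued function made the query roughly 4 times faster than the improved scalar version and about 7 times faster than your original function.
Now let's skip the costly XML conversion (which can be handled by preprocessing, for example with a computed column or an indexed view). Here is the second test: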
-- Test #2: No XML Conversion
PRINT CHAR(13)+'Scalar Version (original):'+CHAR(13)+REPLICATE('-',90);
GO
DECLARE @st DATETIME = getdate(), @Mx INT;
SELECT @Mx = dbo.ArticulNumFromXML_original(xmldata)
FROM #yourtable AS t;
PRINT DATEDIFF(MS,@st,getdate());
GO 3
PRINT CHAR(13)+'Scalar Version (V2 - leveraging the text() node):'+CHAR(13)+REPLICATE('-',90);
GO
DECLARE @st DATETIME = getdate(), @Mx INT;
SELECT @Mx = dbo.ArticulNumFromXML_V2(xmldata)
FROM #yourtable AS t;
PRINT DATEDIFF(MS,@st,getdate());
GO 3
PRINT CHAR(13)+'Inline Version (No hints - Parallel):'+CHAR(13)+REPLICATE('-',90);
GO
DECLARE @st DATETIME = getdate(), @Mx INT;
SELECT @Mx = f.Mx
FROM #yourtable AS t
CROSS APPLY dbo.ArticulNumFromXML_itvf(xmldata) AS f;
PRINT DATEDIFF(MS,@st,getdate());
GO 3
Results:
Scalar Version (original):
------------------------------------------------------------------------------------------
Beginning execution loop
2933
2633
2953
Batch execution completed 3 times.
Scalar Version (V2 - leveraging the text() node):
------------------------------------------------------------------------------------------
Beginning execution loop
826
876
970
Batch execution completed 3 times.
Inline Version (No hints - Parallel):
------------------------------------------------------------------------------------------
Beginning execution loop
63
70
63
Batch execution completed 3 times.
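Reading pre-converted XML dramatically reduces all three times, and even more so for the iTVF, which is now 40 to 50 times faster than the original function.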
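Applied to the query in the question, the inline function would be called with CROSS APPLY instead of a correlated subquery. This is only a sketch under assumptions: dbo.YourTable is a hypothetical stand-in for the question's table, and fieldXML is assumed to already be an XML column.

```sql
-- dbo.YourTable is a hypothetical placeholder for the table in the question;
-- fieldXML is assumed to be stored as the XML data type (no conversion needed).
SELECT ArticulNum = f.Mx
FROM dbo.YourTable AS t
CROSS APPLY dbo.ArticulNumFromXML_itvf(t.fieldXML) AS f
WHERE t.ID = 1;
```

Because the function is inline, the optimizer can fold its body into this query rather than invoking a function once per row, which is where the gains measured in the tests above come from.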
Answer 1: (score: -1)
Solved it by creating a function:
CREATE OR ALTER FUNCTION [dbo].[ArticulNumFromXML](@xml XML)
RETURNS INT
AS
BEGIN
RETURN (SELECT
SUM(node_a.value('max(blockIn/G)' , 'int' ))
FROM
@xml.nodes('/block') AS node_refs(node_a)
);
END;
GO
And with it, the query runs fine:
SELECT
[dbo].[ArticulNumFromXML](CAST(CAST(blob AS VARBINARY(max)) AS XML))
FROM
table
WHERE
ID = 1