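I wrote a SQL procedure that takes a large dataset and outputs a large XML file. Because of memory problems, I am trying to adapt it so that it runs on smaller subsets and the results are then appended elsewhere. I added a subquery named "current" that filters the data down to a single IID entity and then run the same code against it, but for some reason it takes far longer than I expected.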
WITH
current AS (SELECT * FROM xml.MainDataset WHERE IID = '27'),
sources AS (SELECT DISTINCT stage, IID, name FROM current ),
pop AS (SELECT DISTINCT stage, IID, pop, type, plat FROM current ),
Singles AS (SELECT DISTINCT stage, IID, pop, type, plat, stype, split1, split2 FROM current Where split2 = 'N/A'),
Measures AS (SELECT DISTINCT stage, IID, pop, type, plat, stype, split1, split2, mType FROM current)
--SELECT stage,
SELECT IID as ukprn, name,
    (SELECT pop as pop_category, type, plat,
        -- SINGLES
        (SELECT stype as sCat, split1 as attribute,
            (SELECT mType,
                (SELECT DISTINCT mDetail, stream1, stream2, stream3, stream4, stream5
                 FROM current as da
                 WHERE da.stage = mT.stage
                   AND da.IID = mT.IID
                   AND da.pop = mT.pop
                   AND da.type = mT.type
                   AND da.plat = mT.plat
                   AND da.stype = mT.stype
                   AND da.split1 = mT.split1
                   AND da.split2 = mT.split2
                   AND da.mType = mT.mType
                 FOR XML PATH ('data'), TYPE) [*]
             FROM Measures as mT
             WHERE mT.stage = sST.stage
               AND mT.IID = sST.IID
               AND mT.pop = sST.pop
               AND mT.type = sST.type
               AND mT.plat = sST.plat
               AND mT.stype = sST.stype
               AND mT.split1 = sST.split1
               --AND mT.split2 = sST.split2
               AND mT.split2 = 'N/A'
             FOR XML PATH ('measures'), TYPE) [*]
         FROM Singles as sST
         WHERE sST.stage = Po.stage
           AND sST.IID = Po.IID
           AND sST.pop = Po.pop
           AND sST.type = Po.type
           AND sST.plat = Po.plat
         FOR XML PATH ('singles'), TYPE) [*]
     FROM pop as Po
     WHERE Po.stage = Pr.stage
       AND Po.IID = '27'
     FOR XML PATH ('pop'), TYPE) [*]
FROM Sources as Pr
WHERE Pr.stage = 'Primary'
FOR XML PATH ('source'),
root('ReleaseData')
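The dataset (xml.MainDataset) is very large and has hundreds of unique IID values. However, if I split the query into two parts, each part is very fast: filtering the main dataset down to a single IID takes about a second, and running the query against a dataset that contains only one IID also takes only about a second.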
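To make the "two parts" concrete, here is a minimal sketch of one way the split might look, assuming the filtered rows are first stored in a temp table and the XML query then reads from that table instead of from the "current" subquery. The temp table name #Current27 is hypothetical, and only the outermost level of the XML query is shown for brevity; the nested levels would reference #Current27 in the same way.

```sql
-- Part 1: filter the main dataset down to a single IID, once.
-- (#Current27 is a hypothetical name used only for this illustration.)
SELECT *
INTO #Current27
FROM xml.MainDataset
WHERE IID = '27';

-- Part 2: run the XML query against the small, already-filtered table.
-- Only the outer level is sketched here; the nested FOR XML subqueries
-- from the full query would read from #Current27 as well.
WITH
sources AS (SELECT DISTINCT stage, IID, name FROM #Current27)
SELECT IID AS ukprn, name
FROM sources AS Pr
WHERE Pr.stage = 'Primary'
FOR XML PATH ('source'), ROOT('ReleaseData');

-- Clean up the temp table when finished.
DROP TABLE #Current27;
```

Run this way, the filter against xml.MainDataset is executed exactly once, and everything after it touches only the rows for IID 27.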
What is going on here? Is the "current" dataset being built multiple times?