Speeding up performance of UPDATEs on a temp table

Time: 2015-12-18 23:55:30

Tags: sql sql-server tsql

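I have a SQL Server 2012 stored procedure. I am populating the temp table below, which is fairly straightforward. Afterwards, however, I run some UPDATEs on it.

Here is my T-SQL for declaring the temp table #SourceTable, populating it, and then running some updates on it. Once all of that is done, I simply insert this temp table into a new table, which we then feed into a MERGE statement that joins on DOI. DOI is the main column here, and you will see below that my UPDATE statements compute MAX/MIN over several columns based on it, because the table can contain multiple rows with the same DOI.

My question is: how can I speed up populating #SourceTable, or the updates on it? Are there any indexes I can create? I am decent at SQL, but not the best at performance issues. I am dealing with about 60,000,000 records in the temp table, and it has now been running for almost 4 hours. This is a one-off script that I only need to run once.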

CREATE TABLE #SourceTable
(
    DOI VARCHAR(72), 
    FullName NVARCHAR(128), LastName NVARCHAR(64), 
    FirstName NVARCHAR(64), FirstInitial NVARCHAR(10), 
    JournalId INT, JournalVolume VARCHAR(16), 
    JournalIssue VARCHAR(16), JournalFirstPage VARCHAR(16), 
    JournalLastPage VARCHAR(16), ArticleTitle NVARCHAR(1024), 
    PubYear SMALLINT, CreatedDate SMALLDATETIME, 
    UpdatedDate SMALLDATETIME, 
    ISSN_e VARCHAR(16), ISSN_p VARCHAR(16), 
    Citations INT, LastCitationRefresh SMALLDATETIME, 
    LastCitationRefreshValue SMALLINT, IsInSearch BIT, 
    BatchUpdatedDate SMALLDATETIME, LastIndexUpdate SMALLDATETIME, 
    ArticleClassificationId INT, ArticleClassificationUpdatedBy INT, 
    ArticleClassificationUpdatedDate SMALLDATETIME, 
    Affiliations VARCHAR(8000),
    --Calculated columns for use in importing...
    RowNum SMALLINT, MinCreatedDatePerDOI SMALLDATETIME, 
    MaxUpdatedDatePerDOI SMALLDATETIME, 
    MaxBatchUpdatedDatePerDOI SMALLDATETIME, 
    MaxArticleClassificationUpdatedByPerDOI INT, 
    MaxArticleClassificationUpdatedDatePerDOI SMALLDATETIME, 
    AffiliationsSameForAllDOI BIT, NewArticleId INT
)

--***************************************
--CROSSREF_ARTICLES
--***************************************
--GET RAW DATA INTO SOURCE TABLE TEMP TABLE..
INSERT INTO #SourceTable 
    SELECT 
        DOI, FullName, LastName, FirstName, FirstInitial, 
        JournalId, LEFT(JournalVolume,16) AS JournalVolume, 
        LEFT(JournalIssue,16) AS JournalIssue, 
        LEFT(JournalFirstPage,16) AS JournalFirstPage, 
        LEFT(JournalLastPage,16) AS JournalLastPage, 
        ArticleTitle, PubYear, CreatedDate, UpdatedDate, 
        ISSN_e, ISSN_p, 
        ISNULL(Citations,0) AS Citations, LastCitationRefresh, 
        LastCitationRefreshValue, IsInSearch, BatchUpdatedDate, 
        LastIndexUpdate, ArticleClassificationId, 
        ArticleClassificationUpdatedBy, 
        ArticleClassificationUpdatedDate, Affiliations,
        ROW_NUMBER() OVER(PARTITION BY DOI ORDER BY UpdatedDate DESC, CreatedDate ASC) AS RowNum, 
        NULL AS MinCreatedDatePerDOI, NULL AS MaxUpdatedDatePerDOI, 
        NULL AS MaxBatchUpdatedDatePerDOI, 
        NULL AS MaxArticleClassificationUpdatedByPerDOI, 
        NULL AS ArticleClassificationUpdatedDatePerDOI, 
        0 AS AffiliationsSameForAllDOI, NULL AS NewArticleId
    FROM 
        CrossRef_Articles WITH (NOLOCK)

--UPDATE SOURCETABLE WITH MAX/MIN/CALCULATED VALUES PER DOI...
UPDATE S
SET MaxUpdatedDatePerDOI = T.MaxUpdatedDatePerDOI,
    MaxBatchUpdatedDatePerDOI = T.MaxBatchUpdatedDatePerDOI,
    MinCreatedDatePerDOI = T.MinCreatedDatePerDOI,
    MaxArticleClassificationUpdatedByPerDOI = T.MaxArticleClassificationUpdatedByPerDOI,
    MaxArticleClassificationUpdatedDatePerDOI = T.MaxArticleClassificationUpdatedDatePerDOI
FROM #SourceTable S
INNER JOIN (
    SELECT MAX(UpdatedDate) AS MaxUpdatedDatePerDOI,
           MIN(CreatedDate) AS MinCreatedDatePerDOI,
           MAX(BatchUpdatedDate) AS MaxBatchUpdatedDatePerDOI,
           MAX(ArticleClassificationUpdatedBy) AS MaxArticleClassificationUpdatedByPerDOI,
           MAX(ArticleClassificationUpdatedDate) AS MaxArticleClassificationUpdatedDatePerDOI,
           DOI
    FROM #SourceTable
    GROUP BY DOI
) AS T ON S.DOI = T.DOI

UPDATE S
SET AffiliationsSameForAllDOI = 1
FROM #SourceTable S
WHERE NOT EXISTS (SELECT 1 FROM #SourceTable S2 WHERE S2.DOI = S.DOI AND S2.Affiliations <> S.Affiliations)

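3 Answers:

Answer 0 (score: 0)

This may be a faster way to do the update. It is hard to say without seeing the execution plan, but your version may be running the GROUP BY once for every row.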

with doigrouped AS
(
  SELECT
    MAX(UpdatedDate) AS MaxUpdatedDatePerDOI,
    MIN(CreatedDate) AS MinCreatedDatePerDOI,
    MAX(BatchUpdatedDate) AS MaxBatchUpdatedDatePerDOI, 
    MAX(ArticleClassificationUpdatedBy) AS MaxArticleClassificationUpdatedByPerDOI, 
    MAX(ArticleClassificationUpdatedDate) AS MaxArticleClassificationUpdatedDatePerDOI, 
    DOI 
  FROM #SourceTable 
  GROUP BY DOI
)
UPDATE S
SET MaxUpdatedDatePerDOI = T.MaxUpdatedDatePerDOI,
    MaxBatchUpdatedDatePerDOI = T.MaxBatchUpdatedDatePerDOI, 
    MinCreatedDatePerDOI = T.MinCreatedDatePerDOI, 
    MaxArticleClassificationUpdatedByPerDOI = T.MaxArticleClassificationUpdatedByPerDOI, 
    MaxArticleClassificationUpdatedDatePerDOI = T.MaxArticleClassificationUpdatedDatePerDOI
FROM #SourceTable S
INNER JOIN doigrouped T ON S.DOI = T.DOI

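If it is faster, it will be faster by orders of magnitude. That does not mean your machine can process 60 million records in any particular amount of time, though. Unless you test on 100k rows first, there is no way to know how long the full run will take.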
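One way to act on the advice about testing on 100k rows first is to copy a sample into a separate temp table and time the grouped update there. This is only a sketch: the table name #SourceTable_Sample and the two-column subset are assumptions made for illustration. TOP without ORDER BY picks an arbitrary set of rows, so some DOI groups will be incomplete, which is acceptable for timing but not for checking results.

```sql
-- Hypothetical sample table; the name and the 100,000-row size are assumptions.
SELECT TOP (100000) *
INTO #SourceTable_Sample
FROM #SourceTable;

SET STATISTICS TIME ON;  -- report CPU and elapsed time per statement

-- Same shape as the grouped update above, trimmed to two columns for brevity.
WITH doigrouped AS
(
    SELECT DOI,
           MAX(UpdatedDate) AS MaxUpdatedDatePerDOI,
           MIN(CreatedDate) AS MinCreatedDatePerDOI
    FROM #SourceTable_Sample
    GROUP BY DOI
)
UPDATE S
SET MaxUpdatedDatePerDOI = T.MaxUpdatedDatePerDOI,
    MinCreatedDatePerDOI = T.MinCreatedDatePerDOI
FROM #SourceTable_Sample S
INNER JOIN doigrouped T ON S.DOI = T.DOI;

SET STATISTICS TIME OFF;

DROP TABLE #SourceTable_Sample;
```

Timings will not necessarily scale linearly from 100k to 60 million rows, since sorts and hash joins can spill to tempdb at the larger size, so the result is best read as a rough lower bound.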

Answer 1 (score: 0)

I suppose you could try:

  1. Replace INSERT with SELECT INTO
     You do not have any indexes on #SourceTable anyway. SELECT INTO is minimally logged, so you should get some speedup here.
  2. Replace UPDATE with SELECT INTO another table
     Instead of updating #SourceTable, you can create #SourceTable_Updates with SELECT INTO (a modified version of Hogan's query):

     with doigrouped AS
     (
       SELECT
         MAX(UpdatedDate) AS MaxUpdatedDatePerDOI,
         MIN(CreatedDate) AS MinCreatedDatePerDOI,
         MAX(BatchUpdatedDate) AS MaxBatchUpdatedDatePerDOI,
         MAX(ArticleClassificationUpdatedBy) AS MaxArticleClassificationUpdatedByPerDOI,
         MAX(ArticleClassificationUpdatedDate) AS MaxArticleClassificationUpdatedDatePerDOI,
         DOI
       FROM #SourceTable
       GROUP BY DOI
     )
     SELECT
       S.DOI,
       MaxUpdatedDatePerDOI = T.MaxUpdatedDatePerDOI,
       MaxBatchUpdatedDatePerDOI = T.MaxBatchUpdatedDatePerDOI,
       MinCreatedDatePerDOI = T.MinCreatedDatePerDOI,
       MaxArticleClassificationUpdatedByPerDOI = T.MaxArticleClassificationUpdatedByPerDOI,
       MaxArticleClassificationUpdatedDatePerDOI = T.MaxArticleClassificationUpdatedDatePerDOI
     INTO #SourceTable_Updates
     FROM #SourceTable S
     INNER JOIN doigrouped T ON S.DOI = T.DOI

  3. Use the JOIN-ed #SourceTable and #SourceTable_Updates

Hope this helps.
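A minimal sketch of how the SELECT INTO steps might fit together, trimmed to a handful of columns. With SELECT INTO the column types come from the select list, so placeholder columns need an explicit CAST (an untyped NULL would otherwise become an INT column). One assumption differs from the query in this answer: here #SourceTable_Updates is built from the grouped query alone, so it holds one row per DOI, which keeps the final join from multiplying rows.

```sql
-- Step 1: SELECT ... INTO creates #SourceTable in place of CREATE TABLE + INSERT.
-- Column list trimmed for brevity; CASTs pin down the placeholder column types.
SELECT
    DOI,
    CreatedDate,
    UpdatedDate,
    Affiliations,
    ROW_NUMBER() OVER (PARTITION BY DOI
                       ORDER BY UpdatedDate DESC, CreatedDate ASC) AS RowNum,
    CAST(0 AS BIT)    AS AffiliationsSameForAllDOI,
    CAST(NULL AS INT) AS NewArticleId
INTO #SourceTable
FROM CrossRef_Articles WITH (NOLOCK);

-- Step 2 (assumed variation): one row per DOI instead of one row per source row.
SELECT
    DOI,
    MAX(UpdatedDate) AS MaxUpdatedDatePerDOI,
    MIN(CreatedDate) AS MinCreatedDatePerDOI
INTO #SourceTable_Updates
FROM #SourceTable
GROUP BY DOI;

-- Step 3: read the two tables joined on DOI instead of updating #SourceTable.
SELECT
    S.DOI, S.RowNum, S.CreatedDate, S.UpdatedDate,
    U.MaxUpdatedDatePerDOI, U.MinCreatedDatePerDOI
FROM #SourceTable S
INNER JOIN #SourceTable_Updates U ON U.DOI = S.DOI;
```

As with the original UPDATE, the inner join skips rows whose DOI is NULL, so those rows would need separate handling if they matter.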

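Answer 2 (score: 0)

Here are a few things that may help the performance of your insert statement:

  • Does the CrossRef_Articles table have a primary key? If it does, insert the primary key (make sure it is indexed) into the temp table, along with only the fields needed for the calculations. Once the calculations are done, do a select and join the temp table back to the original table on the Id field. Writing all of that data to disk takes time.
  • Take a look at your tempdb. If you have run this query several times, the database or log file size may have grown out of control.
  • Check the fields joined between the 2 original tables to see whether they are indexed.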
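The first two bullets could look roughly like the sketch below. The key column name Id is an assumption (the question does not show the primary key of CrossRef_Articles), and #DoiCalc and IX_DoiCalc_DOI are hypothetical names chosen for illustration.

```sql
-- Assumption: CrossRef_Articles has an indexed primary key column named Id.
-- Copy only the key plus the columns the per-DOI calculations need.
SELECT Id, DOI, CreatedDate, UpdatedDate, BatchUpdatedDate
INTO #DoiCalc
FROM CrossRef_Articles;

-- Index the grouping/join column, built once after the load.
CREATE CLUSTERED INDEX IX_DoiCalc_DOI ON #DoiCalc (DOI);

-- ... run the per-DOI calculations against #DoiCalc here ...

-- Join back to the original table on the key to pick up the wide columns.
SELECT A.ArticleTitle, A.Affiliations, C.DOI, C.CreatedDate, C.UpdatedDate
FROM #DoiCalc C
INNER JOIN CrossRef_Articles A ON A.Id = C.Id;

-- Check tempdb data and log file sizes; size is stored in 8 KB pages.
SELECT name, type_desc, size * 8 / 1024 AS SizeMB
FROM tempdb.sys.database_files;
```

Keeping the temp table narrow reduces how much is written to tempdb, which is the cost the first bullet points at.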