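I have a SQL Server 2012 stored procedure. I am populating the temp table below, which is fairly simple. Afterwards, however, I run some UPDATE statements against it.
Here is my T-SQL, which declares the temp table #SourceTable, populates it, and then runs some updates on it. After all of that, I simply insert this temp table into a new table, which we populate with a MERGE statement that joins on DOI. DOI is the main column here. As you will see below, my UPDATE statements compute MAX/MIN values over several columns based on this column, because the table can contain multiple rows with the same DOI.
My question is: how can I speed up populating #SourceTable, or the updates against it? Are there any indexes I could create? I am decent at SQL, but performance tuning is not my strongest area. I am dealing with about 60,000,000 records in the temp table, and it has now been running for almost 4 hours. This is a one-off script that I only need to run once.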
CREATE TABLE #SourceTable
(
DOI VARCHAR(72),
FullName NVARCHAR(128), LastName NVARCHAR(64),
FirstName NVARCHAR(64), FirstInitial NVARCHAR(10),
JournalId INT, JournalVolume VARCHAR(16),
JournalIssue VARCHAR(16), JournalFirstPage VARCHAR(16),
JournalLastPage VARCHAR(16), ArticleTitle NVARCHAR(1024),
PubYear SMALLINT, CreatedDate SMALLDATETIME,
UpdatedDate SMALLDATETIME,
ISSN_e VARCHAR(16), ISSN_p VARCHAR(16),
Citations INT, LastCitationRefresh SMALLDATETIME,
LastCitationRefreshValue SMALLINT, IsInSearch BIT,
BatchUpdatedDate SMALLDATETIME, LastIndexUpdate SMALLDATETIME,
ArticleClassificationId INT, ArticleClassificationUpdatedBy INT,
ArticleClassificationUpdatedDate SMALLDATETIME,
Affiliations VARCHAR(8000),
--Calculated columns for use in importing...
RowNum SMALLINT, MinCreatedDatePerDOI SMALLDATETIME,
MaxUpdatedDatePerDOI SMALLDATETIME,
MaxBatchUpdatedDatePerDOI SMALLDATETIME,
MaxArticleClassificationUpdatedByPerDOI INT,
MaxArticleClassificationUpdatedDatePerDOI SMALLDATETIME,
AffiliationsSameForAllDOI BIT, NewArticleId INT
)
--***************************************
--CROSSREF_ARTICLES
--***************************************
--GET RAW DATA INTO SOURCE TABLE TEMP TABLE..
INSERT INTO #SourceTable
SELECT
DOI, FullName, LastName, FirstName, FirstInitial,
JournalId, LEFT(JournalVolume,16) AS JournalVolume,
LEFT(JournalIssue,16) AS JournalIssue,
LEFT(JournalFirstPage,16) AS JournalFirstPage,
LEFT(JournalLastPage,16) AS JournalLastPage,
ArticleTitle, PubYear, CreatedDate, UpdatedDate,
ISSN_e, ISSN_p,
ISNULL(Citations,0) AS Citations, LastCitationRefresh,
LastCitationRefreshValue, IsInSearch, BatchUpdatedDate,
LastIndexUpdate, ArticleClassificationId,
ArticleClassificationUpdatedBy,
ArticleClassificationUpdatedDate, Affiliations,
ROW_NUMBER() OVER(PARTITION BY DOI ORDER BY UpdatedDate DESC, CreatedDate ASC) AS RowNum,
NULL AS MinCreatedDatePerDOI, NULL AS MaxUpdatedDatePerDOI,
NULL AS MaxBatchUpdatedDatePerDOI,
NULL AS MaxArticleClassificationUpdatedByPerDOI,
NULL AS ArticleClassificationUpdatedDatePerDOI,
0 AS AffiliationsSameForAllDOI, NULL AS NewArticleId
FROM
CrossRef_Articles WITH (NOLOCK)
--UPDATE SOURCETABLE WITH MAX/MIN/CALCULATED VALUES PER DOI...
UPDATE S
SET MaxUpdatedDatePerDOI = T.MaxUpdatedDatePerDOI,
    MaxBatchUpdatedDatePerDOI = T.MaxBatchUpdatedDatePerDOI,
    MinCreatedDatePerDOI = T.MinCreatedDatePerDOI,
    MaxArticleClassificationUpdatedByPerDOI = T.MaxArticleClassificationUpdatedByPerDOI,
    MaxArticleClassificationUpdatedDatePerDOI = T.MaxArticleClassificationUpdatedDatePerDOI
FROM #SourceTable S
INNER JOIN (
    SELECT MAX(UpdatedDate) AS MaxUpdatedDatePerDOI,
           MIN(CreatedDate) AS MinCreatedDatePerDOI,
           MAX(BatchUpdatedDate) AS MaxBatchUpdatedDatePerDOI,
           MAX(ArticleClassificationUpdatedBy) AS MaxArticleClassificationUpdatedByPerDOI,
           MAX(ArticleClassificationUpdatedDate) AS MaxArticleClassificationUpdatedDatePerDOI,
           DOI
    FROM #SourceTable
    GROUP BY DOI
) AS T ON S.DOI = T.DOI
UPDATE S
SET AffiliationsSameForAllDOI = 1
FROM #SourceTable S
WHERE NOT EXISTS (SELECT 1 FROM #SourceTable S2 WHERE S2.DOI = S.DOI AND S2.Affiliations <> S.Affiliations)
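Answer 0 (score: 0)
This may be a faster way to do the update. It is hard to say without seeing the execution plan, but your current query may be running the GROUP BY once for every row.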
with doigrouped AS
(
SELECT
MAX(UpdatedDate) AS MaxUpdatedDatePerDOI,
MIN(CreatedDate) AS MinCreatedDatePerDOI,
MAX(BatchUpdatedDate) AS MaxBatchUpdatedDatePerDOI,
MAX(ArticleClassificationUpdatedBy) AS MaxArticleClassificationUpdatedByPerDOI,
MAX(ArticleClassificationUpdatedDate) AS MaxArticleClassificationUpdatedDatePerDOI,
DOI
FROM #SourceTable
GROUP BY DOI
)
UPDATE S
SET MaxUpdatedDatePerDOI = T.MaxUpdatedDatePerDOI,
MaxBatchUpdatedDatePerDOI = T.MaxBatchUpdatedDatePerDOI,
MinCreatedDatePerDOI = T.MinCreatedDatePerDOI,
MaxArticleClassificationUpdatedByPerDOI = T.MaxArticleClassificationUpdatedByPerDOI,
MaxArticleClassificationUpdatedDatePerDOI = T.MaxArticleClassificationUpdatedDatePerDOI
FROM #SourceTable S
INNER JOIN doigrouped T ON S.DOI = T.DOI
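If it is faster, it will be faster by orders of magnitude. That still does not mean your machine can process 60 million records in any reasonable amount of time. If you did not test on 100k rows first, there is no way to know how long it will take to finish.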
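The advice to test on 100k rows first could be sketched roughly as follows. This is only an illustration under stated assumptions: #SourceSample and the index name are hypothetical, and the clustered index on DOI is a suggestion in response to the question about indexes, not something this answer prescribes.

```sql
-- Sketch: time the grouped UPDATE on a sample before running it on all rows.
-- Assumption: #SourceTable already exists and is populated as in the question.
-- #SourceSample is a hypothetical name introduced only for this test.
-- TOP without ORDER BY returns an arbitrary 100,000 rows, so some DOI groups
-- may be incomplete; the sample is for timing only, not for correct results.
SELECT TOP (100000) *
INTO #SourceSample
FROM #SourceTable;

-- Assumption: an index on the grouping/join column may help the GROUP BY
-- and the join. Compare timings with and without it.
CREATE CLUSTERED INDEX IX_SourceSample_DOI ON #SourceSample (DOI);

SET STATISTICS TIME ON;

WITH doigrouped AS
(
    SELECT
        DOI,
        MAX(UpdatedDate) AS MaxUpdatedDatePerDOI,
        MIN(CreatedDate) AS MinCreatedDatePerDOI,
        MAX(BatchUpdatedDate) AS MaxBatchUpdatedDatePerDOI,
        MAX(ArticleClassificationUpdatedBy) AS MaxArticleClassificationUpdatedByPerDOI,
        MAX(ArticleClassificationUpdatedDate) AS MaxArticleClassificationUpdatedDatePerDOI
    FROM #SourceSample
    GROUP BY DOI
)
UPDATE S
SET MaxUpdatedDatePerDOI = T.MaxUpdatedDatePerDOI,
    MinCreatedDatePerDOI = T.MinCreatedDatePerDOI,
    MaxBatchUpdatedDatePerDOI = T.MaxBatchUpdatedDatePerDOI,
    MaxArticleClassificationUpdatedByPerDOI = T.MaxArticleClassificationUpdatedByPerDOI,
    MaxArticleClassificationUpdatedDatePerDOI = T.MaxArticleClassificationUpdatedDatePerDOI
FROM #SourceSample S
INNER JOIN doigrouped T ON S.DOI = T.DOI;

SET STATISTICS TIME OFF;

DROP TABLE #SourceSample;
```

The elapsed time reported for the sample gives only a rough lower bound for the full run, since sorting and hashing costs do not scale linearly with row count.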
Answer 1 (score: 0)
I think you can try the following:
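First, replace the INSERT with SELECT INTO. You have no indexes on #SourceTable anyway, and SELECT INTO is minimally logged, so you should see some speedup.
Second, replace the UPDATE with a SELECT INTO another table. Instead of updating #SourceTable, you can create #SourceTable_Updates with SELECT INTO (a modified version of Hogan's query):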
with doigrouped AS
(
SELECT
MAX(UpdatedDate) AS MaxUpdatedDatePerDOI,
MIN(CreatedDate) AS MinCreatedDatePerDOI,
MAX(BatchUpdatedDate) AS MaxBatchUpdatedDatePerDOI,
MAX(ArticleClassificationUpdatedBy) AS MaxArticleClassificationUpdatedByPerDOI,
MAX(ArticleClassificationUpdatedDate) AS MaxArticleClassificationUpdatedDatePerDOI,
DOI
FROM #SourceTable
GROUP BY DOI
)
SELECT
S.DOI,
MaxUpdatedDatePerDOI = T.MaxUpdatedDatePerDOI,
MaxBatchUpdatedDatePerDOI = T.MaxBatchUpdatedDatePerDOI,
MinCreatedDatePerDOI = T.MinCreatedDatePerDOI,
MaxArticleClassificationUpdatedByPerDOI = T.MaxArticleClassificationUpdatedByPerDOI,
MaxArticleClassificationUpdatedDatePerDOI = T.MaxArticleClassificationUpdatedDatePerDOI
INTO #SourceTable_Updates
FROM #SourceTable S
INNER JOIN doigrouped T ON S.DOI = T.DOI
Finally, use #SourceTable joined with #SourceTable_Updates. Hope this helps.
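The final step, reading #SourceTable joined to the per-DOI values, might look roughly like the sketch below. It assumes the aggregate table holds exactly one row per DOI. The SELECT ... INTO shown in this answer joins back to #SourceTable and therefore emits one row per source row, so joining it on DOI alone would repeat rows. The sketch instead builds a hypothetical #DoiAggregates table straight from the GROUP BY.

```sql
-- Sketch, assuming #SourceTable is populated as in the question.
-- #DoiAggregates is a hypothetical name: one row per DOI.
SELECT
    DOI,
    MAX(UpdatedDate) AS MaxUpdatedDatePerDOI,
    MIN(CreatedDate) AS MinCreatedDatePerDOI,
    MAX(BatchUpdatedDate) AS MaxBatchUpdatedDatePerDOI,
    MAX(ArticleClassificationUpdatedBy) AS MaxArticleClassificationUpdatedByPerDOI,
    MAX(ArticleClassificationUpdatedDate) AS MaxArticleClassificationUpdatedDatePerDOI
INTO #DoiAggregates
FROM #SourceTable
GROUP BY DOI;

-- Read the raw rows together with their per-DOI values instead of
-- writing those values back with UPDATE.
SELECT
    S.DOI,
    S.RowNum,
    S.FullName,
    S.ArticleTitle,
    A.MinCreatedDatePerDOI,
    A.MaxUpdatedDatePerDOI,
    A.MaxBatchUpdatedDatePerDOI,
    A.MaxArticleClassificationUpdatedByPerDOI,
    A.MaxArticleClassificationUpdatedDatePerDOI
FROM #SourceTable S
INNER JOIN #DoiAggregates A ON A.DOI = S.DOI;
```

The column list in the second query is abbreviated for illustration; in practice it would carry whatever columns the later MERGE needs.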
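Answer 2 (score: 0)
Here are a few things that may help the performance of your INSERT statement.
Does the CrossRef_Articles table have a primary key? If it does, insert the primary key (make sure it is indexed) into the temp table, along with only the fields you need for the calculations. Once the calculations are done, run a SELECT that joins the temp table back to the original table on the Id field. Writing all of that data to disk takes time.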
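A rough sketch of that narrow-temp-table idea is shown below. The question does not show the key of CrossRef_Articles, so the column name Id is an assumption, and #SourceKeys, #DoiCalc and the index name are hypothetical names introduced for illustration.

```sql
-- Assumption: CrossRef_Articles has an indexed primary key column named Id.
-- Copy only the key and the columns needed for the per-DOI calculations.
SELECT
    Id,
    DOI,
    CreatedDate,
    UpdatedDate,
    BatchUpdatedDate,
    ArticleClassificationUpdatedBy,
    ArticleClassificationUpdatedDate
INTO #SourceKeys
FROM CrossRef_Articles;

-- Assumption: an index on DOI may help the grouping and the join below.
CREATE CLUSTERED INDEX IX_SourceKeys_DOI ON #SourceKeys (DOI);

-- Per-DOI calculations on the narrow table.
SELECT
    DOI,
    MIN(CreatedDate) AS MinCreatedDatePerDOI,
    MAX(UpdatedDate) AS MaxUpdatedDatePerDOI,
    MAX(BatchUpdatedDate) AS MaxBatchUpdatedDatePerDOI,
    MAX(ArticleClassificationUpdatedBy) AS MaxArticleClassificationUpdatedByPerDOI,
    MAX(ArticleClassificationUpdatedDate) AS MaxArticleClassificationUpdatedDatePerDOI
INTO #DoiCalc
FROM #SourceKeys
GROUP BY DOI;

-- Join back to the original table on Id to pick up the wide columns
-- (names, titles, affiliations) only when they are actually needed.
SELECT
    A.DOI,
    A.FullName,
    A.ArticleTitle,
    A.Affiliations,
    C.MinCreatedDatePerDOI,
    C.MaxUpdatedDatePerDOI
FROM #SourceKeys K
INNER JOIN CrossRef_Articles A ON A.Id = K.Id
INNER JOIN #DoiCalc C ON C.DOI = K.DOI;
```

The point of the design is that wide columns such as ArticleTitle and Affiliations are never copied into tempdb, which reduces the amount of data written to disk.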