Updating 500,000 rows in SQL Server with unique data

Date: 2011-11-04 12:12:28

Tags: sql performance sql-server-2005 sql-update

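So I'm working on some database 'de-identification', where every piece of information gets changed. On most of the smaller tables (around 10,000 rows), a simple update isn't too time-consuming. I have now moved on to a table with about 500,000 rows.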

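I have read that the fastest way to do this kind of "update" is actually to select into a temp table containing the columns you need to update. (I read that here: Fastest way to update 120 Million records.)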

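The problem is that the OP there is updating all matching rows with a single value, whereas each of my values is different. That is, he is updating the null rows in a single column to -1, while I am updating every row with a new, more or less random date. This is what I have so far.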

--The only Index on Treatments is a Clustered Primary Key (TreatmentID)
SELECT * INTO #Treatments_temp
FROM Treatments
CREATE CLUSTERED INDEX IDX_Treatments ON #Treatments_temp(TreatmentID)

DECLARE @rows INT;  -- highest TreatmentID; the loop counts down from here
SET @rows = (SELECT TOP 1 TreatmentID
             FROM Treatments
             ORDER BY TreatmentID Desc)

WHILE (@rows > 0)
  BEGIN

    --There are only 500,000 records in this table according to count(*), but the PK goes much
    --higher (some records were deleted, made in error, etc.), so this IF statement is my
    --attempt to skip the code for @rows values that don't actually exist.

    IF (SELECT TreatmentID FROM #Treatments_temp WHERE TreatmentID = @rows) IS NOT NULL
      BEGIN
      DECLARE @year INT;
      DECLARE @month INT;
      DECLARE @date INT;
      DECLARE @newStartDate SMALLDATETIME;
      DECLARE @multiplier FLOAT;

      SET @multiplier = (SELECT RAND());

      SET @year = @multiplier * 99 + 1900;
      SET @month = @multiplier * 11 + 1;
      SET @date = @multiplier * 27 + 1;

      SET @newStartDate = DATEADD(MONTH,((@year-1900)*12)+@month-1,@date-1);

      UPDATE #Treatments_temp
      SET StartDate = @newStartDate
      WHERE TreatmentID = @rows

      -- @timebetween (in minutes) is assumed to be declared and set elsewhere
      UPDATE #Treatments_temp
      SET EndDate = DATEADD(MINUTE, @timebetween, @newStartDate)
      WHERE TreatmentID = @rows
      END

  SET @rows = @rows - 1
  END

3 Answers:

Answer 0 (score: 2)

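Without knowing more about what you have, I think the simplest approach would be:

  • Put the "randomizing" logic into a scalar function
  • Create a narrow table with the ID and the function's result for each ID
  • Update your Treatments table using an INNER JOIN on the narrow table to get the new values

No row-by-row approach is needed.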
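
The steps above could be sketched roughly as follows. This is only a sketch under assumptions, not a tested solution for the asker's schema: #NewDates and NewStartDate are hypothetical names, StartDate is assumed to be a SMALLDATETIME column as in the question, and the 36,500-day range starting at 1900-01-01 is an illustrative choice. The random expression is inlined rather than wrapped in a scalar function because, as far as I know, SQL Server does not allow RAND() or NEWID() to be called directly inside a user-defined function.

```sql
-- Narrow table: one row per TreatmentID with a pre-computed random date.
-- NEWID() is evaluated for every row, so each row gets its own value.
-- The modulo is taken before ABS() so that ABS() never sees the minimum int.
SELECT TreatmentID,
       NewStartDate = DATEADD(DAY, ABS(CHECKSUM(NEWID()) % 36500), '19000101')
INTO   #NewDates          -- hypothetical temp table name
FROM   Treatments;

CREATE UNIQUE CLUSTERED INDEX IX_NewDates ON #NewDates (TreatmentID);

-- Single set-based update instead of one UPDATE per row.
UPDATE t
SET    t.StartDate = n.NewStartDate
FROM   Treatments AS t
INNER JOIN #NewDates AS n
        ON n.TreatmentID = t.TreatmentID;

DROP TABLE #NewDates;
```

Both tables are keyed on TreatmentID, so the join can be resolved with one ordered pass over each index rather than 500,000 separate lookups.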

Answer 1 (score: 1)

I think this should work:

-- using NewID() instead of Rand() because Rand() is only evaluated once for the entire query while NewID() is evaluated for each record
-- Based on your logic I understand newStartDate has to be between 1 jan 1900 and 28 dec 1999
DECLARE @multiplier float
DECLARE @max_int    float
DECLARE @daterange  float

SELECT @max_int   = Power(Convert(float, 2), 31), -- signed int !
       @daterange = DateDiff(day, '1 jan 1900', '28 dec 1999')

UPDATE Treatments
   SET @multiplier  = (@max_int - Convert(real, ABS(BINARY_CHECKSUM(NewID())))) / @max_int, -- returns something between 0 and 1
       StartDate    = DateAdd(day, Convert(int, (@daterange * @multiplier)), '1 jan 1900') -- returns somewhere in the daterange

-- test 'spread'
SELECT COUNT(*), COUNT(DISTINCT StartDate), Min(StartDate), Max(StartDate) FROM Treatments

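If anyone wants to test this, you can use the following to generate some test data (@Kulingar: make sure you don't accidentally drop your own table =)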

IF DB_ID('test') IS NULL CREATE DATABASE test
GO
USE test
GO
IF Object_ID('test..Treatments') IS NOT NULL DROP TABLE test..Treatments
GO
SELECT row_id = IDENTITY(int, 1, 1), StartDate = CURRENT_TIMESTAMP INTO Treatments FROM sys.columns, sys.objects
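
To see why the update above uses NewID() rather than Rand(), a small comparison query like the one below may help. It is only an illustration: the column aliases are made up, and sys.objects is used merely as a convenient source of a few rows.

```sql
-- RAND() without a seed is evaluated once per query, so every row shows the same value.
-- NEWID() is evaluated per row, so the derived value differs from row to row.
SELECT TOP 5
       same_for_all_rows = RAND(),
       differs_per_row   = ABS(CHECKSUM(NEWID()) % 1000) / 1000.0
FROM   sys.objects;
```

The first column should repeat on every row while the second varies, which is the behavior that makes a plain Rand() unsuitable for a set-based update that needs a different value per row.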

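Answer 2 (score: 0)

I ended up writing a small program that:

  1. Selects * and puts the rows into a struct
  2. Changes the data in the struct
  3. Replaces the old data with the new data (drop table + create table, or truncate + insert into, might apply)

This way you can do the logic outside the database and limit the CRUD to the essentials.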