Question

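I'm working on de-identifying a database, which means every piece of information gets changed. On most of the smaller tables (around 10,000 rows) a simple update isn't too time-consuming. I have now moved on to a table with about 500,000 rows.

I have read that the fastest way to do this kind of "update" is actually to select the columns that need updating into a temp table. (I read about it here: Fastest way to update 120 Million records)

The problem is that the OP there was updating all matching rows with a single value, whereas each of my values is different. He was updating the null rows in a single column to -1; I am updating every row with a new value that is more or less a random date. This is what I have so far.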

--The only Index on Treatments is a Clustered Primary Key (TreatmentID)
SELECT * INTO #Treatments_temp
FROM Treatments
CREATE CLUSTERED INDEX IDX_Treatments ON #Treatments_temp(TreatmentID)

DECLARE @rows INT;
SET @rows = (SELECT TOP 1 TreatmentID
             FROM Treatments
             ORDER BY TreatmentID Desc)

WHILE (@rows > 0)
  BEGIN

    --There are only 500,000 records in this table according to COUNT(*), but the PK goes much
    --higher (some records were deleted, made in error, etc.), so this IF statement is my
    --attempt to bypass the code for @rows values that don't actually exist.

    IF (SELECT TreatmentID FROM #Treatments_temp WHERE TreatmentID = @rows) IS NOT NULL
      BEGIN
      DECLARE @year INT;
      DECLARE @month INT;
      DECLARE @date INT;
      DECLARE @newStartDate SMALLDATETIME;
      DECLARE @multiplier FLOAT;

      SET @multiplier = (SELECT RAND());

      SET @year = @multiplier * 99 + 1900;
      SET @month = @multiplier * 11 + 1;
      SET @date = @multiplier * 27 + 1;

      SET @newStartDate = DATEADD(MONTH,((@year-1900)*12)+@month-1,@date-1);

      UPDATE #Treatments_temp
      SET StartDate = @newStartDate
      WHERE TreatmentID = @rows

      -- Note: @timebetween (in minutes) is not declared anywhere in this snippet.
      UPDATE #Treatments_temp
      SET EndDate = DATEADD(MINUTE, @timebetween, @newStartDate)
      WHERE TreatmentID = @rows
      END

  SET @rows = @rows - 1
  END

Answer 1

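Without knowing more about what you have, I think the simplest approach is:

Put the "randomizing" logic into a scalar function
Build a narrow table holding the ID and a new value for each ID
Update your Treatments table with an INNER JOIN on the narrow table to pick up the new values

No row-by-row approach is needed.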
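
A minimal sketch of those steps, assuming the Treatments table, the TreatmentID key, and the StartDate column from the question. The temp table #NewDates and its NewStartDate column are hypothetical names. The random date is computed inline with NEWID() rather than in a scalar function, because SQL Server does not allow RAND() or NEWID() to be called directly inside a user-defined function:

```sql
-- Number of days in the target range (1 Jan 1900 to 28 Dec 1999, following the question's logic).
DECLARE @days INT = DATEDIFF(DAY, '19000101', '19991228');

-- Step 1: narrow table with one new random date per ID.
-- #NewDates and NewStartDate are hypothetical names.
SELECT TreatmentID,
       NewStartDate = DATEADD(DAY,
                              ABS(CHECKSUM(NEWID()) % (@days + 1)),
                              CAST('19000101' AS SMALLDATETIME))
INTO   #NewDates
FROM   Treatments;

CREATE UNIQUE CLUSTERED INDEX IX_NewDates ON #NewDates (TreatmentID);

-- Step 2: one set-based update joined on the key, instead of a WHILE loop.
UPDATE t
SET    t.StartDate = n.NewStartDate
FROM   Treatments AS t
INNER JOIN #NewDates AS n
        ON n.TreatmentID = t.TreatmentID;
```

Taking the modulo before ABS() keeps the expression safe even if CHECKSUM() returns the smallest possible int.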

Answer 2

I think this should work:

-- using NewID() instead of Rand() because Rand() is only evaluated once for the entire query while NewID() is evaluated for each record
-- Based on your logic I understand newStartDate has to be between 1 jan 1900 and 28 dec 1999
DECLARE @multiplier float
DECLARE @max_int    float
DECLARE @daterange  float

SELECT @max_int   = Power(Convert(float, 2), 31), -- signed int !
       @daterange = DateDiff(day, '1 jan 1900', '28 dec 1999')

UPDATE Treatments
   SET @multiplier  = (@max_int - Convert(real, ABS(BINARY_CHECKSUM(NewID())))) / @max_int, -- returns something between 0 and 1
       StartDate    = DateAdd(day, Convert(int, (@daterange * @multiplier)), '1 jan 1900') -- returns somewhere in the daterange

-- test 'spread'
SELECT COUNT(*), COUNT(DISTINCT StartDate), Min(StartDate), Max(StartDate) FROM Treatments

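If anyone wants to test this, you can use the following to generate some test data (@Kulingar: make sure you don't accidentally drop your own table =)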

IF DB_ID('test') IS NULL CREATE DATABASE test
GO
USE test
GO
IF Object_ID('test..Treatments') IS NOT NULL DROP TABLE test..Treatments
GO
SELECT row_id = IDENTITY(int, 1, 1), StartDate = CURRENT_TIMESTAMP INTO Treatments FROM sys.columns, sys.objects
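
The difference between Rand() and NewID() mentioned in the comment on the update can be checked with a quick query against any multi-row source. This is only an illustrative sketch; sys.objects is used because it always contains rows, and the column aliases are made up for the example:

```sql
-- RAND() without a seed is evaluated once per query, so every row shows the same value;
-- NEWID() is evaluated per row, so the number derived from it varies from row to row.
SELECT TOP (5)
       rand_value  = RAND(),
       newid_value = ABS(CHECKSUM(NEWID()) % 1000)
FROM   sys.objects;
```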

Answer 3

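I would write a small program that:

Selects * and loads the result into a structure
Changes the data in the structure
Replaces the old data with the new data (drop table + create table, or truncate + insert into, may apply)

This way you run the logic outside the database and limit the CRUD operations to what is strictly necessary.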

Updating 500,000 rows in SQL Server with unique data
