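I am trying to prepare some data for deletion by a third party, and unfortunately they can only process data in batches of 2000 records. I have 100k records and may need to split and export this data several more times, so I would like to automate the process somehow.
Is there a reasonably easy way to do this with SQL Server 2008? I am not running a complex query - it is not far off from SELECT PKID FROM Sometable ORDER BY PKID - and although I could probably do this with a cursor, I would like to know whether there is a better way.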
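Answer 0 (score: 4)
I think you can take advantage of ROW_NUMBER and then use BETWEEN to specify the range of rows you want. Alternatively, you can filter on PKID directly if you know it has no gaps, or if you do not care about gaps.
For example: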
SELECT ...
FROM
(SELECT ...,
ROW_NUMBER() OVER(ORDER BY PKID ) as RowNum
FROM Sometable e
) t
WHERE RowNum BETWEEN @startRowIndex AND (@startRowIndex + @maximumRows) - 1
This is commonly used for paging results. 4GuysFromRolla has a good article on it.
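As a variation on the query above, the batch number can be derived for every row in one pass, rather than querying one range at a time. This is only a sketch: Sometable and PKID come from the question, while @batchSize and BatchNumber are names made up here for illustration.

```sql
-- Sketch only: @batchSize and BatchNumber are illustrative names, not from the original answer.
DECLARE @batchSize INT;
SET @batchSize = 2000;

SELECT t.PKID,
       -- Integer division: rows 1-2000 fall in batch 1, rows 2001-4000 in batch 2, and so on.
       (t.RowNum - 1) / @batchSize + 1 AS BatchNumber
FROM (
    SELECT PKID,
           ROW_NUMBER() OVER (ORDER BY PKID) AS RowNum
    FROM Sometable
) AS t
ORDER BY t.PKID;
```

Every batch except possibly the last then holds exactly @batchSize rows, even when PKID has gaps.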
Answer 1 (score: 4)
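-- The <@Name, string, 'default'> placeholders are SSMS template parameters; fill them in (Ctrl+Shift+M) before running.
SET NOCOUNT ON;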
CREATE TABLE [dbo].[SyncAudit] ( PkId INT, BatchNumber INT)
DECLARE @batchsize INT
,@rowcount INT
,@batchcount INT
,@rootdir VARCHAR(2048)
,@saveas VARCHAR(2048)
,@query VARCHAR(2048)
,@bcpquery VARCHAR(2048)
,@bcpconn VARCHAR(64)
,@bcpdelim VARCHAR(2)
SET @rootdir = '\\SERVER1\SHARE1\FOLDER\'
SET @batchsize = 2000
SET @bcpdelim = '|'
SET @bcpconn = '-T' -- Trusted
--SET @bcpconn = '-U <username> -P <password>' -- SQL authentication
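-- Multiply by 1.0 so the division is not truncated to an integer before CEILING is applied.
SELECT @rowcount = COUNT(1),
@batchcount = CEILING(COUNT(1) * 1.0 / @batchsize) FROM <@TableName, string, 'SomeTable'>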
SELECT [BatchSize] = @BatchSize, [BatchCount] = @Batchcount
INSERT INTO SyncAudit
SELECT
<@TableKey, string, 'PKField'>
,groupnum = NTILE(@batchcount) OVER ( ORDER BY <@TableKey, string, 'PKField'>)
FROM
<@TableName, string, 'SomeTable'>
WHILE (@batchcount > 0)
BEGIN
SET @saveas = @rootdir + 'batchnumber-' + cast(@batchcount as varchar) + '.txt'
SET @query = ' SELECT [<@TableName, string, 'SomeTable'>].*
FROM [' + db_name() + '].[dbo].[<@TableName, string, 'SomeTable'>]
JOIN [' + db_name() + '].[dbo].[SyncAudit]
ON [<@TableName, string, 'SomeTable'>].<@TableKey, string, 'PKField'> = [SyncAudit].PkId
AND [SyncAudit].BatchNumber = ' + cast(@batchcount as varchar) + ''
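-- Replace line breaks with spaces so the query reaches bcp as a single line without fusing adjacent keywords.
SET @bcpquery = 'bcp "' + replace(replace(@query, char(13), ' '), char(10), ' ') + '" QUERYOUT "' + @saveas + '" -c -t^' + @bcpdelim + ' ' + @bcpconn + ' -S ' + @@servername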
EXEC master..xp_cmdshell @bcpquery
--EXEC (@query)
SET @batchcount = @batchcount -1
END
DROP TABLE [dbo].[SyncAudit] -- or leave for reference
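NTILE spreads the rows as evenly as it can across @batchcount groups, so the size of each batch depends on @batchcount being large enough. A quick check along the following lines, run while SyncAudit still exists (that is, before the DROP TABLE), would show any batch that exceeds the limit. It is a sketch that assumes the 2000-row limit from the question.

```sql
-- Sketch: run before dropping [dbo].[SyncAudit].
-- Returns one row per oversized batch; an empty result means every batch fits.
SELECT BatchNumber,
       COUNT(*) AS RowsInBatch
FROM [dbo].[SyncAudit]
GROUP BY BatchNumber
HAVING COUNT(*) > 2000;   -- 2000 is the batch limit stated in the question
```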
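Answer 2 (score: 2)
You can compute the ranges in a loop driven by @@ROWCOUNT to target the rows you need. This may perform better than ROW_NUMBER(), which has to number the rows from the beginning each time.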
declare @startid int
declare @endid int
-- get one range, these are efficient as they go over the PKID key by range
select top(1) @startid = pkid from sometable order by pkid -- 1 key visited
select top(2000) @endid = pkid from sometable order by pkid -- 2000 keys visited
-- note: top 2000 may end up with the 514th id if that is the last one
while @@ROWCOUNT > 0
begin
insert otherdb.dbo.backupcopy
select * from sometable
where pkid between @startid and @endid
select top(1) @startid = pkid from sometable
WHERE pkid > @endid -- binary locate
order by pkid
select top(2000) @endid = pkid from sometable
WHERE pkid > @endid -- binary locate, then forward range lookup, max 2000 keys
order by pkid
end
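Answer 3 (score: 0)
I ended up using a combination of the approaches provided by cyberkiwi and Adam. I did not need ROW_NUMBER, only because my table already has an IDENTITY column.
Here is an edited version of the code I used - it works like a charm. Thanks again to everyone for the help!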
use Testing
GO
SET NOCOUNT ON
declare
@now datetime = GETDATE(),
@batchsize int = 2000,
@bcpTargetDir varchar(500) = '\\SomeServer\Upload\',
@csvQueryServer varchar(500) = '.\SQLExpress',
@rowcount integer,
@nowstring varchar(100),
@batch_id int,
@startid int,
@endid int,
@oidCSV varchar(max),
@csvQuery varchar(max),
@bcpFilename varchar(200),
@bcpQuery varchar(1000)
declare @tblBatchRanges table (
batch_id integer NOT NULL IDENTITY(1,1) PRIMARY KEY,
oid_start integer NOT NULL,
oid_end integer NOT NULL,
csvQuery varchar(max)
)
-- Create a unique timestamp-based string, which will be used to name the exported files.
select @nowstring = CONVERT(varchar, @now, 112) + '-' + REPLACE(CONVERT(varchar, @now, 114), ':', '')
-- Find the ID range of the first batch.
select top(1) @startid = oid from Testing..MyObjectIds order by oid
select top(@batchsize) @endid = oid from Testing..MyObjectIds order by oid
select @rowcount = @@ROWCOUNT
while (@rowcount > 0) begin
-- Create a CSV of all object IDs in the batch, using the STUFF() function (http://goo.gl/EyE8L).
select @csvQuery = 'select stuff((select distinct '','' + CAST(oid as varchar) from Testing..MyObjectIds where oid between ' + CAST(@startid as varchar) + ' and ' + CAST(@endid as varchar) + ' order by '','' + CAST(oid as varchar) for xml path('''')),1,1,'''')'
-- Log the info and get the batch ID.
insert into @tblBatchRanges (oid_start, oid_end, csvQuery)
values (@startid, @endid, @csvQuery)
-- SCOPE_IDENTITY() returns the identity generated in this scope, unaffected by triggers.
select @batch_id = SCOPE_IDENTITY()
-- Advance @startid and @endid so that they point to the next batch
select top(1) @startid = oid
from Testing..MyObjectIds
where oid > @endid
order by oid
select top(@batchsize) @endid = oid
from Testing..MyObjectIds
where oid > @endid
order by oid
select @rowcount = @@ROWCOUNT
-- Export the current batch to a file.
select @bcpFilename = 'MyExport-' + @nowstring + '-' + cast(@batch_id as varchar) + '.txt'
select @bcpQuery = 'bcp "' + @csvQuery + '" QUERYOUT "' + @bcpTargetDir + @bcpFilename + '" -S ' + @csvQueryServer + ' -T -c'
exec master..xp_cmdshell @bcpQuery
end
SET NOCOUNT OFF
--Check all of the logged info.
select oid_start, oid_end, csvQuery from @tblBatchRanges
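The query stored in @csvQuery is hard to read through the doubled quotes of the dynamic SQL. Written out directly, it looks roughly like the sketch below. The range 1 to 2000 is an illustrative stand-in for @startid and @endid, and varchar(20) is an explicit length chosen here; the original uses the default.

```sql
-- Sketch: the CSV-building query from @csvQuery without the dynamic-SQL quoting.
-- 1 and 2000 are placeholder bounds standing in for @startid and @endid.
SELECT STUFF(
    (SELECT DISTINCT ',' + CAST(oid AS varchar(20))
     FROM Testing..MyObjectIds
     WHERE oid BETWEEN 1 AND 2000
     ORDER BY ',' + CAST(oid AS varchar(20))
     FOR XML PATH('')),   -- concatenates the rows into one string: ,1,2,3...
    1, 1, '');            -- STUFF removes the leading comma
```

Note that the ORDER BY sorts the IDs as strings, so 10 comes before 2 in the output.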