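Cutting down database size

Time: 2011-04-12 15:21:52

Tags: sql database

Suppose you have a database that has served a company for 10 years. It is 500 GB in size and has countless tables, stored procedures, and triggers.

Now suppose you want to create a cut-down version of that database to serve as a test bed for integration tests, and so that individual testers and developers can spin up their own instances.

Broadly speaking, how would you approach this task?

In case it matters, the database I have in mind is SQL Server 2008.

Edit: removed "unit testing", since unit tests should of course not test database integration.

6 answers:

Answer 0 (score: 3)

How about looking at the transaction log file? Make sure you back up the original database first.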

USE db;
GO
-- Truncate the log by changing the database recovery model to SIMPLE.
ALTER DATABASE db
SET RECOVERY SIMPLE;
GO
-- Shrink the truncated log file to 1 MB.
DBCC SHRINKFILE (db_log, 1);
GO
-- Reset the database recovery model.
ALTER DATABASE db
SET RECOVERY FULL;
GO
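
DBCC SHRINKFILE takes the logical file name, and db_log in the script above is only a placeholder. A minimal sketch for finding the real logical name and checking how large the log actually is, assuming you run it in the database you want to shrink (db is the same placeholder name used above):

```sql
USE db;  -- placeholder database name, as in the script above
GO
-- List the files of the current database.
-- size is stored in 8 KB pages, so size * 8 / 1024 gives megabytes.
SELECT
    name AS LogicalFileName,   -- pass this name to DBCC SHRINKFILE
    type_desc AS FileType,     -- ROWS for data files, LOG for log files
    size * 8 / 1024 AS SizeMB
FROM sys.database_files;
GO
-- Show log size and percentage of log space used for every database.
DBCC SQLPERF(LOGSPACE);
GO
```

Keep in mind that switching to SIMPLE recovery breaks the log backup chain, so take a full or differential backup after switching back to FULL if you rely on log backups.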

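I have also had great success with rebuilding and defragmenting indexes.

This script, posted by Tara Kizer, has proven helpful with our database performance problems. Thanks, Tara Kizer, if you ever read this!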

-- required table
IF OBJECT_ID('DefragmentIndexes') IS NULL
    CREATE TABLE DefragmentIndexes
    (
        DatabaseName nvarchar(100) NOT NULL,
        SchemaName nvarchar(100) NOT NULL,
        TableName nvarchar(100) NOT NULL,
        IndexName nvarchar(100) NOT NULL,
        DefragmentDate datetime NOT NULL,
        PercentFragmented decimal(5, 2) NOT NULL, -- (5, 2) so that values rounding up to 100.00 do not overflow
        CONSTRAINT PK_DefragmentIndexes PRIMARY KEY CLUSTERED 
        (
            DatabaseName,
            SchemaName,
            TableName,
            IndexName,
            DefragmentDate
        )
    )
GO

IF OBJECT_ID(N'[dbo].[isp_ALTER_INDEX]') IS NOT NULL
    DROP PROC [dbo].[isp_ALTER_INDEX]
GO

SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
-------------------------------------------------------------------------------------------
-- OBJECT NAME           : isp_ALTER_INDEX
--
-- AUTHOR                : Tara Kizer
--
-- INPUTS                : @dbName         - name of the database
--                         @statsMode      - LIMITED, SAMPLED or DETAILED
--                         @defragType     - REORGANIZE (INDEXDEFRAG) or REBUILD (DBREINDEX)
--                         @minFragPercent - minimum fragmentation level
--                         @maxFragPercent - maximum fragmentation level
--                         @minRowCount    - minimum row count
--                         @logHistory     - whether or not to log what got defragmented
--                         @sortInTempdb   - whether or not to sort the index in tempdb;
--                                           recommended if your tempdb is optimized (see BOL for details)
--
-- OUTPUTS               : None
--
-- DEPENDENCIES          : DefragmentIndexes, sys.dm_db_index_physical_stats, sys.objects, sys.schemas, 
--                         sys.indexes, sys.partitions, sys.index_columns, INFORMATION_SCHEMA.COLUMNS
--
-- DESCRIPTION           : Defragments indexes
/*
                           EXEC isp_ALTER_INDEX 
                             @dbName = 'QHOSClient1', 
                             @statsMode = 'SAMPLED', 
                             @defragType = 'REBUILD', 
                             @minFragPercent = 10,
                             @maxFragPercent = 100,
                             @minRowCount = 1000,
                             @logHistory = 1,
                             @sortInTempdb = 1
*/
/*
    http://weblogs.sqlteam.com/tarad/archive/2009/03/27/Defragmenting-Indexes-in-SQL-Server-2005Again.aspx
        Bug Fix - added SET QUOTED_IDENTIFIER ON to the script
        Feature - added logging feature
    http://weblogs.sqlteam.com/tarad/archive/2009/06/23/DefragmentingRebuilding-Indexes-in-SQL-Server-2005.aspx
        Bug Fix - initialized @lobData to 0 for each pass through the loop
        Bug Fix - checked for LOB data in included columns of non-clustered indexes
        Feature - added SORT_IN_TEMPDB option
    http://weblogs.sqlteam.com/tarad/archive/2009/08/31/DefragmentingRebuilding-Indexes-in-SQL-server-2005-and-2008.aspx
        Bug Fix - added index_level = 0 to sys.dm_db_index_physical_stats query
    http://weblogs.sqlteam.com/tarad/archive/2009/11/03/DefragmentingRebuilding-Indexes-in-SQL-Server-2005-and-2008Again.aspx
        Bug Fix - for SQL Server 2008, @indexType could be 'XML INDEX' or 'PRIMARY XML INDEX' for XML indexes
*/ 
-------------------------------------------------------------------------------------------
CREATE PROC [dbo].[isp_ALTER_INDEX]
(
    @dbName sysname, 
    @statsMode varchar(8) = 'SAMPLED', 
    @defragType varchar(10) = 'REORGANIZE', 
    @minFragPercent int = 25, 
    @maxFragPercent int = 100, 
    @minRowCount int = 0,
    @logHistory bit = 0,
    @sortInTempdb bit = 0
)
AS

SET NOCOUNT ON

IF @statsMode NOT IN ('LIMITED', 'SAMPLED', 'DETAILED')
BEGIN
    RAISERROR('@statsMode must be LIMITED, SAMPLED or DETAILED', 16, 1)
    RETURN
END

IF @defragType NOT IN ('REORGANIZE', 'REBUILD')
BEGIN
    RAISERROR('@defragType must be REORGANIZE or REBUILD', 16, 1)
    RETURN
END

DECLARE 
    @i int, @objectId int, @objectName sysname, @indexId int, @indexName sysname, 
    @schemaName sysname, @partitionNumber int, @partitionCount int,
    @sql nvarchar(4000), @edition int, @parmDef nvarchar(500), @allocUnitType nvarchar(60),
    @indexType nvarchar(60), @online bit, @disabled bit, @dataType nvarchar(128),
    @charMaxLen int, @allowPageLocks bit, @lobData bit, @fragPercent float

SELECT @edition = CONVERT(int, SERVERPROPERTY('EngineEdition'))

SELECT 
    IDENTITY(int, 1, 1) AS FragIndexId, 
    [object_id] AS ObjectId, 
    index_id AS IndexId, 
    avg_fragmentation_in_percent AS FragPercent, 
    record_count AS RecordCount, 
    partition_number AS PartitionNumber,
    index_type_desc AS IndexType,
    alloc_unit_type_desc AS AllocUnitType
INTO #FragIndex
FROM sys.dm_db_index_physical_stats (DB_ID(@dbName), NULL, NULL, NULL, @statsMode)
WHERE 
    avg_fragmentation_in_percent > @minFragPercent AND 
    avg_fragmentation_in_percent < @maxFragPercent AND 
    index_id > 0 AND
    index_level = 0
ORDER BY ObjectId

-- LIMITED does not include data for record_count
IF @statsMode IN ('SAMPLED', 'DETAILED')
    DELETE FROM #FragIndex
    WHERE RecordCount < @minRowCount

SELECT @i = MIN(FragIndexId) 
FROM #FragIndex

SELECT 
    @objectId = ObjectId, 
    @indexId = IndexId, 
    @fragPercent = FragPercent,
    @partitionNumber = PartitionNumber,
    @indexType = IndexType,
    @allocUnitType = AllocUnitType
FROM #FragIndex
WHERE FragIndexId = @i

WHILE @@ROWCOUNT <> 0
BEGIN
    -- get the table and schema names for the index
    SET @sql = '
        SELECT @objectName = o.[name], @schemaName = s.[name]
        FROM ' + QUOTENAME(@dbName) + '.sys.objects o
        JOIN ' + QUOTENAME(@dbName) + '.sys.schemas s 
            ON s.schema_id = o.schema_id
        WHERE o.[object_id] = @objectId'

    SET @parmDef = N'@objectId int, @objectName sysname OUTPUT, @schemaName sysname OUTPUT'

    EXEC sp_executesql 
        @sql, @parmDef, @objectId = @objectId, 
        @objectName = @objectName OUTPUT, @schemaName = @schemaName OUTPUT

    -- get index information
    SET @sql = '
        SELECT @indexName = [name], @disabled = is_disabled, @allowPageLocks = allow_page_locks
        FROM ' + QUOTENAME(@dbName) + '.sys.indexes
        WHERE [object_id] = @objectId AND index_id = @indexId'

    SET @parmDef = N'
            @objectId int, @indexId int, @indexName sysname OUTPUT, 
            @disabled bit OUTPUT, @allowPageLocks bit OUTPUT'

    EXEC sp_executesql 
        @sql, @parmDef, @objectId = @objectId, @indexId = @indexId, 
        @indexName = @indexName OUTPUT, @disabled = @disabled OUTPUT, 
        @allowPageLocks = @allowPageLocks OUTPUT

    SET @lobData = 0

    -- for clustered indexes, check for columns in the table that use a LOB data type
    IF @indexType = 'CLUSTERED INDEX'
    BEGIN
        -- CHARACTER_MAXIMUM_LENGTH column will equal -1 for max size or xml
        SET @sql = '
            SELECT @lobData = 1
            FROM ' + QUOTENAME(@dbName) + '.INFORMATION_SCHEMA.COLUMNS c
            WHERE   TABLE_SCHEMA = @schemaName AND
                    TABLE_NAME = @objectName AND
                    (DATA_TYPE IN (''text'', ''ntext'', ''image'') OR 
                    CHARACTER_MAXIMUM_LENGTH = -1)'

        SET @parmDef = N'@schemaName sysname, @objectName sysname, @lobData bit OUTPUT'

        EXEC sp_executesql 
            @sql, @parmDef, @schemaName = @schemaName, @objectName = @objectName, 
            @lobData = @lobData OUTPUT
    END
    -- for non-clustered indexes, check for LOB data type in the included columns
    ELSE IF @indexType = 'NONCLUSTERED INDEX'
    BEGIN
        SET @sql = '
            SELECT @lobData = 1
            FROM ' + QUOTENAME(@dbName) + '.sys.indexes i
            JOIN ' + QUOTENAME(@dbName) + '.sys.index_columns ic
                ON i.object_id = ic.object_id
                AND i.index_id = ic.index_id
            JOIN ' + QUOTENAME(@dbName) + '.INFORMATION_SCHEMA.COLUMNS c
                ON ic.column_id = c.ORDINAL_POSITION
            WHERE   c.TABLE_SCHEMA = @schemaName AND
                    c.TABLE_NAME = @objectName AND
                    i.name = @indexName AND
                    ic.is_included_column = 1 AND
                    (c.DATA_TYPE IN (''text'', ''ntext'', ''image'') OR c.CHARACTER_MAXIMUM_LENGTH = -1)'

        SET @parmDef = N'@schemaName sysname, @objectName sysname, @indexName sysname, @lobData bit OUTPUT'

        EXEC sp_executesql 
            @sql, @parmDef, @schemaName = @schemaName, @objectName = @objectName, 
            @indexName = @indexName, @lobData = @lobData OUTPUT
    END

    -- get partition information for the index
    SET @sql = '
        SELECT @partitionCount = COUNT(*)
        FROM ' + QUOTENAME(@dbName) + '.sys.partitions
        WHERE [object_id] = @objectId AND index_id = @indexId'

    SET @parmDef = N'@objectId int, @indexId int, @partitionCount int OUTPUT'

    EXEC sp_executesql 
        @sql, @parmDef, @objectId = @objectId, @indexId = @indexId, 
        @partitionCount = @partitionCount OUTPUT

    -- Developer and Enterprise have the ONLINE = ON option for REBUILD.
    -- Indexes, including indexes on global temp tables, can be rebuilt online with the following exceptions:
    -- disabled indexes, XML indexes, indexes on local temp tables, partitioned indexes,
    -- clustered indexes if the underlying table contains LOB data types (text, ntext, image, varchar(max), 
    -- nvarchar(max), varbinary(max) or xml), and
    -- nonclustered indexes that are defined with LOB data type columns.
    -- When reorganizing and page locks is disabled for the index, we'll switch to rebuild later on, 
    -- so we need to get setup with the proper online option.
    IF @edition = 3 AND (@defragType = 'REBUILD' OR (@defragType = 'REORGANIZE' AND @allowPageLocks = 0))
    BEGIN
        SET @online = 
                CASE
                    WHEN @indexType IN ('XML INDEX', 'PRIMARY XML INDEX') THEN 0
                    WHEN @indexType = 'NONCLUSTERED INDEX' AND @allocUnitType = 'LOB_DATA' THEN 0
                    WHEN @lobData = 1 THEN 0
                    WHEN @disabled = 1 THEN 0
                    WHEN @partitionCount > 1 THEN 0
                    ELSE 1
                END
    END
    ELSE
        SET @online = 0

    -- build the ALTER INDEX statement
    SET @sql = 'ALTER INDEX ' + QUOTENAME(@indexName) + ' ON ' + QUOTENAME(@dbName) + '.' + 
        QUOTENAME(@schemaName) + '.' + QUOTENAME(@objectName) + 
        CASE
            WHEN @defragType = 'REORGANIZE' AND @allowPageLocks = 0 THEN ' REBUILD'
            ELSE ' ' + @defragType
        END

    -- PARTITION must come before the WITH clause in the ALTER INDEX syntax
    IF @partitionCount > 1 AND @disabled = 0 AND @indexType <> 'XML INDEX'
        SET @sql = @sql + ' PARTITION = ' + CAST(@partitionNumber AS varchar(10))

    -- WITH options; ONLINE and SORT_IN_TEMPDB are only valid for REBUILD, not for REORGANIZE
    IF (@online = 1 OR @sortInTempdb = 1) AND (@defragType = 'REBUILD' OR @allowPageLocks = 0)
    BEGIN   
        SET @sql = @sql + ' WITH (' + 
            CASE
                WHEN @online = 1 AND @sortInTempdb = 1 THEN 'ONLINE = ON, SORT_IN_TEMPDB = ON'
                WHEN @online = 1 AND @sortInTempdb = 0 THEN 'ONLINE = ON'
                WHEN @online = 0 AND @sortInTempdb = 1 THEN 'SORT_IN_TEMPDB = ON'
            END + ')'
    END

    -- run the ALTER INDEX statement
    EXEC (@sql)

    -- log some information into a history table
    IF @logHistory = 1
        INSERT INTO DefragmentIndexes (DatabaseName, SchemaName, TableName, IndexName, DefragmentDate, PercentFragmented)
        VALUES(@dbName, @schemaName, @objectName, @indexName, GETDATE(), @fragPercent)

    SELECT @i = MIN(FragIndexId) 
    FROM #FragIndex
    WHERE FragIndexId > @i

    SELECT 
        @objectId = ObjectId, 
        @indexId = IndexId, 
        @fragPercent = FragPercent,
        @partitionNumber = PartitionNumber,
        @indexType = IndexType,
        @allocUnitType = AllocUnitType
    FROM #FragIndex
    WHERE FragIndexId = @i
END

GO

The original post is here:

http://weblogs.sqlteam.com/tarad/archive/2009/08/31/DefragmentingRebuilding-Indexes-in-SQL-server-2005-and-2008.aspx
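
Before running the procedure, it can help to preview which indexes it would pick up. This is only a rough sketch, not part of Tara Kizer's script: it assumes you run it in the target database, uses LIMITED mode as the cheapest scan, and the 25 percent threshold simply mirrors the procedure's default @minFragPercent.

```sql
-- Preview fragmented indexes in the current database.
DECLARE @dbId int;
SET @dbId = DB_ID();

SELECT
    OBJECT_SCHEMA_NAME(ps.[object_id], @dbId) AS SchemaName,
    OBJECT_NAME(ps.[object_id], @dbId) AS TableName,
    i.[name] AS IndexName,
    ps.partition_number AS PartitionNumber,
    ps.avg_fragmentation_in_percent AS FragPercent,
    ps.page_count AS PageCount
FROM sys.dm_db_index_physical_stats(@dbId, NULL, NULL, NULL, 'LIMITED') AS ps
JOIN sys.indexes AS i
    ON i.[object_id] = ps.[object_id]
    AND i.index_id = ps.index_id
WHERE ps.index_id > 0                          -- skip heaps
    AND ps.index_level = 0                     -- leaf level only
    AND ps.avg_fragmentation_in_percent > 25   -- assumed threshold
ORDER BY ps.avg_fragmentation_in_percent DESC;
```

Small indexes often report high fragmentation that is not worth fixing, which is why PageCount is included in the output.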

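Apart from rebuilding and defragmenting indexes, the only other thing you can do is purge or delete data. If you use int or bigint identity columns as primary keys, this lets you reseed them with DBCC CHECKIDENT ('tablename', RESEED, value).

You can rebuild all the indexes on a table with ALTER INDEX ALL ON MyTable REBUILD.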
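
To make the delete, reseed, and rebuild sequence concrete, here is a small sketch. The table dbo.TrimDemo and its columns are hypothetical and exist only so the example is self-contained; on a real table you would delete a subset of rows and choose the reseed value accordingly.

```sql
-- Hypothetical demo table; all names here are illustrative only.
CREATE TABLE dbo.TrimDemo
(
    Id int IDENTITY(1, 1) NOT NULL
        CONSTRAINT PK_TrimDemo PRIMARY KEY CLUSTERED,
    CreatedOn datetime NOT NULL
);
GO
INSERT INTO dbo.TrimDemo (CreatedOn) VALUES ('2001-01-01');
INSERT INTO dbo.TrimDemo (CreatedOn) VALUES ('2011-01-01');
GO
-- Remove the data, then reset the identity seed.
DELETE FROM dbo.TrimDemo;
DBCC CHECKIDENT ('dbo.TrimDemo', RESEED, 0);
GO
-- Rebuild every index on the table to reclaim the freed pages.
ALTER INDEX ALL ON dbo.TrimDemo REBUILD;
GO
-- Because rows had been inserted before, the next row gets
-- reseed value + increment, which is Id = 1 here.
INSERT INTO dbo.TrimDemo (CreatedOn) VALUES (GETDATE());
SELECT Id FROM dbo.TrimDemo;
GO
DROP TABLE dbo.TrimDemo;
GO
```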

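Answer 1 (score: 3)

If your tables all contained unrelated data, you could simply select X random records from each table. I'm guessing the problem is that the tables are not unrelated: if table A has a foreign key reference to table B and you simply pull 10% of the records from A and 10% of the records from B, you will end up with a lot of invalid references from A to B.

I don't know of a general solution to this problem. It depends on the exact structure of your database. I often find that my databases consist of a small number of "central" tables that have lots of references from other tables. That is, I typically find I have, say, an Order table, then an Order Line table that points to Order, a Customer table that Order points to, a Delivery table that points to Order or Order Line, and so on, but everything seems to be centered on "Order". In that case, you could randomly pick some number of Order records, then find all the Customers for those Orders, all the Order Lines for those Orders, etc. I usually also have some "code lookup" tables, such as a list of all the "order status" codes, another list of all the "customer type" codes, and so on. These are usually small, so I just copy them in their entirety.

If your database is more... disjointed... than that, i.e. if it has no clear center but is a maze of interrelationships, this could be much more complicated. Still, I think the same principle applies: pick some starting point, select some records from there, then get all the records related to those records, and so on.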
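
The "start from the central table and follow the references" idea can be sketched as follows. The tables #Customer, #Orders, and #OrderLine are made-up stand-ins for a real schema, and temp tables are used only so the script runs anywhere; in practice you would read from the production copy and insert into the cut-down database.

```sql
-- Hypothetical source tables standing in for the production schema.
CREATE TABLE #Customer (CustomerId int PRIMARY KEY, Name nvarchar(50) NOT NULL);
CREATE TABLE #Orders (OrderId int PRIMARY KEY, CustomerId int NOT NULL);
CREATE TABLE #OrderLine (OrderLineId int PRIMARY KEY, OrderId int NOT NULL, Qty int NOT NULL);

INSERT INTO #Customer VALUES (1, N'A'), (2, N'B'), (3, N'C');
INSERT INTO #Orders VALUES (10, 1), (11, 1), (12, 2), (13, 3);
INSERT INTO #OrderLine VALUES (100, 10, 1), (101, 10, 2), (102, 11, 1), (103, 12, 5), (104, 13, 1);

-- 1. Pick a random sample from the central table.
SELECT TOP (2) OrderId, CustomerId
INTO #SampleOrders
FROM #Orders
ORDER BY NEWID();

-- 2. Pull the parents the sampled orders point to.
SELECT c.CustomerId, c.Name
INTO #SampleCustomer
FROM #Customer AS c
WHERE EXISTS (SELECT 1 FROM #SampleOrders AS o WHERE o.CustomerId = c.CustomerId);

-- 3. Pull the children that point to the sampled orders.
SELECT l.OrderLineId, l.OrderId, l.Qty
INTO #SampleOrderLine
FROM #OrderLine AS l
JOIN #SampleOrders AS o ON o.OrderId = l.OrderId;

-- 4. Sanity check: sampled orders whose customer is missing (should be 0).
SELECT COUNT(*) AS OrdersWithoutCustomer
FROM #SampleOrders AS o
LEFT JOIN #SampleCustomer AS c ON c.CustomerId = o.CustomerId
WHERE c.CustomerId IS NULL;

DROP TABLE #SampleOrderLine, #SampleCustomer, #SampleOrders, #OrderLine, #Orders, #Customer;
```

The same LEFT JOIN ... IS NULL check is a quick way to see how many invalid references a naive "10% of each table" sample would have produced.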

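Answer 2 (score: 2)

I would script out the database, including tables, indexes, triggers, and stored procedures, and then use that script to create a new, empty database. You can then add data to the database as your integration tests require.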
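
One way to sanity-check the scripted copy, offered only as a sketch: run the same inventory queries in both the original and the new empty database and compare the counts. Nothing here is specific to any scripting tool.

```sql
-- Count user-defined objects by type in the current database.
SELECT type_desc AS ObjectType, COUNT(*) AS ObjectCount
FROM sys.objects
WHERE is_ms_shipped = 0
GROUP BY type_desc
ORDER BY type_desc;

-- Indexes are not listed in sys.objects, so count them separately
-- (index_id = 0 is a heap, not an index).
SELECT COUNT(*) AS IndexCount
FROM sys.indexes AS i
JOIN sys.objects AS o ON o.[object_id] = i.[object_id]
WHERE o.is_ms_shipped = 0
    AND i.index_id > 0;
```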

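You can use a tool such as http://code.google.com/p/ndbunit/ to load test data as part of each test and remove it once the test has finished. I would also run the tests against SQL Express on each developer's local machine, so that tests don't fail when several developers run them at the same time.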

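Answer 3 (score: 2)

Check out http://jailer.sourceforge.net/. It is a tool that extracts a subset of data from a database while keeping it referentially consistent. I haven't used it myself, but I've been meaning to.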

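Answer 4 (score: 1)

Under no circumstances would I allow developers to develop against a smaller database when the code has to run against one of this size. You will hit problems that only show up at scale, so this is a foolish idea. Queries that run well on small data sets are not the queries that run well on large data sets. Wasting time writing queries that cannot run in production is one reason it is foolish to let developers work against a small data set.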

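Answer 5 (score: 1)

In my opinion, subsets of "real data" should not be used for unit tests. Unit tests should be independent of the initial contents of the database. They should create the data needed for the specific test, run the test, and then delete the data. Alternatively, the entire test should run inside a transaction that is rolled back at the end.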
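
The rollback variant might look like the sketch below. The #Account table and the balance check are invented for illustration; a temp table is used so the script is self-contained, and a real test would work against the actual tables in the same way.

```sql
-- Hypothetical table standing in for a real one.
CREATE TABLE #Account (AccountId int PRIMARY KEY, Balance money NOT NULL);

BEGIN TRANSACTION;

    -- Arrange: create exactly the data this test needs.
    INSERT INTO #Account (AccountId, Balance) VALUES (1, 100);

    -- Act: run the code under test (here just an UPDATE).
    UPDATE #Account SET Balance = Balance - 30 WHERE AccountId = 1;

    -- Assert: raise an error if the result is not what we expect.
    IF NOT EXISTS (SELECT 1 FROM #Account WHERE AccountId = 1 AND Balance = 70)
        RAISERROR('Test failed: unexpected balance', 16, 1);

-- Undo everything the test did, whether it passed or failed.
ROLLBACK TRANSACTION;

-- The table is empty again, so this returns 0.
SELECT COUNT(*) AS RowsLeftBehind FROM #Account;

DROP TABLE #Account;
```

Note that table variables do not take part in transactions, so test data kept in them would survive the rollback.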

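If you don't do this, your tests will fail when someone decides to delete or change the data they depend on, and you will waste a lot of time trying to figure out why your tests suddenly started failing.

For a QA or integration system, you should be able to create a subset of the data based on your knowledge of the relationships between the tables.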