How to count the amount of data in each column of a table

Time: 2019-02-06 10:12:42

Tags: sql sql-server

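I have a table in SQL Server with 490 columns, and today I need to add even more columns. An API populates this table from an external system, and with ~550,000 rows in the table a sync currently takes around 16 hours. I need to count how many rows actually hold a value in each column, to see whether there are any columns I can drop.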

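I have been researching this for a while and am now at my last resort. I have tried several different approaches, but nothing does what I need. I know I can run COUNT(column_name), but with 490 columns doing that by hand is not really feasible.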
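For readers less familiar with the distinction the question relies on, here is a minimal sketch of how COUNT(*) and COUNT(column) differ. The table variable @demo and its columns are made up for illustration only.

```sql
-- Hypothetical three-row table: col1 has one NULL, col2 has two NULLs.
DECLARE @demo TABLE (col1 int NULL, col2 varchar(10) NULL);

INSERT INTO @demo (col1, col2)
VALUES (1, 'x'), (2, NULL), (NULL, NULL);

SELECT
    COUNT(*)    AS total_rows,     -- counts every row: 3
    COUNT(col1) AS col1_non_null,  -- skips NULLs: 2
    COUNT(col2) AS col2_non_null   -- skips NULLs: 1
FROM @demo;
```

So COUNT(column_name) is already the right measure; the remaining problem is generating it for 490 columns without typing them out.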

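So I am currently using sys.columns to get the list of columns in the table, and then an OUTER APPLY that runs COUNT(*) against the table. This sort of works, but obviously it just returns the table's total row count once per column.

I thought I needed to replace COUNT(*) with COUNT(sys.columns.name), but that does not work either. It fails with the error "Aggregates on the right side of an APPLY cannot reference columns from the left side."

The closest code I have so far is below, but it is a million miles off.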

 SELECT
     c.name AS [Column],
     Counter.total
 FROM sys.columns AS c WITH (NOLOCK)
 OUTER APPLY
 (
     -- COUNT(*) does not depend on c.name, so every row gets the same total
     SELECT COUNT(*) AS total
     FROM lead WITH (NOLOCK)
 ) AS Counter
 WHERE c.object_id = 544720993

This returns the following:

Column    |     total
______________________
Column1   |      512345
Column2   |      512345
Column3   |      512345
Column4   |      512345
Column5   |      512345

However, in an ideal world I would like the following:

Column    |     total
______________________
Column1   |      512345 --(meaning no nulls in this column)
Column2   |      435765 --(meaning some nulls in this column)
Column3   |      123423
Column4   |      76 --(meaning only 76 non-nulls in this column)
Column5   |      0 --(meaning every row is null in this column)
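One way to see why the OUTER APPLY attempt cannot produce this output: a name read from sys.columns is just a string value, not a column reference, so it has to be turned into SQL text before it can be counted. The sketch below only generates that text. It assumes the table is dbo.lead (the schema is a guess) and looks it up with OBJECT_ID rather than a hard-coded object id.

```sql
-- Produces one line of SQL text per column, e.g. "COUNT([Column1]) AS [Column1]".
-- Assumption: the table lives in the dbo schema and is called lead.
SELECT
    'COUNT(' + QUOTENAME(c.name) + ') AS ' + QUOTENAME(c.name) AS count_expression
FROM sys.columns AS c
WHERE c.object_id = OBJECT_ID(N'dbo.lead')
ORDER BY c.column_id;
```

The generated expressions still have to be assembled into a statement and executed as dynamic SQL, which is what both answers do in different ways.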

Thank you for your time!

2 answers:

Answer 0 (score: 3)

Sample data

 CREATE TABLE [dbo].[Tp](
    [a] [char](2) NULL,
    [b] [char](2) NULL,
    [c] [char](2) NULL
    ) ON [PRIMARY]

GO    
INSERT INTO [Tp] ([a],[b],[c])VALUES('a','a','a')
INSERT INTO [Tp] ([a],[b],[c])VALUES('1','1','1')
INSERT INTO [Tp] ([a],[b],[c])VALUES('2','2','2')
INSERT INTO [Tp] ([a],[b],[c])VALUES(NULL,'9',NULL)
INSERT INTO [Tp] ([a],[b],[c])VALUES('3','3','3')
INSERT INTO [Tp] ([a],[b],[c])VALUES('4','4','4')
INSERT INTO [Tp] ([a],[b],[c])VALUES(NULL,NULL,NULL)
INSERT INTO [Tp] ([a],[b],[c])VALUES(NULL,'7',NULL)
INSERT INTO [Tp] ([a],[b],[c])VALUES(NULL,NULL,NULL)
INSERT INTO [Tp] ([a],[b],[c])VALUES('8','8','8')
INSERT INTO [Tp] ([a],[b],[c])VALUES('9','9','9')
INSERT INTO [Tp] ([a],[b],[c])VALUES(NULL,NULL,NULL)
INSERT INTO [Tp] ([a],[b],[c])VALUES('','','')
INSERT INTO [Tp] ([a],[b],[c])VALUES('','','')
INSERT INTO [Tp] ([a],[b],[c])VALUES('','5','')
INSERT INTO [Tp] ([a],[b],[c])VALUES('2','','')
SELECT * FROM [Tp]

Dynamic SQL script to get the expected result

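 DECLARE @Sql nvarchar(max)

-- Builds one SELECT per column and glues them together with UNION ALL.
-- QUOTENAME protects against column names that need brackets.
-- Note: CountOf_EmptySpace counts NULLs and empty strings together, and it also
-- counts a literal '1', because '1' is used as the ISNULL placeholder.
SELECT @Sql = STUFF((SELECT ' UNION ALL '+ ' '+'SELECT '''+TABLE_NAME+''' AS TABLE_NAME,'+''''+COLUMN_NAME+''''+' AS ColumName'+',SUM(CASE WHEN '+QUOTENAME(COLUMN_NAME)+' IS NULL THEN 1 ELSE 0 END) As Countof_nulls
      ,SUM(CASE WHEN ISNULL(NULLIF('+QUOTENAME(COLUMN_NAME)+',''''),''1'')=''1'' THEN 1 ELSE 0 END) As CountOf_EmptySpace
      ,COUNT('+QUOTENAME(COLUMN_NAME)+') As Count_not_nulls 
     FROM '+QUOTENAME(TABLE_SCHEMA)+'.'+QUOTENAME(TABLE_NAME)
FROM 
INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_NAME ='Tp' --Enter your table in the query
FOR XML PATH (''), TYPE).value('.', 'NVARCHAR(MAX)'),1,10,'')

EXEC (@Sql)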

Result

TABLE_NAME  ColumName   Countof_nulls   CountOf_EmptySpace  Count_not_nulls
***************************************************************************
    Tp          a           5                   9               11
    Tp          b           3                   7               13
    Tp          c           5                   10              11
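The CountOf_EmptySpace figures above are larger than the NULL counts because the ISNULL(NULLIF(...)) expression in the script counts NULLs and empty strings together, and it also counts any row whose value is literally '1'. The following sketch, using a made-up table variable @t, compares that expression with separate NULL-only and empty-only counts; whether the separate counts are preferable depends on what "empty" should mean for your data.

```sql
-- Hypothetical data: one ordinary value, one literal '1', one NULL, one empty string.
DECLARE @t TABLE (v char(2) NULL);

INSERT INTO @t (v)
VALUES ('a'), ('1'), (NULL), ('');

SELECT
    -- Same expression as in the script: matches NULL, '' and the value '1'.
    SUM(CASE WHEN ISNULL(NULLIF(v, ''), '1') = '1' THEN 1 ELSE 0 END) AS OriginalExpression, -- 3
    SUM(CASE WHEN v IS NULL THEN 1 ELSE 0 END) AS NullOnly,                                    -- 1
    SUM(CASE WHEN v = ''    THEN 1 ELSE 0 END) AS EmptyOnly                                    -- 1
FROM @t;
```

This matches the sample data: column a has 5 NULLs, 3 empty strings and one '1', which gives the 9 reported above.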

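Answer 1 (score: 1)

You can use a cursor with dynamic SQL that inserts the result of each COUNT check into a temporary table.

You control which schema, table and columns are checked through the cursor's SELECT.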

IF OBJECT_ID('tempdb..#ColumnResults') IS NOT NULL
    DROP TABLE #ColumnResults

CREATE TABLE #ColumnResults (
    SchemaName VARCHAR(100),
    TableName VARCHAR(100),
    ColumnName VARCHAR(100),
    TotalRows INT,
    NotNullAmount INT)


DECLARE @SchemaName VARCHAR(100)
DECLARE @TableName VARCHAR(100)
DECLARE @ColumnName VARCHAR(100)

DECLARE ColumnCursor CURSOR FOR
    SELECT
        QUOTENAME(T.TABLE_SCHEMA),
        QUOTENAME(T.TABLE_NAME),
        QUOTENAME(T.COLUMN_NAME)
    FROM
        INFORMATION_SCHEMA.COLUMNS AS T
    WHERE
        T.TABLE_NAME = 'YourTableName' AND      -- Filter here the table you want to check
        T.TABLE_SCHEMA = 'YourTableSchema'      -- Filter here the schema you want to check
    ORDER BY
        T.TABLE_SCHEMA,
        T.TABLE_NAME,
        T.COLUMN_NAME

OPEN ColumnCursor
FETCH NEXT FROM ColumnCursor INTO 
    @SchemaName, 
    @TableName,
    @ColumnName

WHILE @@FETCH_STATUS = 0
BEGIN

    DECLARE @DynamicSQL VARCHAR(MAX) = '
        INSERT INTO #ColumnResults (
            SchemaName,
            TableName,
            ColumnName,
            TotalRows,
            NotNullAmount)
        SELECT
            SchemaName = ''' + @SchemaName + ''',
            TableName = ''' + @TableName + ''',
            ColumnName = ''' + @ColumnName + ''',
            TotalRows = COUNT(1),
            NotNullAmount = COUNT(' + @ColumnName + ')
        FROM
            ' + @SchemaName + '.' + @TableName + ' AS T'

    -- PRINT (@DynamicSQL)
    EXEC (@DynamicSQL)

    FETCH NEXT FROM ColumnCursor INTO 
        @SchemaName, 
        @TableName,
        @ColumnName

END

CLOSE ColumnCursor
DEALLOCATE ColumnCursor


SELECT
    C.*
FROM
    #ColumnResults AS C
ORDER BY
    C.SchemaName,
    C.TableName,
    C.ColumnName

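You can comment out the EXEC and uncomment the PRINT to inspect the generated dynamic SQL before executing it.

Note that this executes one SELECT per column, rather than a single SELECT covering all columns of the table. You could rework the dynamic SQL so that each table is scanned only once while checking all of its columns, but I find this approach tidier, and it works the same way across schemas and tables.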
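For a wide table where one scan per column is too slow, the single-scan variant mentioned above might look roughly like the sketch below. It is not part of the original answer: the schema and table names are placeholders, the c1, c2, ... aliases are an arbitrary choice to avoid clashing with real column names, and it builds one statement that counts every column in one pass and then turns the wide result back into rows with CROSS APPLY (VALUES ...).

```sql
DECLARE @Schema sysname = N'dbo';            -- assumed schema
DECLARE @Table  sysname = N'YourTableName';  -- assumed table name
DECLARE @ObjectId int = OBJECT_ID(QUOTENAME(@Schema) + N'.' + QUOTENAME(@Table));

DECLARE @Counts nvarchar(max), @Rows nvarchar(max), @Sql nvarchar(max);

-- "COUNT([col]) AS c<column_id>" for every column, comma separated
SELECT @Counts = STUFF((
    SELECT N', COUNT(' + QUOTENAME(c.name) + N') AS c' + CAST(c.column_id AS nvarchar(10))
    FROM sys.columns AS c
    WHERE c.object_id = @ObjectId
    ORDER BY c.column_id
    FOR XML PATH(''), TYPE).value('.', 'nvarchar(max)'), 1, 2, N'');

-- "(N'col', x.c<column_id>)" for every column, comma separated
SELECT @Rows = STUFF((
    SELECT N', (N' + QUOTENAME(c.name, '''') + N', x.c' + CAST(c.column_id AS nvarchar(10)) + N')'
    FROM sys.columns AS c
    WHERE c.object_id = @ObjectId
    ORDER BY c.column_id
    FOR XML PATH(''), TYPE).value('.', 'nvarchar(max)'), 1, 2, N'');

-- One scan of the table (derived table x), then one output row per column (v)
SET @Sql = N'SELECT v.ColumnName, x.TotalRows, v.NotNullAmount
FROM (SELECT COUNT(*) AS TotalRows, ' + @Counts + N'
      FROM ' + QUOTENAME(@Schema) + N'.' + QUOTENAME(@Table) + N') AS x
CROSS APPLY (VALUES ' + @Rows + N') AS v (ColumnName, NotNullAmount)
ORDER BY v.ColumnName;';

-- PRINT @Sql   -- note: PRINT truncates long strings, so this may show only part of it
EXEC sys.sp_executesql @Sql;
```

The trade-off is readability: the generated statement is long and harder to debug than the cursor version, but the table is read once instead of once per column.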