Inserting the result of a SELECT into a table using ROW_NUMBER

Time: 2017-11-10 16:15:35

Tags: sql sql-server

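I'm building a query that selects data from a table, takes a fixed number of rows from that selection, inserts those rows into an identical table in another database, and then repeats the process for the next batch of rows from the original table.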

For reference, this is what I'm trying to do (already built for Oracle):

$" INSERT INTO {destination-table} 
     SELECT * FROM {original-table}
     WHERE ROWID IN (SELECT B.RID 
          FROM  (SELECT ROWID AS RID, rownum as RID2 
                   FROM {original-table}
                   WHERE {Where Claus} 
                   AND ROWNUM <= {recordsPerStatement * iteration}                                              
) B WHERE RID2 > {recordsPerStatement * (iteration - 1)})"

This runs in a loop from .NET.

For SQL Server, however, I can't get this to work. This is the query I use to retrieve the data:

$" Select B.* from (Select A.* from (Select Row_NUMBER() 
       OVER (order by %%physloc%%) As RowID, {original-table}.* FROM 
{original-table} where {where-claus}) 
A Where A.RowID between {recordsPerStatement * (iteration - 1) + 1} 
AND {recordsPerStatement * iteration}) B"

The problem is that the SELECT above produces an extra column (ROWID), which prevents me from inserting the data into the destination table.

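I've been looking for a way to drop the ROWID column in the outermost SELECT, or to insert rows from the original table based on the retrieved data (something like INSERT INTO destination-table SELECT * FROM original-table WHERE EXISTS (the rest of the select query)), but to no avail.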

TL;DR: get rid of the ROWID column used for the calculation so that the rows can be inserted into the identical table.

Notes:

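  • A lot of data (millions of rows), hence the processing in batches
  • Unknown tables, so I can't reference specific column names
  • There needs to be an ordering (hence ROW_NUMBER) so that the same data isn't copied twice
  • The insert has to be driven by a SELECT query (retrieving the data first and doing some magic locally would hurt performance badly)
  • If needed, additional variables can be added (such as an ORDER BY clause variable), but any reference to data in the query will always be a variable, and it would be better if I could find a way that doesn't add more variables to the query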
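
The batching arithmetic behind these notes can be checked in isolation. Below is a minimal Python sketch (the post itself drives the loop from .NET; the function name `batch_bounds` is hypothetical) of the inclusive ROW_NUMBER range each iteration should cover. Because BETWEEN is inclusive on both ends, the lower bound needs the `+ 1`; without it the boundary row would land in two consecutive batches.

```python
from typing import Tuple


def batch_bounds(records_per_statement: int, iteration: int) -> Tuple[int, int]:
    """Return the inclusive (first, last) row numbers for a 1-based iteration."""
    if records_per_statement < 1 or iteration < 1:
        raise ValueError("records_per_statement and iteration must be >= 1")
    # ROW_NUMBER() starts at 1, so batch 1 covers 1..n, batch 2 covers n+1..2n.
    first = records_per_statement * (iteration - 1) + 1
    last = records_per_statement * iteration
    return first, last


if __name__ == "__main__":
    for i in range(1, 4):
        print(i, batch_bounds(1000, i))
```

Consecutive iterations produce adjacent, non-overlapping ranges, which is the property the "not copied twice" requirement depends on, provided the ordering itself is stable between statements.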

I'm hoping someone can point me to something I can look into further.

2 Answers:

Answer 0: (score: 1)

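This approach uses a temporary table to hold the paginated data and then processes it page by page. It works for me, but I'm not sure whether you'll run into problems with very large data sets. You can put the whole thing in a stored procedure and call it from .NET with parameters. You'll need to add a parameter for the destination table name, and construct and execute the INSERT statement in the final loop.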

        -- Parameters
        DECLARE @PageSize integer = 100;
        DECLARE @TableName nVarchar(200) = 'WRD_WordHits';
        DECLARE @OrderBy nVarchar(3000) = 'WordID'

        STEP_010: BEGIN   
        -- Get the column definitions for the table

            DECLARE @Cols int;
            SELECT TABLE_NAME, ORDINAL_POSITION, COLUMN_NAME, DATA_TYPE, CHARACTER_MAXIMUM_LENGTH
                   , IS_NULLABLE
            INTO #Tspec
            FROM INFORMATION_SCHEMA.COLUMNS
            WHERE TABLE_NAME = @TableName;

            -- Number of columns
            SET @Cols = @@ROWCOUNT;
        END;

        STEP_020: BEGIN  
            -- Create the temporary table that will hold the paginated data
            CREATE TABLE #TT2 ( PageNumber int, LineNumber int, SSEQ int )
            DECLARE @STMT nvarchar(3000);


        END;

        STEP_030: BEGIN
        -- Add columns to #TT2 using the column definitions
            DECLARE @Ord int = 0;
            DECLARE @Colspec nvarchar(3000) = '';
            DECLARE @AllCols nvarchar(3000) = '';
            DECLARE @ColName nvarchar(200) = '';


            WHILE @Ord < @Cols BEGIN

                        SELECT @Ord = @Ord + 1;

                        -- Get the column name and specification
                        SELECT @ColName = Column_Name
                                , @Colspec = 
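                                     Column_Name + ' ' + DATA_TYPE + CASE WHEN CHARACTER_MAXIMUM_LENGTH IS NULL
                                                                               OR DATA_TYPE IN ('xml', 'text', 'ntext', 'image') THEN '' -- types that take no length
                                         WHEN CHARACTER_MAXIMUM_LENGTH = -1 THEN '(max)' -- -1 marks large-value types such as varchar(max)
                                         ELSE '(' + CAST(CHARACTER_MAXIMUM_LENGTH AS varchar(30) ) + ')' END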

                            FROM #Tspec WHERE ORDINAL_POSITION = @Ord;

                         -- Create and execute statement to add the column and the columns list used later
                         SELECT @STMT = ' ALTER TABLE #TT2 ADD ' + @Colspec + ';' 
                                , @AllCols = @AllCols + ', ' + @ColName ;
                         EXEC sp_ExecuteSQL @STMT;

            END;

            -- Remove leading comma from columns list
            SELECT @AllCols = SUBSTRING(@AllCols, 3, 3000);
            PRINT @AllCols

            -- Finished with the source table spec
            DROP TABLE #Tspec;

        END;

        STEP_040: BEGIN -- Create and execute the statement used to fill #TT2 with the paginated data from the source table

            -- The first two cols are the page number and row number within the page
            -- The sequence is arbitrary but could use a key list for the order by clause

            SELECT @STMT = 
                         'INSERT #TT2
                        SELECT FLOOR( CAST( SSEQ as float) /' + CAST(@PageSize as nvarchar(10)) + ' ) + 1 PageNumber, (SSEQ) % ' + CAST(@PageSize as nvarchar(10)) + ' + 1 LineNumber, * FROM
                        ( 

                                SELECT ROW_NUMBER() OVER ( ORDER BY ' + @OrderBy + '  ) - 1 AS SSEQ, * FROM ' + @TableName + ' 

                        )
                            A; ' ;
            EXEC sp_ExecuteSQL @STMT;

            -- *** Test only to show that the table contains the data
                    --SELECT * FROM #TT2;
                    --SELECT @STMT = 'SELECT NULL AS EXECSELECT, ' + @AllCols + ' FROM #TT2;' ;
                    --EXEC sp_ExecuteSQL @STMT;
            -- *** 
        END;

        STEP_050: BEGIN -- Loop through paginated data, one page at a time.
            -- Variables to control the paginated loop
            DECLARE @PageMAX int;
            SELECT @PageMAX = MAX(PageNumber) FROM #TT2;
            PRINT 'Generated ' + CAST( @PageMAX AS varchar(10) ) + ' pages from table';

            DECLARE @Page int = 0;

            WHILE @Page < @PageMAX BEGIN
                SELECT @Page = @Page + 1;

                -- Create and execute the statement to get one page of data - this could be any statement to process data page by page
                SELECT @STMT = 'SELECT ' +  @AllCols  + ' FROM #TT2 WHERE PageNumber = ' + CAST(@Page AS Varchar(10 )) + ' ORDER BY LineNumber '

                -- Execute the statement.
                PRINT @STMT -- For testing
                --EXEC sp_EXECUTESQL @STMT;

            END;

            -- Finished with Paginated data
            DROP TABLE #TT2;

        END;
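
The page and line numbering used when filling the paginated temp table can be sanity-checked outside the database. This is a small Python sketch of the same arithmetic, assuming a zero-based sequence number as produced by `ROW_NUMBER() - 1`; the function name `page_and_line` is hypothetical and not part of the answer.

```python
from typing import Tuple


def page_and_line(sseq: int, page_size: int) -> Tuple[int, int]:
    """Map a zero-based sequence number to 1-based (page, line) numbers."""
    if sseq < 0 or page_size < 1:
        raise ValueError("sseq must be >= 0 and page_size must be >= 1")
    # Integer division plays the role of FLOOR(CAST(SSEQ AS float) / PageSize).
    page_number = sseq // page_size + 1
    # The remainder is the position inside the page, shifted to start at 1.
    line_number = sseq % page_size + 1
    return page_number, line_number


if __name__ == "__main__":
    for seq in (0, 99, 100, 250):
        print(seq, page_and_line(seq, 100))
```

With a page size of 100, sequence numbers 0 to 99 fall on page 1 and sequence number 100 starts page 2 at line 1, so no row is assigned to two pages.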

Answer 1: (score: 0)

The solution I came up with:

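First read the column names from the database and store them locally, then reuse them when building the INSERT/SELECT query so that only those columns are selected from the derived table (that is, everything except ROWID).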

commandText = $"SELECT column_name
               FROM INFORMATION_SCHEMA.COLUMNS
               WHERE TABLE_NAME = N'{table}'"

columnNames = "executionfunction with commandText"
columnNamesCount = columnNames.Rows.Count

Dim counter As Int16 = 0
commandText = String.Empty
commandText = $"INSERT INTO {destination} SELECT "

    For Each row As DataRow In columnNames.Rows

         If counter = columnNamesCount - 1 Then
              commandText += $"B.{row("column_name")} "
         Else
              commandText += $"B.{row("column_name")}, "
         End If

         counter = counter + 1
    Next

    commandText += $"FROM 
                   (Select A.* FROM (Select Row_NUMBER() 
                   OVER(order by %%physloc%%) AS RowID, {table}.* 
                   FROM {table} where {filter}) A
                   WHERE A.RowID between ({recordsPerStatement} * ({iteration}-1)) + 1   
                   AND ({recordsPerStatement} * {iteration})) B"
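
For comparison, here is a hedged Python sketch of the same string-building idea (the answer itself is VB.NET). The names `quote_identifier` and `build_batched_insert` are hypothetical. It assumes the column names are supplied in the table's column order and that the table names and filter come from trusted input, since they are interpolated verbatim. Bracket-quoting the column names is an addition that the answer does not make.

```python
from typing import Sequence


def quote_identifier(name: str) -> str:
    """Bracket-quote a SQL Server identifier, doubling any closing bracket."""
    return "[" + name.replace("]", "]]") + "]"


def build_batched_insert(destination: str, table: str, columns: Sequence[str],
                         where_filter: str, records_per_statement: int,
                         iteration: int) -> str:
    """Build one INSERT ... SELECT statement that copies a single batch."""
    if not columns:
        raise ValueError("at least one column name is required")
    # Explicit column list: the helper RowID column is never selected.
    column_list = ", ".join("B." + quote_identifier(c) for c in columns)
    first = records_per_statement * (iteration - 1) + 1
    last = records_per_statement * iteration
    # destination, table and where_filter are inserted as-is (trusted input).
    return (
        f"INSERT INTO {destination} SELECT {column_list} "
        f"FROM (SELECT ROW_NUMBER() OVER (ORDER BY %%physloc%%) AS RowID, "
        f"{table}.* FROM {table} WHERE {where_filter}) B "
        f"WHERE B.RowID BETWEEN {first} AND {last}"
    )


if __name__ == "__main__":
    print(build_batched_insert("dst", "src", ["id", "name"], "1 = 1", 1000, 2))
```

Joining the quoted names with `", "` replaces the counter-based check for the last column, and a single derived table is enough to filter on the row number.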

Edit: removed the %%physloc%% clause and built in an OFFSET ... FETCH NEXT part instead. New approach:

commandText += $"INSERT INTO {destination} SELECT * FROM {table} WHERE {filter}"

    For i As Int16 = 1 To columnNamesCount

        If i = 1 Then
            commandText += $" ORDER BY {columnNames.Rows(i - 1)("column_name")} ASC"
        Else
            commandText += $"{columnNames.Rows(i - 1)("column_name")} ASC"
        End If

        If i <> columnNamesCount Then
            commandText += ", "
        End If
    Next

commandText += $" OFFSET ({recordsPerStatement} * ({iteration} -1)) ROWS FETCH Next {recordsPerStatement} ROWS ONLY"
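
The OFFSET/FETCH variant can be sketched the same way. This Python function is an illustration only; the name `build_offset_fetch_insert` is hypothetical, and the table names and filter are assumed to be trusted input. OFFSET ... FETCH requires an ORDER BY clause and SQL Server 2012 or later. Ordering by every column only works if all of the column types are sortable, which is an assumption here.

```python
from typing import Sequence


def build_offset_fetch_insert(destination: str, table: str,
                              columns: Sequence[str], where_filter: str,
                              records_per_statement: int,
                              iteration: int) -> str:
    """Build an INSERT ... SELECT that pages with OFFSET ... FETCH NEXT."""
    if not columns:
        raise ValueError("at least one column name is required")
    # Order by every column so the paging order is fully determined by the data.
    order_by = ", ".join(
        "[" + c.replace("]", "]]") + "] ASC" for c in columns
    )
    # Iterations are 1-based, so the first batch skips zero rows.
    offset = records_per_statement * (iteration - 1)
    return (
        f"INSERT INTO {destination} SELECT * FROM {table} "
        f"WHERE {where_filter} "
        f"ORDER BY {order_by} "
        f"OFFSET {offset} ROWS FETCH NEXT {records_per_statement} ROWS ONLY"
    )


if __name__ == "__main__":
    print(build_offset_fetch_insert("dst", "src", ["a", "b"], "1 = 1", 500, 3))
```

Since no helper column is added to the SELECT list, `SELECT *` can be inserted into the identical destination table directly.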