When using Biml, how do I get the inputs of a Merge Join marked as sorted in SSIS?

Asked: 2014-10-06 19:17:53

Tags: ssis biml

<Biml xmlns="http://schemas.varigence.com/biml.xsd">
<Packages>
    <Package Name="Extraction_RecordCount" ConstraintMode="Parallel">
        <Tasks>
            <ExecuteSQL Name="Extraction_RecordCount" ConnectionName="Target">
                <DirectInput> <![CDATA[ Truncate table CMC.Extraction_RecordCount ]]> </DirectInput>
            </ExecuteSQL>
            <Dataflow Name="Fill Extraction_RecordCount">
                <PrecedenceConstraints>
                    <Inputs>
                        <Input OutputPathName="Extraction_RecordCount.Output" />
                    </Inputs>
                </PrecedenceConstraints>
                <Transformations>
                    <OleDbSource Name="ExtractedTables" ConnectionName="Target" >
                        <DirectInput>
                            <![CDATA[
                            SELECT cast( sysobjects.NAME as nvarchar(128)) as TableName 
                                ,sysindexes.Rows as #Rows 
                            FROM sysobjects 
                            INNER JOIN sysindexes ON sysobjects.id = sysindexes.id 
                            INNER JOIN ( SELECT c.table_name ,c.table_schema FROM information_schema.columns c GROUP BY c.table_name ,c.table_schema) c ON c.table_name = sysobjects.NAME 
                            WHERE type = 'U' 
                                AND sysindexes.IndId < 2 
                                AND c.table_schema = 'EXT' 
                            ORDER BY TableName, #Rows
                            ]]>
                        </DirectInput>
                    </OleDbSource>
                    <OleDbSource Name="BackOffice" ConnectionName="Source" >
                        <DirectInput> <![CDATA[ select TABLE_NAME  , cast(NUM_ROWS as int) as NUM_ROWS from ALL_ALL_TABLES ORDER BY TABLE_NAME, NUM_ROWS]]> </DirectInput>
                    </OleDbSource>
                    <MergeJoin Name="Join Extracted Tables w BACKOFFICE" JoinType="InnerJoin">
                        <LeftInputPath OutputPathName="ExtractedTables.Output">
                            <Columns>
                                <Column SourceColumn="TableName" SortKeyPosition="1"/>
                                <Column SourceColumn="#Rows" SortKeyPosition="2"/>
                            </Columns>
                        </LeftInputPath>
                        <RightInputPath OutputPathName="BackOffice.Output">
                            <Columns>
                                <Column SourceColumn="TABLE_NAME" SortKeyPosition="1"/>
                                <Column SourceColumn="NUM_ROWS" SortKeyPosition="2" />
                            </Columns>
                        </RightInputPath>
                        <JoinKeys>
                            <JoinKey LeftColumn="TableName" RightColumn="TABLE_NAME" />
                        </JoinKeys>
                    </MergeJoin>
                    <OleDbDestination Name="Extraction_RecordCount" ConnectionName="Target">
                        <ExternalTableOutput Table="CMC.Extraction_RecordCount"/>
                    </OleDbDestination>
                </Transformations>
            </Dataflow>
        </Tasks>
    </Package>
</Packages>
</Biml>

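This code does generate the package 'Extraction_RecordCount', but the Merge Join component raises an error stating that the inputs from both sources must be sorted. Manually setting 'IsSorted' = 'True' and assigning the 'SortKeyPosition' values fixes the problem temporarily.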

Inserting a Sort component does not work either.

1 answer:

Answer 0 (score: 0)

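A Merge Join requires that its sources be sorted. What your current code specifies is that the output of the Merge Join transformation is sorted. Instead, you want to indicate that the inputs to the Merge Join are sorted.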

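Your source data is already sorted: both queries have an explicit ORDER BY. What you are missing is the declaration on each source component that its output is sorted, which is done by giving the sort columns a SortKeyPosition inside the source itself.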

                <OleDbSource Name="ExtractedTables" ConnectionName="Target" >
                    <DirectInput>
                        <![CDATA[
                        SELECT cast( sysobjects.NAME as nvarchar(128)) as TableName 
                            ,sysindexes.Rows as #Rows 
                        FROM sysobjects 
                        INNER JOIN sysindexes ON sysobjects.id = sysindexes.id 
                        INNER JOIN ( SELECT c.table_name ,c.table_schema FROM information_schema.columns c GROUP BY c.table_name ,c.table_schema) c ON c.table_name = sysobjects.NAME 
                        WHERE type = 'U' 
                            AND sysindexes.IndId < 2 
                            AND c.table_schema = 'EXT' 
                        ORDER BY TableName, #Rows
                        ]]>
                    </DirectInput>
                    <Columns>
                        <Column SourceColumn="TableName" SortKeyPosition="1" />
                        <Column SourceColumn="#Rows" SortKeyPosition="2" />
                    </Columns>
                </OleDbSource>
                <OleDbSource Name="BackOffice" ConnectionName="Source" >
                    <DirectInput> <![CDATA[ select TABLE_NAME  , cast(NUM_ROWS as int) as NUM_ROWS from ALL_ALL_TABLES ORDER BY TABLE_NAME, NUM_ROWS]]> </DirectInput>
                    <Columns>
                        <Column SourceColumn="TABLE_NAME" SortKeyPosition="1" />
                        <Column SourceColumn="NUM_ROWS" SortKeyPosition="2" />
                    </Columns>
                </OleDbSource>

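I'm not 100% sure that #Rows in your first query is actually a valid column name, but the important thing is that you mark the columns as sorted, referring to them by column name.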
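
Once the sort keys are declared on the two sources, the Merge Join presumably no longer needs its own `Columns` entries. Here is a minimal sketch of how the transformation from the question might then look, assuming the input paths only have to name the upstream outputs. The component and column names are taken from the question, and the fragment has not been run against a real package:

```xml
<!-- Sketch only: assumes both OleDbSource components above declare
     SortKeyPosition on their columns, so the inputs arrive marked as sorted. -->
<MergeJoin Name="Join Extracted Tables w BACKOFFICE" JoinType="InnerJoin">
    <!-- The input paths just reference the upstream outputs; the sort
         metadata is no longer repeated here. -->
    <LeftInputPath OutputPathName="ExtractedTables.Output" />
    <RightInputPath OutputPathName="BackOffice.Output" />
    <JoinKeys>
        <!-- The join key is the column with SortKeyPosition 1 on each side. -->
        <JoinKey LeftColumn="TableName" RightColumn="TABLE_NAME" />
    </JoinKeys>
</MergeJoin>
```

The join uses only the first sort column from each side, which should be acceptable because SSIS matches join keys against the leading sort key positions of the inputs.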

My answer to this DBA.StackExchange.com question has a complete end-to-end Merge Join example.