Import only the last column of an Excel file with SSIS

Time: 2017-12-07 07:41:46

Tags: sql-server excel ssis etl sql-server-data-tools

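I receive an Excel file every day, and the number of columns in it is not fixed. I need to load only the last column into my table using SSIS. How can I identify the last used column dynamically?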

3 answers:

Answer 0: (score: 3)

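You can do this with a C# script (a Script Component configured as a source):

Make sure to add using System.Data.OleDb; to the namespaces region, add an output column named LastCol, and select its data type.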

    public override void CreateNewOutputRows()
    {
        string fileName = @"C:\test.xlsx";
        // Worksheet names need a trailing $ when queried through OLE DB
        string sheetName = "Sheet1$";
        // HDR=No returns the header row as data; IMEX=1 reads mixed-type columns as text
        string cstr = "Provider=Microsoft.ACE.OLEDB.12.0;Data Source=" + fileName + ";Extended Properties=\"Excel 12.0;HDR=No;IMEX=1\"";

        using (OleDbConnection xlConn = new OleDbConnection(cstr))
        {
            xlConn.Open();

            using (OleDbCommand xlCmd = xlConn.CreateCommand())
            {
                xlCmd.CommandText = "Select * from [" + sheetName + "]";
                xlCmd.CommandType = CommandType.Text;

                using (OleDbDataReader rdr = xlCmd.ExecuteReader())
                {
                    int lastCol = rdr.FieldCount - 1; // zero-based index of the last column
                    int rowCt = 0; // row counter

                    while (rdr.Read())
                    {
                        // skip the header row
                        if (rowCt != 0)
                        {
                            Output0Buffer.AddRow();
                            // Assumes LastCol is an integer output column;
                            // blank cells (DBNull) would need separate handling
                            Output0Buffer.LastCol = Convert.ToInt32(rdr[lastCol]);
                        }
                        rowCt++; // increment counter
                    }
                }
            }
        }
    }

Answer 1: (score: 2)

Solution overview

Use a Script Task that:

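  • Gets the index of the last column
  • Converts the index to a column letter (e.g. 1 -> A)
  • Builds a SQL command that reads only the last column
  • Is followed by an Excel Source that uses this query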
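
As a rough illustration of the middle two steps, the sketch below converts a 1-based column index to its Excel letter and builds the range query. It is written in C# rather than the VB.Net used in the detailed solution. The class name LastColumnQuery, the method BuildLastColumnQuery, and the sample sheet name Sheet1$ are assumptions made for this example and are not part of the original answer.

```csharp
using System;

public static class LastColumnQuery
{
    // Converts a 1-based column index to an Excel column name (1 -> A, 27 -> AA).
    public static string GetExcelColumnName(int columnNumber)
    {
        string columnName = string.Empty;
        int dividend = columnNumber;
        while (dividend > 0)
        {
            int modulo = (dividend - 1) % 26;
            columnName = (char)('A' + modulo) + columnName;
            dividend = (dividend - modulo) / 26; // integer division
        }
        return columnName;
    }

    // Builds a query that selects only one column of a worksheet.
    // sheetName must include the trailing '$' (hypothetical example: "Sheet1$").
    public static string BuildLastColumnQuery(string sheetName, int lastColumnIndex)
    {
        string column = GetExcelColumnName(lastColumnIndex);
        return "SELECT * FROM [" + sheetName + column + ":" + column + "]";
    }

    public static void Main()
    {
        Console.WriteLine(GetExcelColumnName(1));               // A
        Console.WriteLine(BuildLastColumnQuery("Sheet1$", 28)); // SELECT * FROM [Sheet1$AB:AB]
    }
}
```

For a sheet whose last used column is the 28th, the resulting query restricts the Excel Source to the range AB:AB, so the data flow only ever sees a single column regardless of how many columns the file contains.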

Detailed solution

This answer assumes that the sheet name is Sheet1 and that the programming language used is VB.Net.

  1. First create an SSIS variable of type String (i.e. @[User::strQuery]).
  2. Add another variable that contains the Excel file path (i.e. @[User::ExcelFilePath]).
  3. Add a Script Task and select @[User::strQuery] as a ReadWrite variable and @[User::ExcelFilePath] as a ReadOnly variable (in the Script Task editor window).
  4. Set the script language to VB.Net and write the following script in the script editor window.
     Note: you must import System.Data.OleDb.

        ' The field declarations and BuildConnectionString() are not shown in the
        ' original answer; the versions below are assumed minimal implementations.
        Private m_strExcelPath As String
        Private m_strExcelConnectionString As String
        Private m_dtschemaTable As DataTable

        Private Function BuildConnectionString() As String
            Return "Provider=Microsoft.ACE.OLEDB.12.0;Data Source=" & m_strExcelPath & ";Extended Properties=""Excel 12.0;HDR=YES;IMEX=1"""
        End Function

        Public Sub Main()

            m_strExcelPath = Dts.Variables.Item("ExcelFilePath").Value.ToString

            Dim strSheetname As String = String.Empty
            Dim intLastColumn As Integer = 0

            m_strExcelConnectionString = Me.BuildConnectionString()

            Try
                Using OleDBCon As New OleDbConnection(m_strExcelConnectionString)

                    If OleDBCon.State <> ConnectionState.Open Then
                        OleDBCon.Open()
                    End If

                    'Get all worksheets
                    m_dtschemaTable = OleDBCon.GetOleDbSchemaTable(OleDbSchemaGuid.Tables,
                                                                   New Object() {Nothing, Nothing, Nothing, "TABLE"})

                    'Loop over the worksheets to find the first real one
                    '(the file may contain temporary or deleted sheets)
                    For Each schRow As DataRow In m_dtschemaTable.Rows

                        strSheetname = schRow("TABLE_NAME").ToString

                        If Not strSheetname.EndsWith("_") AndAlso strSheetname.EndsWith("$") Then

                            Using cmd As New OleDbCommand("SELECT * FROM [" & strSheetname & "]", OleDBCon)

                                Dim dtTable As New DataTable("Table1")
                                cmd.CommandType = CommandType.Text

                                Using daGetDataFromSheet As New OleDbDataAdapter(cmd)
                                    daGetDataFromSheet.Fill(dtTable)
                                End Using

                                'Get the last column index (1-based)
                                intLastColumn = dtTable.Columns.Count

                            End Using

                            'Once the first valid sheet is found there is no need to check the others
                            Exit For

                        End If
                    Next

                    OleDBCon.Close()

                End Using

            Catch ex As Exception
                Throw New Exception(ex.Message, ex)
            End Try

            'Build a query such as SELECT * FROM [Sheet1$C:C]
            Dim strColumnname As String = GetExcelColumnName(intLastColumn)
            Dts.Variables.Item("strQuery").Value = "SELECT * FROM [" & strSheetname & strColumnname & ":" & strColumnname & "]"

            Dts.TaskResult = ScriptResults.Success
        End Sub

        'Converts a 1-based column index to an Excel column name (1 -> A, 27 -> AA)
        Private Function GetExcelColumnName(columnNumber As Integer) As String

            Dim dividend As Integer = columnNumber
            Dim columnName As String = String.Empty
            Dim modulo As Integer

            While dividend > 0
                modulo = (dividend - 1) Mod 26
                columnName = Convert.ToChar(65 + modulo).ToString() & columnName
                dividend = (dividend - modulo) \ 26
            End While

            Return columnName
        End Function

  5. Then add an Excel connection manager and choose the Excel file you want to import (select any sample file; it is only needed to define the metadata the first time).
  6. Assign a default value of Select * from [Sheet1$] to the variable @[User::strQuery].
  7. In the Data Flow Task, add an Excel Source, choose "SQL command from variable" as the data access mode, and select @[User::strQuery].
  8. Set the Data Flow Task's Delay Validation property to True.
  9. Add the other components to the Data Flow Task.

Answer 2: (score: 0)

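No, you cannot do this. The number of columns and their data types must be determined in advance and cannot change; otherwise SSIS will fail. So there is no way to get the last column dynamically. A workaround is to use a macro to extract the last column from the Excel file and then use that output as the SSIS source.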