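Using SSIS

Time: 2018-05-16 18:58:53

Tags: c# sql-server csv ssis etl

I have been tasked with loading accounting transactions that arrive in a csv file. The file has a single header row that applies to the whole file, but for some reason the data is grouped by account: each account number appears on its own line above that account's transactions, in the same column as the ID.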

"ID","Name","Date","Debit","Credit","Balance"
,,,,,
"1150 - Cash in Bank",,,,,
"Starting Balance",,,,,"59,612.78"
615892,"Account Name 1","5/5/2018","2,100.00",,"61,712.78"
645761,"Account Name 2","5/7/2018",,7,"61,705.78"
615892,"Account Name 3","5/8/2018",,"2,144.33","59,561.45"
713300,"Account Name 4","5/8/2018","2,144.33",,"61,705.78"
713300,"Account Name 5","5/8/2018",,"2,144.33","59,561.45"
693615,"Account Name 6","5/9/2018",,"1,650.00","57,911.45"
"Net Change",,,,,"-1,701.33"
,,,"4,244.33","5,945.66","57,911.45"
"3150 - Owner Contribution",,,,,
"Starting Balance",,,,,0
713300,"Account Name 4","5/8/2018",,"2,144.33","-2,144.33"
"Net Change",,,,,"-2,144.33"
,,,0,"2,144.33","-2,144.33"

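Can anyone get me started on how to approach this? I can see how to do it logically with a few variables and row-by-row processing, but I am not a C# or front-end developer at all. My biggest problem seems to be that you can't write one piece and test it the way you can with SQL. In SQL I can query a table, look at the data and keep building on it, but with C# I need a complete script before anything works. How do I start with a small piece and expand from there? For example, even just reading the first account name into a variable and exposing it as a variable in the Data Flow Task. I would like something where I can submit code and get a response. Every script I find online seems to have some compile error, and I don't understand them well enough yet.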

3 Answers:

Answer 0: (score: 1)

Solution overview

I have written my answer in VB.NET because it may be easier to follow, especially since you are not a C# developer.

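  • Add a Data Flow Task
  • Add a Flat File Source followed by a Script Component
  • Mark all columns as input columns and add 8 output columns
  • In Input0_ProcessInputRow, check whether the ID column is non-empty and contains an integer; if so, create an output row. Otherwise, if it contains an account number or "Starting Balance", store that value in a variable. Ignore any other row.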
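Before wiring this into SSIS, the row-by-row logic above can be prototyped on a few lines of the sample file, which makes it easy to test in small pieces. The following is only a sketch in Python, not the SSIS script itself; `flatten` and `SAMPLE` are hypothetical names introduced here for illustration.

```python
import csv
import io

# A shortened copy of the sample file from the question.
SAMPLE = '''"ID","Name","Date","Debit","Credit","Balance"
,,,,,
"1150 - Cash in Bank",,,,,
"Starting Balance",,,,,"59,612.78"
615892,"Account Name 1","5/5/2018","2,100.00",,"61,712.78"
645761,"Account Name 2","5/7/2018",,7,"61,705.78"
"Net Change",,,,,"-1,701.33"
,,,"4,244.33","5,945.66","57,911.45"
"3150 - Owner Contribution",,,,,
"Starting Balance",,,,,0
713300,"Account Name 4","5/8/2018",,"2,144.33","-2,144.33"
"Net Change",,,,,"-2,144.33"
,,,0,"2,144.33","-2,144.33"
'''


def flatten(lines):
    """Yield one dict per transaction row, tagged with its account header."""
    account = ""
    starting_balance = ""
    for row in csv.DictReader(lines):
        row_id = (row["ID"] or "").strip()
        if row_id.isdigit():
            # Transaction row: emit it with the values carried forward.
            yield {**row, "Account": account, "StartingBalance": starting_balance}
        elif "Starting Balance" in row_id:
            starting_balance = row["Balance"]
        elif "-" in row_id:
            account = row_id
        # Anything else (blank rows, "Net Change", totals) is ignored.


rows = list(flatten(io.StringIO(SAMPLE)))
for r in rows:
    print(r["Account"], r["ID"], r["Balance"])
```

The two carried variables play the same role as the class-level fields in a Script Component: they keep their values between rows, so each transaction picks up the most recent account header.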

Detailed solution

  1. Add a Flat File Connection Manager and select the text file
  2. Change the text qualifier to "
  3. Add a Data Flow Task
  4. Inside the Data Flow Task, add a Flat File Source, a Script Component and an OLE DB Destination
  5. In the Script Component, select all columns as input columns
  6. Add 8 output columns (the main columns + Account + StartingBalance), all of type DT_STR
  7. Change the SynchronousInput property of the output buffer to None
  8. Select Visual Basic as the script language
  9. In the script editor, write the following script

              ' Values carried forward from the most recent account header rows
              Private AccountName As String = ""
              Private StartingBalance As String = ""

              Public Overrides Sub Input0_ProcessInputRow(ByVal Row As Input0Buffer)

                  ' Skip rows whose ID is null or blank (separator and totals rows)
                  If Row.ID_IsNull OrElse String.IsNullOrEmpty(Row.ID.Trim) Then Exit Sub

                  If Integer.TryParse(Row.ID, New Integer) Then

                      ' Transaction row: copy it and attach the current account values
                      Output0Buffer.AddRow()
                      Output0Buffer.ID = Row.ID
                      Output0Buffer.Name = Row.Name
                      Output0Buffer.Date = Row.Date
                      Output0Buffer.Debit = Row.Debit
                      Output0Buffer.Credit = Row.Credit
                      Output0Buffer.Balance = Row.Balance
                      Output0Buffer.Account = AccountName
                      Output0Buffer.StartingBalance = StartingBalance

                  ElseIf Row.ID.Contains("Starting Balance") Then

                      StartingBalance = Row.Balance

                  ElseIf Row.ID.Contains("-") Then

                      ' Account header such as "1150 - Cash in Bank"
                      AccountName = Row.ID

                  End If

                  ' Any other row (for example "Net Change") is ignored

              End Sub
              
Finally, map the output columns to the destination columns.
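Because every output column is DT_STR, the amounts reach the destination as text such as "59,612.78". If they are needed as numbers, they have to be converted somewhere downstream. As a rough illustration of what that conversion involves (a Python sketch only; `parse_amount` is a hypothetical helper, and in SSIS this would more likely be done with a conversion step or in the destination database):

```python
from decimal import Decimal


def parse_amount(text):
    """Convert '59,612.78' or '-1,701.33' to Decimal; blank or None gives None."""
    cleaned = (text or "").replace(",", "").strip()
    if not cleaned:
        return None
    return Decimal(cleaned)


# Check the first transaction of the sample: starting balance plus debit
# should equal the balance shown on that row.
start = parse_amount("59,612.78")
debit = parse_amount("2,100.00")
print(start + debit == parse_amount("61,712.78"))
```

Decimal is used rather than float so that currency values compare exactly.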

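Answer 1: (score: 0)

This should load everything into a DataTable structure, which you can then use for assignment or whatever else you need. Let me know if you need a different type of end object.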

        //Needs: using System.Data; using System.IO; using System.Text.RegularExpressions;
        var data = string.Empty; //String var to hold file
        var tbl = new DataTable("MyData"); //Tmp dataTable object
        using (var fs = new StreamReader(@"C:\Temp\test.csv")) //Open file
            data = fs.ReadToEnd(); //Read entirely into data variable

        var rows = data.Split(new string[] { "\r\n", "\n" }, StringSplitOptions.RemoveEmptyEntries); //Split into array by lines. RemoveEmpty's for end of file extra lines.
        var splitter = new Regex(@",(?=(?:[^""]*""[^""]*"")*[^""]*$)"); //Matches only commas outside double quotes, so "2,100.00" stays one cell

        var cnt = 0; //Counter to know header
        foreach (var row in rows) //Iterate rows
        {
            var cells = splitter.Split(row); //Split row into cells. Empty cells are kept.
            for (var i = 0; i < cells.Length; i++)
                cells[i] = cells[i].Trim('"'); //Strip the text qualifier
            if (cnt == 0) foreach (var cell in cells) //If is header row add columns
                    tbl.Columns.Add(new DataColumn(cell));
            else //Else data row
            {
                var dataRow = tbl.NewRow(); //New row
                dataRow.ItemArray = cells; //Assign cell values
                tbl.Rows.Add(dataRow); //Add row to table.
            }
            cnt++;
        }

Edit: cleaned up the usings and added comments.

EDIT2: in case the file is large, here is a streaming version:

        //Needs: using System.Data; using System.IO; using System.Text.RegularExpressions;
        var cnt = 0; //Row counter
        var tbl = new DataTable("MyData"); //Tmp dataTable object
        var splitter = new Regex(@",(?=(?:[^""]*""[^""]*"")*[^""]*$)"); //Matches only commas outside double quotes
        using (var fs = new StreamReader(@"C:\Temp\test.csv")) //Load file
        {
            string row;
            while ((row = fs.ReadLine()) != null) //Read one line at a time until end of file
            {
                if (row.Length == 0) continue; //Skip blank lines
                var cells = splitter.Split(row); //Split into cells
                for (var i = 0; i < cells.Length; i++)
                    cells[i] = cells[i].Trim('"'); //Strip the text qualifier
                if (cnt == 0) //If is header row
                {
                    foreach (var cell in cells) //For each header
                        tbl.Columns.Add(new DataColumn(cell)); //Add Column
                } else { //Not header row
                    var dataRow = tbl.NewRow(); //Create new row based on tmp table
                    dataRow.ItemArray = cells; //Assign cell values
                    tbl.Rows.Add(dataRow); //Add row to table
                }
                cnt++;
            }
        }
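One detail of this file matters for any hand-rolled parser: the ID column is unquoted, empty fields carry no quotes at all, and the amounts contain commas inside quotes. A plain split on the quoted delimiter `","` therefore does not produce six cells per row, while a quote-aware parser does. The short Python sketch below shows the difference on one line taken from the question; it uses Python's standard csv module purely as an illustration.

```python
import csv
import io

line = '615892,"Account Name 1","5/5/2018","2,100.00",,"61,712.78"'

# Naive split on the quoted delimiter: only two delimiters match,
# so the ID and Name are fused and so are the last three fields.
naive = line.split('","')

# Quote-aware parsing keeps "2,100.00" together and preserves the empty field.
parsed = next(csv.reader(io.StringIO(line)))

print(len(naive))   # 3
print(len(parsed))  # 6
print(parsed)
```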

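Answer 2: (score: 0)

I just saw this post. Having gone through a very similar experience only 1 day ago, I suggest you run the macro below (it can run in an Excel or a CSV workbook, but if you save your changes with the CSV extension, the code will not be saved with it).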

' Add reference to Microsoft ActiveX Data Objects 2.8 Library

Sub testexportsql()
    Dim Cn As ADODB.Connection
    Dim ServerName As String
    Dim DatabaseName As String
    Dim TableName As String
    Dim UserID As String
    Dim Password As String
    Dim rs As ADODB.Recordset
    Dim RowCounter As Long

    Dim NoOfFields As Integer
    Dim StartRow As Long
    Dim EndRow As Long

    Dim ColCounter As Integer


    Set rs = New ADODB.Recordset


    ServerName = "server_name" ' Enter your server name here
    DatabaseName = "db_name" ' Enter your  database name here
    TableName = "table_name" ' Enter your Table name here
    UserID = "" ' Enter your user ID here
     ' (Leave ID and Password blank if using Windows Authentication)
    Password = "" ' Enter your password here
    NoOfFields = 10 ' Enter number of fields to update (eg. columns in your worksheet)
    StartRow = 2 ' Enter row in sheet to start reading  records
    EndRow = 100 ' Enter row of last record in sheet

     '  CHANGES
    Dim shtSheetToWork As Worksheet
    Set shtSheetToWork = ActiveWorkbook.Worksheets("sheet_name")
     '********

    Set Cn = New ADODB.Connection
    Cn.Open "Driver={SQL Server};Server=" & ServerName & ";Database=" & DatabaseName & _
    ";Uid=" & UserID & ";Pwd=" & Password & ";"

    rs.Open TableName, Cn, adOpenKeyset, adLockOptimistic

     'EndRow = shtSheetToWork.Cells(Rows.Count, 1).End(xlUp).Row
    For RowCounter = StartRow To EndRow
        rs.AddNew
        For ColCounter = 1 To NoOfFields
            rs(ColCounter - 1) = shtSheetToWork.Cells(RowCounter, ColCounter)
        Next ColCounter
        Debug.Print RowCounter
    Next RowCounter
    rs.UpdateBatch

     ' Tidy up
    rs.Close
    Set rs = Nothing
    Cn.Close
    Set Cn = Nothing

End Sub

I hope this solution works for you. It definitely worked for me.
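The macro's core idea is a loop that appends one database row per worksheet row and then commits. For comparison, the same pattern can be sketched in Python. Note the assumptions: the in-memory sqlite3 database is only a stand-in for SQL Server, and the table name `transactions` and its columns are hypothetical.

```python
import sqlite3

# Two rows from the sample file, already split into six text fields.
rows = [
    ("615892", "Account Name 1", "5/5/2018", "2,100.00", "", "61,712.78"),
    ("645761", "Account Name 2", "5/7/2018", "", "7", "61,705.78"),
]

conn = sqlite3.connect(":memory:")  # stand-in for the real server
conn.execute(
    "CREATE TABLE transactions "
    "(id TEXT, name TEXT, date TEXT, debit TEXT, credit TEXT, balance TEXT)"
)

# Parameterized batch insert; the with-block commits on success.
with conn:
    conn.executemany("INSERT INTO transactions VALUES (?, ?, ?, ?, ?, ?)", rows)

count = conn.execute("SELECT COUNT(*) FROM transactions").fetchone()[0]
print(count)
```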