Error importing CSV file: column value contains the column delimiter

Time: 2016-11-03 22:21:37

Tags: sql sql-server csv ssis delimiter

I am trying to import a CSV file into SQL Server using SSIS.

Here is an example of what the data looks like:

Student_Name,Student_DOB,Student_ID,Student_Notes,Student_Gender,Student_Mother_Name
Joseph Jade,2005-01-01,1,Good listener,Male,Amy
Amy Jade,2006-01-01,1,Good in science,Female,Amy
....

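The CSV columns do not use a text qualifier (quotes).

I created a simple SSIS package to import the file into SQL Server, but sometimes the data in SQL looks like this: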

Student_Name  Student_DOB  Student_ID  Student_Notes  Student_Gender  Student_Mother_Name
Ali Jade      2004-01-01   1           Good listener  Bad in science  Male,Lisa

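The reason is that the [Student_Notes] column sometimes contains a comma (,), which is also the column delimiter, so those rows are not imported correctly.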
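To make the failure concrete, here is a minimal Python sketch of what a naive split does to such a row. The raw input line is an assumption, reconstructed from the broken output shown above; the actual file content is not given.

```python
# Header from the question; six columns, no text qualifier.
header = ("Student_Name,Student_DOB,Student_ID,Student_Notes,"
          "Student_Gender,Student_Mother_Name")

# ASSUMPTION: the raw line behind the broken row, reconstructed from the output.
row = "Ali Jade,2004-01-01,1,Good listener, Bad in science,Male,Lisa"

expected = len(header.split(","))  # number of columns the package expects
fields = row.split(",")            # naive split on every comma

print(expected, len(fields))
if len(fields) > expected:
    print("extra fields:", len(fields) - expected)
```

With one more field than columns, everything after the notes shifts right, and the last column ends up holding the remainder of the line (Male,Lisa).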

Any suggestions?

3 Answers:

Answer 0: (score: 1)

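Warning: I'm not a regular C# coder.

But anyway, this code does the following:

It opens a file called C:\Input.txt

It searches through every line. If a line has more than 5 commas, it removes all the extra commas from the third-from-last field (the notes).

It writes the result to C:\Output.txt - this is the file you actually import.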

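Many improvements could be made:

  • Get the file paths from the connection manager
  • Error handling
  • An experienced C# programmer could probably do this in half the code

Keep in mind that your package needs write access to the folder involved.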

public void Main()
{
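    // Search the file and remove extra commas from the third-from-last field
    // Requires "using System.IO;" at the top of the script file
    // (for File, StreamReader and StreamWriter)
    // Extended from code at
    // http://stackoverflow.com/questions/1915632/open-a-file-and-replace-strings-in-c-sharp
    // Nick McDermaid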

    string sInputLine;
    string sOutputLine;
    string sDelimiter = ",";
    String[] sData;
    int iIndex;

    // open the file for read
    using (System.IO.FileStream inputStream = File.OpenRead("C:\\Input.txt"))
    {
        using (StreamReader inputReader = new StreamReader(inputStream))
        {
            // open the output file; CreateText overwrites any previous output,
            // so re-running the package does not duplicate rows
            using (StreamWriter outputWriter = File.CreateText("C:\\Output.txt"))
            {
                // Read each line
                while (null != (sInputLine = inputReader.ReadLine()))
                {
                    // Grab each field out
                    sData = sInputLine.Split(sDelimiter[0]);
                    if (sData.Length <= 6)
                    {
                        // 6 or fewer fields - just echo it out
                        sOutputLine = sInputLine;
                    }
                    else
                    {
                        // line has more than 6 pieces 
                        // We assume all of the extra commas are in the notes field                                

                        // Put the first three fields together
                        sOutputLine =
                            sData[0] + sDelimiter +
                            sData[1] + sDelimiter +
                            sData[2] + sDelimiter;

                        // Put the middle notes fields together, excluding the delimiter
                        for (iIndex=3; iIndex <= sData.Length - 3; iIndex++)
                        {
                            sOutputLine = sOutputLine + sData[iIndex] + " ";
                        }

                        // Tack on the last two fields
                        sOutputLine = sOutputLine +
                            sDelimiter + sData[sData.Length - 2] +
                            sDelimiter + sData[sData.Length - 1];


                    }

                    // We've evaluated the correct line, now write it out
                    outputWriter.WriteLine(sOutputLine);
                }
            }
        }
    }


    Dts.TaskResult = (int)Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Success;
}

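Answer 1: (score: 1)

In the Flat File Connection Manager, define the file as a single column (DT_STR 8000).

Then add a Script Component to the Data Flow Task and add the output columns (the same columns as in the example shown).

In the Script Component, split each row using the following code: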

' Student_Name,Student_DOB,Student_ID,Student_Notes,Student_Gender,Student_Mother_Name

Dim strCells() as string = Row.Column0.Split(CChar(","))

Row.StudentName = strCells(0)
Row.StudentDOB = strCells(1)
Row.StudentID = strCells(2)
Row.StudentMother = strCells(strCells.Length - 1)
Row.StudentGender = strCells(strCells.Length - 2)

Dim strNotes As String = String.Empty

' Concatenate the pieces of the notes field (the commas between them are dropped)
For I As Integer = 3 To strCells.Length - 3
    strNotes &= strCells(I)
Next

Row.StudentNotes = strNotes

It worked fine for me.
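For trying the splitting rule outside SSIS, here is a rough Python sketch of the same idea: the first three and last two pieces are fixed columns, and everything in between belongs to the notes. The function name `split_student_row` is hypothetical. Unlike the script above, it re-joins the notes pieces with a comma so the original text is preserved. It still assumes that only Student_Notes can contain extra commas.

```python
def split_student_row(line):
    """Split one unquoted CSV line into the six student columns.

    ASSUMPTION: extra commas can only occur inside Student_Notes.
    """
    cells = line.split(",")
    if len(cells) < 6:
        raise ValueError("expected at least 6 fields, got %d" % len(cells))
    return {
        "Student_Name": cells[0],
        "Student_DOB": cells[1],
        "Student_ID": cells[2],
        # Everything between the third field and the last two is the notes.
        "Student_Notes": ",".join(cells[3:-2]),
        "Student_Gender": cells[-2],
        "Student_Mother_Name": cells[-1],
    }


good = split_student_row("Joseph Jade,2005-01-01,1,Good listener,Male,Amy")
bad = split_student_row(
    "Ali Jade,2004-01-01,1,Good listener, Bad in science,Male,Lisa")
print(good["Student_Notes"])
print(bad["Student_Notes"])
```

Indexing the last two columns from the end of the list is what keeps the rule stable regardless of how many commas the notes contain.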

Answer 2: (score: 0)

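If importing the CSV file is not a routine task:

  1. Import the CSV file into Excel
  2. Use Excel's row filter to find the bad rows and rewrite them
  3. Choose the tab-delimited text (TXT) format
  4. Save the Excel file
  5. Import the TXT file with SSIS

Otherwise, create a script that searches for commas within the Student_Notes column range.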
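As a starting point for such a script, the following Python sketch flags lines whose field count differs from the six columns in the header, so they can be reviewed or fixed before the import. The function name `find_suspect_lines` and the sample lines are hypothetical; the check assumes an unquoted, comma-delimited file.

```python
def find_suspect_lines(lines, delimiter=",", expected_fields=6):
    """Return (line_number, text) for lines with an unexpected field count."""
    suspects = []
    for number, line in enumerate(lines, start=1):
        text = line.rstrip("\r\n")
        # Skip blank lines; count fields as delimiters + 1.
        if text and text.count(delimiter) + 1 != expected_fields:
            suspects.append((number, text))
    return suspects


# Hypothetical sample; with a real file, pass the open file object instead.
sample = [
    "Student_Name,Student_DOB,Student_ID,Student_Notes,"
    "Student_Gender,Student_Mother_Name\n",
    "Joseph Jade,2005-01-01,1,Good listener,Male,Amy\n",
    "Ali Jade,2004-01-01,1,Good listener, Bad in science,Male,Lisa\n",
]

for number, text in find_suspect_lines(sample):
    print("line %d: %s" % (number, text))
```

Because the function accepts any iterable of lines, it can be run directly over `open(path)` for a real file.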