How to split a CSV whose columns may contain commas

Asked: 2011-07-01 02:22:47

Tags: c# .net csv

Given

2,1016,7/31/2008 14:22,Geoff Dalgas,6/5/2011 22:21,http://stackoverflow.com,"Corvallis, OR",7679,351,81,b437f461b3fd27387c5d8ab47a293d35,34

How can I use C# to split the line above into strings, as follows:

2
1016
7/31/2008 14:22
Geoff Dalgas
6/5/2011 22:21
http://stackoverflow.com
Corvallis, OR
7679
351
81
b437f461b3fd27387c5d8ab47a293d35
34

As you can see, one of the columns contains a comma ("Corvallis, OR").

Update: based on "C# Regex Split - commas outside quotes":

string[] result = Regex.Split(samplestring, ",(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)");

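8 answers:

Answer 0 (score: 154)

Use the Microsoft.VisualBasic.FileIO.TextFieldParser class. It handles parsing a delimited file, TextReader, or Stream in which some fields are enclosed in quotes and some are not.

For example: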

using System;
using System.IO;
using Microsoft.VisualBasic.FileIO;

string csv = "2,1016,7/31/2008 14:22,Geoff Dalgas,6/5/2011 22:21,http://stackoverflow.com,\"Corvallis, OR\",7679,351,81,b437f461b3fd27387c5d8ab47a293d35,34";

TextFieldParser parser = new TextFieldParser(new StringReader(csv));

// You can also read from a file
// TextFieldParser parser = new TextFieldParser("mycsvfile.csv");

parser.HasFieldsEnclosedInQuotes = true;
parser.SetDelimiters(",");

string[] fields;

while (!parser.EndOfData)
{
    fields = parser.ReadFields();
    foreach (string field in fields)
    {
        Console.WriteLine(field);
    }
} 

parser.Close();

This should produce the following output:

2
1016
7/31/2008 14:22
Geoff Dalgas
6/5/2011 22:21
http://stackoverflow.com
Corvallis, OR
7679
351
81
b437f461b3fd27387c5d8ab47a293d35
34

For more information, see Microsoft.VisualBasic.FileIO.TextFieldParser.

You need to add a reference to Microsoft.VisualBasic from the .NET tab of the Add Reference dialog.
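If the same parsing is needed in several places, the snippet above could be wrapped in a small helper. The following is only a sketch: the class name CsvText and the method name ReadRows are hypothetical, and it assumes the Microsoft.VisualBasic assembly is referenced. It wraps the parser in a using block so it is disposed even if parsing throws.

```csharp
using System.Collections.Generic;
using System.IO;
using Microsoft.VisualBasic.FileIO;

// Hypothetical helper (not part of the original answer).
public static class CsvText
{
    // Reads every record from the reader and returns one string[] per line.
    public static List<string[]> ReadRows(TextReader reader)
    {
        var rows = new List<string[]>();

        // TextFieldParser is IDisposable, so "using" closes it for us.
        using (var parser = new TextFieldParser(reader))
        {
            parser.TextFieldType = FieldType.Delimited;
            parser.HasFieldsEnclosedInQuotes = true;
            parser.SetDelimiters(",");

            while (!parser.EndOfData)
            {
                rows.Add(parser.ReadFields());
            }
        }

        return rows;
    }
}
```

It could then be called with, for example, CsvText.ReadRows(new StringReader(csv)) or with a StreamReader opened on a file.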

Answer 1 (score: 17)

This is very late, but it may help someone. We can use a regular expression like the one below.

Regex CSVParser = new Regex(",(?=(?:[^\"]*\"[^\"]*\")*(?![^\"]*\"))");
String[] Fields = CSVParser.Split(Test);

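Answer 2 (score: 5)

You could split on every comma that is followed by an even number of quotes.

You may also want to look at the spec for the CSV format to see how it handles commas.

Useful link: C# Regex Split - commas outside quotes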
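As a rough illustration of the "even number of quotes" idea, here is a sketch that uses the lookahead pattern from the question's update. The names QuotedCsvSplitter, SplitLine and Unquote are made up for this example. Note that the regex split by itself leaves the surrounding quotes on a quoted field, so the sketch removes them afterwards and collapses doubled quotes; it assumes a single line of input with balanced quotes.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper (not part of the original answer).
public static class QuotedCsvSplitter
{
    // Matches a comma only if an even number of double quotes follows it
    // up to the end of the input, i.e. the comma is outside any quoted field.
    private static readonly Regex CommaOutsideQuotes =
        new Regex(",(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)");

    public static string[] SplitLine(string line)
    {
        return CommaOutsideQuotes.Split(line).Select(Unquote).ToArray();
    }

    // Removes one pair of enclosing quotes and turns "" back into ".
    private static string Unquote(string field)
    {
        string trimmed = field.Trim();
        if (trimmed.Length >= 2 && trimmed[0] == '"' && trimmed[trimmed.Length - 1] == '"')
        {
            return trimmed.Substring(1, trimmed.Length - 2).Replace("\"\"", "\"");
        }
        return field; // unquoted fields are returned unchanged
    }
}
```

Because the lookahead rescans the rest of the line for every comma, this is probably best kept to short lines; for large files a dedicated parser is likely the safer choice.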

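Answer 3 (score: 4)

I see that if you paste CSV-delimited text into Excel and do "Text to Columns", it asks you for a "text qualifier". It defaults to a double quote, so text inside double quotes is treated as literal. I imagine Excel implements this by reading one character at a time; when it encounters a "text qualifier", it keeps going until the next "qualifier". You can probably implement this yourself with a loop and a boolean that indicates whether you are inside literal text.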

public string[] CsvParser(string csvText)
{
    List<string> tokens = new List<string>();

    int last = -1;
    int current = 0;
    bool inText = false;

    while(current < csvText.Length)
    {
        switch(csvText[current])
        {
            case '"':
                inText = !inText; break;
            case ',':
                if (!inText) 
                {
                    tokens.Add(csvText.Substring(last + 1, (current - last)).Trim(' ', ',')); 
                    last = current;
                }
                break;
            default:
                break;
        }
        current++;
    }

    if (last != csvText.Length - 1) 
    {
        tokens.Add(csvText.Substring(last+1).Trim());
    }

    return tokens.ToArray();
}

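Answer 4 (score: 3)

Use a library such as LumenWorks for your CSV reading. It handles quoted fields, and since it has been around for a long time, it will probably be more robust overall than a custom solution.

Answer 5 (score: 1)

I had a problem with a CSV that contained fields with a quote character in them, so, using TextFieldParser, I came up with the following: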

private static string[] parseCSVLine(string csvLine)
{
  using (TextFieldParser TFP = new TextFieldParser(new MemoryStream(Encoding.UTF8.GetBytes(csvLine))))
  {
    TFP.HasFieldsEnclosedInQuotes = true;
    TFP.SetDelimiters(",");

    try 
    {           
      return TFP.ReadFields();
    }
    catch (MalformedLineException)
    {
      StringBuilder m_sbLine = new StringBuilder();

      for (int i = 0; i < TFP.ErrorLine.Length; i++)
      {
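        // Double a quote that is not next to a comma; the bounds check avoids reading past the last character
        if (i > 0 && i + 1 < TFP.ErrorLine.Length && TFP.ErrorLine[i] == '"' && TFP.ErrorLine[i + 1] != ',' && TFP.ErrorLine[i - 1] != ',')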
          m_sbLine.Append("\"\"");
        else
          m_sbLine.Append(TFP.ErrorLine[i]);
      }

      return parseCSVLine(m_sbLine.ToString());
    }
  }
}

A StreamReader is still used to read the CSV line by line, as follows:

using(StreamReader SR = new StreamReader(FileName))
{
  while (SR.Peek() >-1)
    myStringArray = parseCSVLine(SR.ReadLine());
}

Answer 6 (score: 1)

Use Cinchoo ETL, an open-source library. It automatically handles column values that contain the delimiter.

string csv = @"2,1016,7/31/2008 14:22,Geoff Dalgas,6/5/2011 22:21,http://stackoverflow.com,""Corvallis, OR"",7679,351,81,b437f461b3fd27387c5d8ab47a293d35,34";

using (var p = ChoCSVReader.LoadText(csv))
{
    Console.WriteLine(p.Dump());
}

Output:

Key: Column1 [Type: String]
Value: 2
Key: Column2 [Type: String]
Value: 1016
Key: Column3 [Type: String]
Value: 7/31/2008 14:22
Key: Column4 [Type: String]
Value: Geoff Dalgas
Key: Column5 [Type: String]
Value: 6/5/2011 22:21
Key: Column6 [Type: String]
Value: http://stackoverflow.com
Key: Column7 [Type: String]
Value: Corvallis, OR
Key: Column8 [Type: String]
Value: 7679
Key: Column9 [Type: String]
Value: 351
Key: Column10 [Type: String]
Value: 81
Key: Column11 [Type: String]
Value: b437f461b3fd27387c5d8ab47a293d35
Key: Column12 [Type: String]
Value: 34

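For more information, see the CodeProject article.

Hope it helps.

Answer 7 (score: 0)

There are many answers to this question and its duplicates. I tried this one that looked promising, but found some bugs in it. I modified it heavily so that it passes all of my tests.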

    /// <summary>
    /// Returns a collection of strings that are derived by splitting the given source string at
    /// characters given by the 'delimiter' parameter.  However, a substring may be enclosed between
    /// pairs of the 'qualifier' character so that instances of the delimiter can be taken as literal
    /// parts of the substring.  The method was originally developed to split comma-separated text
    /// where quotes could be used to qualify text that contains commas that are to be taken as literal
    /// parts of the substring.  For example, the following source:
    ///     A, B, "C, D", E, "F, G"
    /// would be split into 5 substrings:
    ///     A
    ///     B
    ///     C, D
    ///     E
    ///     F, G
    /// When enclosed inside of qualifiers, the literal for the qualifier character may be represented
    /// by two consecutive qualifiers.  The two consecutive qualifiers are distinguished from a closing
    /// qualifier character.  For example, the following source:
    ///     A, "B, ""C"""
    /// would be split into 2 substrings:
    ///     A
    ///     B, "C"
    /// </summary>
    /// <remarks>Originally based on: https://stackoverflow.com/a/43284485/2998072</remarks>
    /// <param name="source">The string that is to be split</param>
    /// <param name="delimiter">The character that separates the substrings</param>
    /// <param name="qualifier">The character that is used (in pairs) to enclose a substring</param>
    /// <param name="toTrim">If true, then whitespace is removed from the beginning and end of each
    /// substring.  If false, then whitespace is preserved at the beginning and end of each substring.
    /// </param>
    public static List<String> SplitQualified(this String source, Char delimiter, Char qualifier,
                                Boolean toTrim)
    {
        // Avoid throwing exception if the source is null
        if (String.IsNullOrEmpty(source))
            return new List<String> { "" };

        var results = new List<String>();
        var result = new StringBuilder();
        Boolean inQualifier = false;

        // The algorithm is designed to expect a delimiter at the end of each substring, but the
        // expectation of the caller is that the final substring is not terminated by delimiter.
        // Therefore, we add an artificial delimiter at the end before looping through the source string.
        String sourceX = source + delimiter;

        // Loop through each character of the source
        for (var idx = 0; idx < sourceX.Length; idx++)
        {
            // If current character is a delimiter
            // (except if we're inside of qualifiers, we ignore the delimiter)
            if (sourceX[idx] == delimiter && inQualifier == false)
            {
                // Terminate the current substring by adding it to the collection
                // (trim if specified by the method parameter)
                results.Add(toTrim ? result.ToString().Trim() : result.ToString());
                result.Clear();
            }
            // If current character is a qualifier
            else if (sourceX[idx] == qualifier)
            {
                // ...and we're already inside of qualifier
                if (inQualifier)
                {
                    // check for double-qualifiers, which is escape code for a single
                    // literal qualifier character.
                    if (idx + 1 < sourceX.Length && sourceX[idx + 1] == qualifier)
                    {
                        idx++;
                        result.Append(sourceX[idx]);
                        continue;
                    }
                    // Since we found only a single qualifier, that means that we've
                    // found the end of the enclosing qualifiers.
                    inQualifier = false;
                    continue;
                }
                else
                    // ...we found an opening qualifier
                    inQualifier = true;
            }
            // If current character is neither qualifier nor delimiter
            else
                result.Append(sourceX[idx]);
        }

        return results;
    }

Here are the test methods that show it works:

    [TestMethod()]
    public void SplitQualified_00()
    {
        // Example with no substrings
        String s = "";
        var substrings = s.SplitQualified(',', '"', true);
        CollectionAssert.AreEquivalent(new List<String> { "" }, substrings);
    }
    [TestMethod()]
    public void SplitQualified_00A()
    {
        // just a single delimiter
        String s = ",";
        var substrings = s.SplitQualified(',', '"', true);
        CollectionAssert.AreEquivalent(new List<String> { "", "" }, substrings);
    }
    [TestMethod()]
    public void SplitQualified_01()
    {
        // Example with no whitespace or qualifiers
        String s = "1,2,3,1,2,3";
        var substrings = s.SplitQualified(',', '"', true);
        CollectionAssert.AreEquivalent(new List<String> { "1", "2", "3", "1", "2", "3" }, substrings);
    }
    [TestMethod()]
    public void SplitQualified_02()
    {
        // Example with whitespace and no qualifiers
        String s = " 1, 2 ,3,  1  ,2\t,   3   ";
        // whitespace should be removed
        var substrings = s.SplitQualified(',', '"', true);
        CollectionAssert.AreEquivalent(new List<String> { "1", "2", "3", "1", "2", "3" }, substrings);
    }
    [TestMethod()]
    public void SplitQualified_03()
    {
        // Example with whitespace and no qualifiers
        String s = " 1, 2 ,3,  1  ,2\t,   3   ";
        // whitespace should be preserved
        var substrings = s.SplitQualified(',', '"', false);
        CollectionAssert.AreEquivalent(
            new List<String> { " 1", " 2 ", "3", "  1  ", "2\t", "   3   " },
            substrings);
    }
    [TestMethod()]
    public void SplitQualified_04()
    {
        // Example with no whitespace and trivial qualifiers.
        String s = "1,\"2\",3,1,2,\"3\"";
        var substrings = s.SplitQualified(',', '"', true);
        CollectionAssert.AreEquivalent(new List<String> { "1", "2", "3", "1", "2", "3" }, substrings);

        s = "\"1\",\"2\",3,1,\"2\",3";
        substrings = s.SplitQualified(',', '"', true);
        CollectionAssert.AreEquivalent(new List<String> { "1", "2", "3", "1", "2", "3" }, substrings);
    }
    [TestMethod()]
    public void SplitQualified_05()
    {
        // Example with no whitespace and qualifiers that enclose delimiters
        String s = "1,\"2,2a\",3,1,2,\"3,3a\"";
        var substrings = s.SplitQualified(',', '"', true);
        CollectionAssert.AreEquivalent(new List<String> { "1", "2,2a", "3", "1", "2", "3,3a" },
                                substrings);

        s = "\"1,1a\",\"2,2b\",3,1,\"2,2c\",3";
        substrings = s.SplitQualified(',', '"', true);
        CollectionAssert.AreEquivalent(new List<String> { "1,1a", "2,2b", "3", "1", "2,2c", "3" },
                                substrings);
    }
    [TestMethod()]
    public void SplitQualified_06()
    {
        // Example with qualifiers enclosing whitespace but no delimiter
        String s = "\" 1 \",\"2 \",3,1,2,\"\t3\t\"";
        // whitespace should be removed
        var substrings = s.SplitQualified(',', '"', true);
        CollectionAssert.AreEquivalent(new List<String> { "1", "2", "3", "1", "2", "3" },
                                substrings);
    }
    [TestMethod()]
    public void SplitQualified_07()
    {
        // Example with qualifiers enclosing whitespace but no delimiter
        String s = "\" 1 \",\"2 \",3,1,2,\"\t3\t\"";
        // whitespace should be preserved
        var substrings = s.SplitQualified(',', '"', false);
        CollectionAssert.AreEquivalent(new List<String> { " 1 ", "2 ", "3", "1", "2", "\t3\t" },
                                substrings);
    }
    [TestMethod()]
    public void SplitQualified_08()
    {
        // Example with qualifiers enclosing whitespace but no delimiter; also whitespace btwn delimiters
        String s = "\" 1 \", \"2 \"  ,  3,1, 2 ,\"  3  \"";
        // whitespace should be removed
        var substrings = s.SplitQualified(',', '"', true);
        CollectionAssert.AreEquivalent(new List<String> { "1", "2", "3", "1", "2", "3" },
                                substrings);
    }
    [TestMethod()]
    public void SplitQualified_09()
    {
        // Example with qualifiers enclosing whitespace but no delimiter; also whitespace btwn delimiters
        String s = "\" 1 \", \"2 \"  ,  3,1, 2 ,\"  3  \"";
        // whitespace should be preserved
        var substrings = s.SplitQualified(',', '"', false);
        CollectionAssert.AreEquivalent(new List<String> { " 1 ", " 2   ", "  3", "1", " 2 ", "  3  " },
                                substrings);
    }
    [TestMethod()]
    public void SplitQualified_10()
    {
        // Example with qualifiers enclosing whitespace and delimiter
        String s = "\" 1 \",\"2 , 2b \",3,1,2,\"  3,3c  \"";
        // whitespace should be removed
        var substrings = s.SplitQualified(',', '"', true);
        CollectionAssert.AreEquivalent(new List<String> { "1", "2 , 2b", "3", "1", "2", "3,3c" },
                                substrings);
    }
    [TestMethod()]
    public void SplitQualified_11()
    {
        // Example with qualifiers enclosing whitespace and delimiter; also whitespace btwn delimiters
        String s = "\" 1 \", \"2 , 2b \"  ,  3,1, 2 ,\"  3,3c  \"";
        // whitespace should be preserved
        var substrings = s.SplitQualified(',', '"', false);
        CollectionAssert.AreEquivalent(new List<String> { " 1 ", " 2 , 2b   ", "  3", "1", " 2 ", "  3,3c  " },
                                substrings);
    }
    [TestMethod()]
    public void SplitQualified_12()
    {
        // Example with tab characters between delimiters
        String s = "\t1,\t2\t,3,1,\t2\t,\t3\t";
        // whitespace should be removed
        var substrings = s.SplitQualified(',', '"', true);
        CollectionAssert.AreEquivalent(new List<String> { "1", "2", "3", "1", "2", "3" }, substrings);
    }
    [TestMethod()]
    public void SplitQualified_13()
    {
        // Example with newline characters between delimiters
        String s = "\n1,\n2\n,3,1,\n2\n,\n3\n";
        // whitespace should be removed
        var substrings = s.SplitQualified(',', '"', true);
        CollectionAssert.AreEquivalent(new List<String> { "1", "2", "3", "1", "2", "3" }, substrings);
    }
    [TestMethod()]
    public void SplitQualified_14()
    {
        // Example with qualifiers enclosing whitespace and delimiter, plus escaped qualifier
        String s = "\" 1 \",\"\"\"2 , 2b \"\"\",3,1,2,\"  \"\"3,3c  \"";
        // whitespace should be removed
        var substrings = s.SplitQualified(',', '"', true);
        CollectionAssert.AreEquivalent(new List<String> { "1", "\"2 , 2b \"", "3", "1", "2", "\"3,3c" },
                                substrings);
    }
    [TestMethod()]
    public void SplitQualified_14A()
    {
        // Example with a single field whose content is wrapped in escaped qualifiers
        String s = "\"\"\"1\"\"\"";
        // whitespace should be removed
        var substrings = s.SplitQualified(',', '"', true);
        CollectionAssert.AreEquivalent(new List<String> { "\"1\"" },
                                substrings);
    }


    [TestMethod()]
    public void SplitQualified_15()
    {
        // Instead of comma-delimited and quote-qualified, use pipe and hash

        // Example with no whitespace or qualifiers
        String s = "1|2|3|1|2,2f|3";
        var substrings = s.SplitQualified('|', '#', true);
        CollectionAssert.AreEquivalent(new List<String> { "1", "2", "3", "1", "2,2f", "3" }, substrings);
    }
    [TestMethod()]
    public void SplitQualified_16()
    {
        // Instead of comma-delimited and quote-qualified, use pipe and hash

        // Example with qualifiers enclosing whitespace and delimiter
        String s = "# 1 #|#2 | 2b #|3|1|2|#  3|3c  #";
        // whitespace should be removed
        var substrings = s.SplitQualified('|', '#', true);
        CollectionAssert.AreEquivalent(new List<String> { "1", "2 | 2b", "3", "1", "2", "3|3c" },
                                substrings);
    }
    [TestMethod()]
    public void SplitQualified_17()
    {
        // Instead of comma-delimited and quote-qualified, use pipe and hash

        // Example with qualifiers enclosing whitespace and delimiter; also whitespace btwn delimiters
        String s = "# 1 #| #2 | 2b #  |  3|1| 2 |#  3|3c  #";
        // whitespace should be preserved
        var substrings = s.SplitQualified('|', '#', false);
        CollectionAssert.AreEquivalent(new List<String> { " 1 ", " 2 | 2b   ", "  3", "1", " 2 ", "  3|3c  " },
                                substrings);
    }