CSV validation in C# - making sure each row has the same number of commas

Time: 2011-03-10 10:13:45

Tags: c# asp.net csv

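I want to implement a fairly simple CSV checker in my C#/ASP.NET application. My project automatically generates a CSV for the user from a GridView, but I would like to run quickly through each row, check that they all have the same number of commas, and throw an exception if there is any discrepancy. So far I have the following, which does work, but has a problem that I describe below: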

            int? CommaCount = null;

            StringBuilder sb = new StringBuilder();
            StringWriter sw = new StringWriter(sb);
            String Str = null;

            //This loops through all the headerrow cells and writes them to the stringbuilder
            for (int k = 0; k <= (grd.Columns.Count - 1); k++)
            {
                sw.Write(grd.HeaderRow.Cells[k].Text + ",");    
            }

            sw.WriteLine(",");


            //This loops through all the main rows and writes them to the stringbuilder
            for (int i = 0; i <= grd.Rows.Count - 1; i++)
            {
                StringBuilder RowString = new StringBuilder();
                for (int j = 0; j <= grd.Columns.Count - 1; j++)
                {
                    //We'll need to strip meaningless junk such as <br /> and &nbsp;
                    Str = grd.Rows[i].Cells[j].Text.ToString().Replace("<br />", "");
                    if (Str == "&nbsp;")
                    {
                        Str = "";
                    }

                    Str = "\"" + Str + "\"" + ",";

                    RowString.Append(Str);
                    sw.Write(Str);
                }
                sw.WriteLine();

                //The below code block ensures that each row contains the same number of commas, which is crucial
                int RowCommaCount = CheckChar(RowString.ToString(), ',');
                if (CommaCount == null)
                {
                    CommaCount = RowCommaCount;
                }
                else
                {
                    if (CommaCount!= RowCommaCount)
                    {
                        throw new Exception("CSV generated is corrupt - line " + i + " has " + RowCommaCount + " commas when it should have " + CommaCount);
                    }
                }
            }

            sw.Close();

My CheckChar method:

protected static int CheckChar(string Input, char CharToCheck)
{
    int Counter = 0;
    foreach (char StringChar in Input)
    {
        if (StringChar == CharToCheck)
        {
            Counter++;
        }
    }
    return Counter;
}

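My problem is that if a cell in the grid contains a comma, my CheckChar method still counts it as a delimiter, so the check reports an error. As you can see in the code, I wrap every value in " characters to "escape" it. How easy would it be to make my method ignore commas inside values? I assume I will need to rewrite quite a lot of it.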

4 answers:

Answer 0: (score: 0)

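You can use a regular expression that matches a single item and count the number of matches in each line. Here is an example of such a regex (separator is the delimiter character, e.g. a comma):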

var itemsRegex =
    new Regex(@"(?<=(^|[\" + separator + @"]))((?<item>[^""\" + separator +
        @"\n]*)|(?<item>""([^""]|"""")*""))(?=($|[\" + separator + @"]))");

Answer 1: (score: 0)

Just do the following (assuming you do not expect " characters inside your fields; otherwise those need some extra handling):

protected static int CheckChar(string Input, char CharToCheck, char fieldDelimiter)
{
    int Counter = 0;
    bool inValue = false;
    foreach (char StringChar in Input)
    {
        if (StringChar == fieldDelimiter)
            inValue = !inValue;
        else if (!inValue && StringChar == CharToCheck)
            Counter++;
    }
    return Counter;
}

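This causes inValue to be true while the loop is inside a field. For example, pass '"' as fieldDelimiter to ignore everything between "...". Note that this does not handle escaped " characters (e.g. "" or \"). You would have to add that handling yourself.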
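As an illustration of the extra handling mentioned above, here is a minimal sketch. It assumes the CSV escapes a quote inside a quoted field by doubling it (""), as in RFC 4180; backslash escapes are not handled. The names CsvCheck and CountSeparators are hypothetical and do not appear in the question.

```csharp
using System;

public static class CsvCheck
{
    // Counts separators that lie outside quoted fields.
    // Assumption: a literal quote inside a quoted field is written as two quotes ("").
    public static int CountSeparators(string line, char separator = ',', char quote = '"')
    {
        int count = 0;
        bool inQuotes = false;

        for (int i = 0; i < line.Length; i++)
        {
            char c = line[i];
            if (inQuotes)
            {
                if (c == quote)
                {
                    // A doubled quote is an escaped literal quote: skip its second half.
                    if (i + 1 < line.Length && line[i + 1] == quote)
                        i++;
                    else
                        inQuotes = false; // closing quote of the field
                }
            }
            else if (c == quote)
            {
                inQuotes = true; // opening quote of a field
            }
            else if (c == separator)
            {
                count++;
            }
        }

        // A line that ends inside a quoted field is malformed.
        if (inQuotes)
            throw new FormatException("Unterminated quoted field.");

        return count;
    }
}
```

In the question's loop, CheckChar(RowString.ToString(), ',') could then be swapped for CsvCheck.CountSeparators(RowString.ToString()), provided the values are written with doubled quotes.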

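Answer 2: (score: 0)

You should check the fields (the ingredients) before you join (mix) them, rather than checking the resulting string (the cake). That gives you the chance to do something constructive (escape or replace), and to throw an exception only as a last resort.

In general, "," is legal inside a .csv field as long as the string field is quoted. So an embedded "," should not be a problem, but an embedded quote may well be.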
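A minimal sketch of checking the ingredients rather than the cake: each field is escaped before joining, and the row is validated by counting fields instead of commas. It assumes quotes are escaped by doubling them; CsvFields, EscapeField and BuildRow are hypothetical names used only for illustration.

```csharp
using System;
using System.Collections.Generic;

public static class CsvFields
{
    // Wraps one field in quotes and doubles any embedded quote characters.
    public static string EscapeField(string value)
    {
        if (value == null) value = "";
        return "\"" + value.Replace("\"", "\"\"") + "\"";
    }

    // Builds one CSV line from already-separated fields.
    // Throws if the row does not have the expected number of fields.
    public static string BuildRow(IList<string> fields, int expectedCount)
    {
        if (fields.Count != expectedCount)
            throw new InvalidOperationException(
                "Row has " + fields.Count + " fields when it should have " + expectedCount);

        var escaped = new string[fields.Count];
        for (int i = 0; i < fields.Count; i++)
            escaped[i] = EscapeField(fields[i]);

        return string.Join(",", escaped);
    }
}
```

Unlike the loop in the question, this sketch does not write a trailing comma after the last field. Because the field count is checked before joining, commas inside values never enter the check at all.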

Answer 3: (score: 0)

var rx = new Regex("^  (  ( \"[^\"]*\" )  |  (  (?!$)[^\",]  )+  |  (?<1>,)  )*  $", RegexOptions.ExplicitCapture | RegexOptions.IgnorePatternWhitespace | RegexOptions.Multiline);
var matches = rx.Matches("Hello,World,How,Are\nYou,Today,This,Is,\"A beautiful, world\",Hi!");

for (int i = 1; i < matches.Count; i++) {
    if (matches[i].Groups[1].Captures.Count != matches[i - 1].Groups[1].Captures.Count) {
        throw new Exception();
    }
}