Question

I am parsing the following CSV file:

filename(hhmmss),set,code,timeofday
130052,NULL,ES,"day,dawn"
130053,"1,2",ES,"day,dawn"
130062,NULL,ES,"day,dawn"
130063,"1,2",ES,"day,dawn"
130067,"1,2",ES,"day,dawn"

I am parsing the rows like this:

DataRow oDataRow = dTable.NewRow();
for (int i = 0; i < columnNames.Length; i++)
{
    oDataRow[columnNames[i]] = oStreamDataValues[i] == null ? string.Empty : oStreamDataValues[i];
}
dTable.Rows.Add(oDataRow);

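Q1: I noticed that for columns of this kind, the quoted value gets split: oStreamDataValues[3] contains "\"day" and oStreamDataValues[4] has a "\"" at its end. However, I cannot find a good way to handle this.

Q2: Also, I am interested in generating statistics from this data. How can I create rows with unique values grouped by the hhmm part of the filename, i.e. 13005?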

Answer 1

Why reinvent the wheel? Use an available CSV parser, such as this one:

http://www.codeproject.com/Articles/9258/A-Fast-CSV-Reader

Parsers like this support quoting characters (among other things). The one above can also load a DataTable directly. Here is a working sample:

DataTable tblCSV = new DataTable("CSV");
var fileInfo = new FileInfo(fullPath);
var encoding = Encoding.Default;
int headerIndex = 0;
using (var reader = new System.IO.StreamReader(fileInfo.FullName, encoding))
{
    for (int i = 0; i < headerIndex; i++)
        reader.ReadLine(); // skip all lines but header+data
    Char quotingCharacter = '"';
    Char escapeCharacter = quotingCharacter;
    Char commentCharacter = '\0'; // none
    Char delimiter = ',';
    using (var csv = new CsvReader(reader, true, delimiter, quotingCharacter, escapeCharacter, commentCharacter, ValueTrimmingOptions.All))
    {
        csv.MissingFieldAction = MissingFieldAction.ParseError;
        csv.DefaultParseErrorAction = ParseErrorAction.RaiseEvent;
        csv.ParseError += csv_ParseError;  // the method that handles this error
        csv.SkipEmptyLines = true;
        try
        {
            // load into DataTable
            tblCSV.Load(csv, LoadOption.OverwriteChanges, csvTable_FillError); // csvTable_FillError-> the method that handles this error
        } catch (Exception ex)
        {
            // logging 
            throw;
        }
    }
}

void csv_ParseError(object sender, ParseErrorEventArgs e)
{
    // if the error is that a field is missing, then skip to next line
    if (e.Error is MissingFieldCsvException)
    {
        //Log.Write(e.Error, "--MISSING FIELD ERROR OCCURRED!" + Environment.NewLine);
        e.Action = ParseErrorAction.AdvanceToNextLine;
    }
    else if (e.Error is MalformedCsvException)
    {
        //Log.Write(e.Error, "--MALFORMED CSV ERROR OCCURRED!" + Environment.NewLine);
        e.Action = ParseErrorAction.AdvanceToNextLine;
    }
    else
    {
        //Log.Write(e.Error, "--UNKNOWN PARSE ERROR OCCURRED!" + Environment.NewLine);
        e.Action = ParseErrorAction.AdvanceToNextLine;
    }
    // log
}

void csvTable_FillError(object sender, FillErrorEventArgs e)
{
    // You can use the e.Errors value to determine exactly what went wrong.
    if (e.Errors.GetType() == typeof(System.FormatException))
    {
        // log
    }

    // Setting e.Continue to True tells the Load
    // method to continue trying. Setting it to False
    // indicates that an error has occurred, and the 
    // Load method raises the exception that got you here.
    e.Continue = true;

    string errors = string.Join(Environment.NewLine, e.Errors);
    // log
}

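Edit: regarding your second question:

Q2: Also, I am interested in generating statistics from this data. How can I create rows with unique values grouped by the hhmm part of the filename, i.e. 13005?

You can query the DataTable with LINQ-to-DataSet, for example: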

var fileNameGroups = tblCSV.AsEnumerable()
    .GroupBy(r => r.Field<string>("filename(hhmmss)"));

Now there is one group per unique file name, and each group contains all of its rows:

foreach(var fnGroup in fileNameGroups)
{
    Console.WriteLine("Next File-name: {0}", fnGroup.Key);
    foreach(DataRow row in fnGroup)
        Console.WriteLine("Fields: {0}", string.Join(",", row.ItemArray));
}
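
The query above groups by the full file name. To group by a leading part of it instead, the key can be shortened with Substring. The following is only a sketch: CountByPrefix and its prefixLength parameter are made-up names, the table is filled by hand with the sample rows instead of being loaded from the parser, and it assumes every file name is at least prefixLength characters long. It uses top-level statements (C# 9 or later) to stay short; on .NET Framework, AsEnumerable and Field need a reference to System.Data.DataSetExtensions.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;

// Hypothetical helper: number of rows per file-name prefix.
static Dictionary<string, int> CountByPrefix(DataTable source, int prefixLength)
{
    return source.AsEnumerable()
        .GroupBy(r => r.Field<string>("filename(hhmmss)").Substring(0, prefixLength))
        .ToDictionary(g => g.Key, g => g.Count());
}

// Stand-in for the DataTable loaded from the CSV file (all columns as strings).
var table = new DataTable("CSV");
foreach (var columnName in new[] { "filename(hhmmss)", "set", "code", "timeofday" })
    table.Columns.Add(columnName, typeof(string));
table.Rows.Add("130052", "NULL", "ES", "day,dawn");
table.Rows.Add("130053", "1,2", "ES", "day,dawn");
table.Rows.Add("130062", "NULL", "ES", "day,dawn");
table.Rows.Add("130063", "1,2", "ES", "day,dawn");
table.Rows.Add("130067", "1,2", "ES", "day,dawn");

var byHourMinute = CountByPrefix(table, 4); // hhmm
var byFiveChars = CountByPrefix(table, 5);  // matches the "13005" example key

foreach (var pair in byFiveChars)
    Console.WriteLine("{0}: {1} row(s)", pair.Key, pair.Value);
```

A prefix length of 4 corresponds to hhmm, while the example key 13005 in the question has five characters, so the length is left as a parameter here.
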

Answer 2

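@Q1: A good way to solve this is regular expressions. If you have not used them before, they take some reading, but you can apply them on many occasions. In my experience it is well worth the effort. For .NET, you can start at http://msdn.microsoft.com/en-us/library/hs600312(v=vs.110).aspx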
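
As an illustration of the regular-expression route, here is a minimal sketch using one commonly seen pattern: split only on commas that are followed by an even number of quotes, that is, commas outside a quoted field. This is not necessarily the expression the answer had in mind, SplitCsvLine is a made-up name, and the pattern assumes balanced quotes with no escaped quotes ("") and no line breaks inside fields. It uses top-level statements (C# 9 or later).

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: split one CSV line on commas outside quoted fields.
static string[] SplitCsvLine(string line)
{
    // The lookahead succeeds only if the rest of the line holds an even number of quotes.
    const string pattern = @",(?=(?:[^""]*""[^""]*"")*[^""]*$)";
    return Regex.Split(line, pattern)
                .Select(field => field.Trim('"')) // drop the surrounding quotes
                .ToArray();
}

var parts = SplitCsvLine("130053,\"1,2\",ES,\"day,dawn\"");
Console.WriteLine(string.Join(" | ", parts));
```
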
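An alternative, quick-and-dirty approach, which has its drawbacks, is to split on commas (","), iterate over the items and check whether each item starts with a quote:

If it does not start with a quote, it is a single field.
If it starts with a quote, add this item and all subsequent items to the field until one ends with a quote. Remove the quotes from the field.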
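
The two rules above can be sketched roughly as follows. SplitAndMerge is a made-up name, and the sketch shares the drawbacks mentioned: it assumes no escaped quotes ("") and no line breaks inside a field. Note that the comma removed by Split has to be put back when items are merged. It uses top-level statements (C# 9 or later).

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: split on every comma, then merge the pieces of quoted fields.
static List<string> SplitAndMerge(string line)
{
    var result = new List<string>();
    string pending = null; // non-null while we are inside a quoted field

    foreach (string part in line.Split(','))
    {
        if (pending != null)
        {
            pending += "," + part; // restore the comma removed by Split
            if (part.EndsWith("\""))
            {
                result.Add(pending.Trim('"'));
                pending = null;
            }
        }
        else if (part.StartsWith("\"") && !(part.Length > 1 && part.EndsWith("\"")))
        {
            pending = part; // opening quote without a closing quote in the same item
        }
        else
        {
            result.Add(part.Trim('"')); // plain field, or quoted field without a comma
        }
    }

    if (pending != null)
        result.Add(pending.Trim('"')); // unterminated quote: keep what was collected

    return result;
}

var fields = SplitAndMerge("130053,\"1,2\",ES,\"day,dawn\"");
Console.WriteLine(string.Join(" | ", fields));
```
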

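@Q2: I suggest creating a data structure that holds the fields of a data row as properties of matching types (e.g. TimeSpan for field 1, or DateTime if you can supply a suitable value for the date part). After that, you can read the file contents into a list and query or aggregate it with LINQ.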
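
A minimal sketch of that idea, under stated assumptions: CsvRow, FromFields and the property names are invented for illustration, the literal NULL is treated as an empty set, the set column is assumed to hold integers, and the field values are hard-coded from the sample file instead of coming from a parser. It uses top-level statements (C# 9 or later), so the type declaration comes last.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Field values from the sample file; in practice they would come from the CSV parser.
var rows = new List<CsvRow>
{
    CsvRow.FromFields("130052", "NULL", "ES", "day,dawn"),
    CsvRow.FromFields("130053", "1,2", "ES", "day,dawn"),
    CsvRow.FromFields("130062", "NULL", "ES", "day,dawn"),
    CsvRow.FromFields("130063", "1,2", "ES", "day,dawn"),
    CsvRow.FromFields("130067", "1,2", "ES", "day,dawn"),
};

// Example aggregates: rows per hour and minute, and rows that have a non-empty set.
var perMinute = rows
    .GroupBy(r => new TimeSpan(r.Time.Hours, r.Time.Minutes, 0))
    .ToDictionary(g => g.Key, g => g.Count());
int rowsWithSet = rows.Count(r => r.Set.Length > 0);

foreach (var pair in perMinute)
    Console.WriteLine("{0}: {1} row(s)", pair.Key.ToString(@"hh\:mm"), pair.Value);
Console.WriteLine("Rows with a set: {0}", rowsWithSet);

// Hypothetical row type with one typed property per CSV column.
sealed class CsvRow
{
    public TimeSpan Time { get; private set; }
    public int[] Set { get; private set; }
    public string Code { get; private set; }
    public string[] TimeOfDay { get; private set; }

    public static CsvRow FromFields(string hhmmss, string set, string code, string timeOfDay)
    {
        return new CsvRow
        {
            // hhmmss is assumed to be exactly six digits.
            Time = new TimeSpan(
                int.Parse(hhmmss.Substring(0, 2)),
                int.Parse(hhmmss.Substring(2, 2)),
                int.Parse(hhmmss.Substring(4, 2))),
            Set = set == "NULL" ? new int[0] : set.Split(',').Select(s => int.Parse(s)).ToArray(),
            Code = code,
            TimeOfDay = timeOfDay.Split(','),
        };
    }
}
```
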

Parsing a CSV with two values in one column
