I am parsing the following CSV file:
filename(hhmmss),set,code,timeofday
130052,NULL,ES,"day,dawn"
130053,"1,2",ES,"day,dawn"
130062,NULL,ES,"day,dawn"
130063,"1,2",ES,"day,dawn"
130067,"1,2",ES,"day,dawn"
I parse each row like this:
DataRow oDataRow = dTable.NewRow();
for (int i = 0; i < columnNames.Length; i++)
{
    oDataRow[columnNames[i]] = oStreamDataValues[i] == null ? string.Empty : oStreamDataValues[i];
}
dTable.Rows.Add(oDataRow);
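Q1: I noticed that in columns of this kind the quoted value gets split at the comma: oStreamDataValues[3] is "\"day" and oStreamDataValues[4] ends with "\"". However, I have not been able to find a good way to handle this.
Q2: I am also interested in generating statistics from this data. How can I create rows with unique values grouped by the hhmm part of the filename, i.e. 13005?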
Answer 0 (score: 2)
Why reinvent the wheel? Use an available CSV parser, such as this one:
http://www.codeproject.com/Articles/9258/A-Fast-CSV-Reader
Such parsers also support quoting characters (among other things). The one above can also load a DataTable directly. Here is a working sample:
// Needs: using System.Data; using System.IO; using System.Text; using LumenWorks.Framework.IO.Csv;
DataTable tblCSV = new DataTable("CSV");
var fileInfo = new FileInfo(fullPath);
var encoding = Encoding.Default;
int headerIndex = 0;
using (var reader = new System.IO.StreamReader(fileInfo.FullName, encoding))
{
    for (int i = 0; i < headerIndex; i++)
        reader.ReadLine(); // skip all lines but header+data
    Char quotingCharacter = '"';
    Char escapeCharacter = quotingCharacter;
    Char commentCharacter = '\0'; // none
    Char delimiter = ',';
    using (var csv = new CsvReader(reader, true, delimiter, quotingCharacter, escapeCharacter, commentCharacter, ValueTrimmingOptions.All))
    {
        csv.MissingFieldAction = MissingFieldAction.ParseError;
        csv.DefaultParseErrorAction = ParseErrorAction.RaiseEvent;
        csv.ParseError += csv_ParseError; // the method that handles this error
        csv.SkipEmptyLines = true;
        try
        {
            // load into DataTable
            tblCSV.Load(csv, LoadOption.OverwriteChanges, csvTable_FillError); // csvTable_FillError -> the method that handles this error
        }
        catch (Exception ex)
        {
            // logging
            throw;
        }
    }
}
void csv_ParseError(object sender, ParseErrorEventArgs e)
{
    // if the error is that a field is missing, then skip to next line
    if (e.Error is MissingFieldCsvException)
    {
        //Log.Write(e.Error, "--MISSING FIELD ERROR OCCURRED!" + Environment.NewLine);
        e.Action = ParseErrorAction.AdvanceToNextLine;
    }
    else if (e.Error is MalformedCsvException)
    {
        //Log.Write(e.Error, "--MALFORMED CSV ERROR OCCURRED!" + Environment.NewLine);
        e.Action = ParseErrorAction.AdvanceToNextLine;
    }
    else
    {
        //Log.Write(e.Error, "--UNKNOWN PARSE ERROR OCCURRED!" + Environment.NewLine);
        e.Action = ParseErrorAction.AdvanceToNextLine;
    }
    // log
}
void csvTable_FillError(object sender, FillErrorEventArgs e)
{
    // You can use the e.Errors value to determine exactly what went wrong.
    if (e.Errors is FormatException)
    {
        // log
    }
    // Setting e.Continue to true tells the Load
    // method to continue trying. Setting it to false
    // indicates that an error has occurred, and the
    // Load method raises the exception that got you here.
    e.Continue = true;
    // e.Errors is a single Exception, so format it directly.
    string errors = e.Errors != null ? e.Errors.ToString() : string.Empty;
    // log
}
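Edit: regarding your second question:
Q2: I am also interested in generating statistics from this data. How can I create rows with unique values grouped by the hhmm part of the filename, i.e. 13005?
You can use LINQ-To-DataSet to query the DataTable, for example: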
var fileNameGroups = tblCSV.AsEnumerable()
    .GroupBy(r => r.Field<string>("filename(hhmmss)"));
Now you have one group per unique file name, and each group contains all of its rows:
foreach (var fnGroup in fileNameGroups)
{
    Console.WriteLine("Next File-name: {0}", fnGroup.Key);
    foreach (DataRow row in fnGroup)
        Console.WriteLine("Fields: {0}", string.Join(",", row.ItemArray));
}
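The grouping above uses the full file name as the key, whereas the question asks for grouping on a prefix of it. The following is a minimal sketch of that variation, assuming the column was loaded as a string. The names HhmmGrouping, CountByPrefix and BuildSample are hypothetical helpers, not part of LINQ-To-DataSet. A prefix length of 4 corresponds to hhmm; a length of 5 reproduces the 13005 example from the question.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;

static class HhmmGrouping
{
    // Hypothetical helper: counts rows per leading prefix of the
    // "filename(hhmmss)" column. Values shorter than the prefix
    // length are used as-is.
    public static Dictionary<string, int> CountByPrefix(DataTable table, int prefixLength)
    {
        return table.AsEnumerable()
            .Select(r => r.Field<string>("filename(hhmmss)") ?? string.Empty)
            .GroupBy(name => name.Length >= prefixLength
                ? name.Substring(0, prefixLength)
                : name)
            .ToDictionary(g => g.Key, g => g.Count());
    }

    // Builds a small table with the rows from the question, for illustration only.
    public static DataTable BuildSample()
    {
        var table = new DataTable("CSV");
        table.Columns.Add("filename(hhmmss)", typeof(string));
        table.Columns.Add("set", typeof(string));
        table.Columns.Add("code", typeof(string));
        table.Columns.Add("timeofday", typeof(string));
        table.Rows.Add("130052", "NULL", "ES", "day,dawn");
        table.Rows.Add("130053", "1,2", "ES", "day,dawn");
        table.Rows.Add("130062", "NULL", "ES", "day,dawn");
        table.Rows.Add("130063", "1,2", "ES", "day,dawn");
        table.Rows.Add("130067", "1,2", "ES", "day,dawn");
        return table;
    }
}
```

Calling HhmmGrouping.CountByPrefix(tblCSV, 4) would give one entry per hour-and-minute; the same GroupBy key selector can be dropped into the fileNameGroups query if the grouped rows are needed rather than counts.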
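Answer 1 (score: 0)
@Q1: A good way to solve this is regular expressions. If you have not used them before, they take some reading, but you can apply them on many different occasions. In my experience it is well worth the effort. For .NET, you can start at http://msdn.microsoft.com/en-us/library/hs600312(v=vs.110).aspx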
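As an illustration of the regular-expression route, here is a minimal sketch. The pattern is one common choice, not something given in this answer: it splits on a comma only when an even number of double quotes follows it, which means the comma lies outside a quoted field. RegexCsvSplit and SplitLine are hypothetical names. The sketch assumes single-line records and does not unescape doubled quotes inside a field.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

static class RegexCsvSplit
{
    // A comma followed by an even number of double quotes up to the
    // end of the line is a field separator; a comma followed by an
    // odd number is inside a quoted field and is left alone.
    private static readonly Regex Separator =
        new Regex(",(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)");

    public static string[] SplitLine(string line)
    {
        return Separator.Split(line)
            .Select(field => field.Trim().Trim('"')) // drop surrounding quotes
            .ToArray();
    }
}
```

For the row 130053,"1,2",ES,"day,dawn" this yields four fields, with 1,2 and day,dawn each kept together.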
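An alternative, quick-and-dirty approach, which has its drawbacks, is to split on commas (","), iterate over the items, and check whether an item starts with a quotation mark.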
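The answer's own code for this approach is not preserved here, so the following is only a sketch of one way it could look. NaiveCsvSplit and SplitLine are hypothetical names. Pieces are re-joined from an item that opens a quote up to the item that closes it; escaped quotes and multi-line fields are not handled, which is one of the drawbacks mentioned.

```csharp
using System;
using System.Collections.Generic;

static class NaiveCsvSplit
{
    public static List<string> SplitLine(string line)
    {
        var fields = new List<string>();
        string pending = null; // pieces of a quoted field collected so far

        foreach (string part in line.Split(','))
        {
            bool startsQuoted = part.Length > 0 && part[0] == '"';
            bool endsQuoted = part.Length > 0 && part[part.Length - 1] == '"';

            if (pending == null)
            {
                if (startsQuoted && !(part.Length > 1 && endsQuoted))
                    pending = part.Substring(1);  // quote opened, not yet closed
                else
                    fields.Add(part.Trim('"'));   // plain or fully quoted item
            }
            else if (endsQuoted)
            {
                // quote closed: restore the comma that Split removed
                fields.Add(pending + "," + part.Substring(0, part.Length - 1));
                pending = null;
            }
            else
            {
                pending += "," + part;            // still inside the quotes
            }
        }

        if (pending != null)
            fields.Add(pending); // unterminated quote: keep what was read
        return fields;
    }
}
```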
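@Q2: I would suggest creating a data structure that holds the fields of a data row as properties of matching types (for example, TimeSpan for field 1, or DateTime if you can supply an appropriate value for the date part). You can then read the file contents into a list and query or aggregate it with LINQ.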
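A minimal sketch of such a structure follows. The type and member names (CaptureRecord, FromFields, CaptureStats, CountPerMinute) are hypothetical, and it assumes the four field values have already been split and unquoted, that the first field always has six digits, and that the literal text NULL marks an empty set.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical record type; properties mirror the CSV header.
class CaptureRecord
{
    public TimeSpan Time { get; set; }       // from filename(hhmmss)
    public int[] Set { get; set; }           // empty when the field is NULL
    public string Code { get; set; }
    public string[] TimeOfDay { get; set; }

    // Builds a record from four already-unquoted field values.
    public static CaptureRecord FromFields(IList<string> f)
    {
        return new CaptureRecord
        {
            Time = new TimeSpan(
                int.Parse(f[0].Substring(0, 2)),
                int.Parse(f[0].Substring(2, 2)),
                int.Parse(f[0].Substring(4, 2))),
            Set = f[1] == "NULL"
                ? new int[0]
                : f[1].Split(',').Select(s => int.Parse(s)).ToArray(),
            Code = f[2],
            TimeOfDay = f[3].Split(',')
        };
    }
}

static class CaptureStats
{
    // Counts records per hour-and-minute bucket (seconds dropped).
    public static Dictionary<TimeSpan, int> CountPerMinute(IEnumerable<CaptureRecord> records)
    {
        return records
            .GroupBy(r => new TimeSpan(r.Time.Hours, r.Time.Minutes, 0))
            .ToDictionary(g => g.Key, g => g.Count());
    }
}
```

With typed properties, other statistics (for example counts per Code or per TimeOfDay value) become ordinary LINQ queries over the list.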