C# CSV parsing with escaped double quotes

Time: 2015-04-27 23:41:31

Tags: c# .net csv

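I am trying to parse a CSV file that contains double quotes and commas within many of its fields. I have no control over the CSV format, and instead of using "" to escape quotes it uses \". The file is also very large, so reading it in and running a regular expression over it is not a good option for me.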

I would prefer to use an existing library rather than write an entirely new parser. Currently I am using CsvHelper.

This is an example of the CSV data:

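"ID", "Name", "Notes"
"40","Continue","If the message \"Continue\" does not appear restart, and notify your instructor."
"41","Restart","If the message \"Restart\" does not appear after 10 seconds, manually restart."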

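The problem is that the double quotes are not being unescaped properly: they are read as field delimiters, which splits the Notes field into 2 separate fields.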

This is my current code, which does not work.

DataTable csvData = new DataTable();
string csvFilePath = @"C:\Users\" + csvFileName + ".csv";

try
{
    FileInfo file = new FileInfo(csvFilePath);
    using (TextReader reader = file.OpenText())
    using (CsvReader csv = new CsvReader(reader))
    {
        csv.Configuration.Delimiter = ",";
        csv.Configuration.HasHeaderRecord = true;
        csv.Configuration.IgnoreQuotes = false; 
        csv.Configuration.TrimFields = true; 
        csv.Configuration.WillThrowOnMissingField = false;
        string[] colFields = null;
        while(csv.Read())
        {
            if (colFields == null)
            {
                colFields = csv.FieldHeaders;
                foreach (string column in colFields)
                {
                    DataColumn datacolumn = new DataColumn(column);
                    datacolumn.AllowDBNull = true;
                    csvData.Columns.Add(datacolumn);
                }
            }
            string[] fieldData = csv.CurrentRecord;

            for (int i = 0; i < fieldData.Length; i++)
            {
                if (fieldData[i] == "")
                {
                    fieldData[i] = null;
                }
            }
            csvData.Rows.Add(fieldData); 
        }
    }
}

Is there an existing library that lets you specify how quotes are escaped, or should I just write my own parser?

1 Answer:

Answer 0: (score: 2)

You can get very far with a very simple LINQ statement that uses Split, then Trim, and finally Replace to unescape the quotes in the content:

DataTable csvData = new DataTable();
string csvFilePath = @"C:\Users\" + csvFileName + ".csv";
try
{
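    // Field boundaries: a quote followed by a comma, or a comma followed by a quote.
    string[] seps = { "\",", ",\"" };
    // Characters stripped from both ends of each piece after splitting.
    char[] quotes = { '\"', ' ' };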
    string[] colFields = null;
    foreach (var line in File.ReadLines(csvFilePath))
    {
        var fields = line
            .Split(seps, StringSplitOptions.None)
            .Select(s => s.Trim(quotes).Replace("\\\"", "\""))
            .ToArray();

        if (colFields == null)
        {
            colFields = fields;
            foreach (string column in colFields)
            {
                DataColumn datacolumn = new DataColumn(column);
                datacolumn.AllowDBNull = true;
                csvData.Columns.Add(datacolumn);
            }
        }
        else
        {
            for (int i = 0; i < fields.Length; i++)
            {
                if (fields[i] == "")
                {
                    fields[i] = null;
                }
            }
            csvData.Rows.Add(fields); 
        }
    }
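}
catch (IOException ex)
{
    // Placeholder handling; the snippet as posted has no catch block, so a try alone would not compile.
    Console.Error.WriteLine("Could not read " + csvFilePath + ": " + ex.Message);
}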

Used in a very simple console application, with the OP's original input in a "test.txt" file:

public static void CsvUnescapeSplit()
{
    string[] seps = { "\",", ",\"" };
    char[] quotes = { '\"', ' ' };
    foreach (var line in File.ReadLines(@"c:\temp\test.txt"))
    {
        var fields = line
            .Split(seps, StringSplitOptions.None)
            .Select(s => s.Trim(quotes).Replace("\\\"", "\""))
            .ToArray();
        foreach (var field in fields)
            Console.Write("{0} | ", field);
        Console.WriteLine();
    }
}

This produces the following (correct) output:

id | name | notes |
40 | Continue | If the message "Continue" does not appear restart, and notify your instructor. |
41 | Help | If the message "Restart" does not appear after 10 seconds, manually restart. |
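To see why this works, it may help to look at the intermediate pieces. The following is a minimal sketch, not part of the original answer: the class name SplitTrace and its two helper methods are made up for illustration, and the sample line is a shortened version of the data above. It prints what Split returns before Trim and Replace are applied, and then the finished fields.

```csharp
using System;
using System.Linq;

public static class SplitTrace
{
    // Same separators and trim characters as in the answer's code.
    static readonly string[] Seps = { "\",", ",\"" };
    static readonly char[] Quotes = { '"', ' ' };

    // Step 1 only: the raw pieces, still carrying their outer quotes.
    public static string[] RawPieces(string line)
    {
        return line.Split(Seps, StringSplitOptions.None);
    }

    // Steps 2 and 3: strip outer quotes and spaces, then turn \" into ".
    public static string[] Fields(string line)
    {
        return RawPieces(line)
            .Select(s => s.Trim(Quotes).Replace("\\\"", "\""))
            .ToArray();
    }

    public static void Main()
    {
        // Verbatim string: "" is one quote character, \ is a literal backslash.
        string line = @"""40"",""Continue"",""If the message \""Continue\"" does not appear""";

        foreach (var piece in RawPieces(line))
            Console.WriteLine("raw:   [" + piece + "]");
        foreach (var field in Fields(line))
            Console.WriteLine("field: [" + field + "]");
    }
}
```

The escaped quotes survive the split because \" is followed by a letter or a space in this data, never by a comma, so neither separator matches there. Note also that Trim removes every leading and trailing quote character, so a field whose content ends in \" would lose that quote and keep a stray backslash.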

Caveat: if your field separators contain spaces, as in:

"40" , "Continue" , "If the message \"Continue\" does not appear restart, and notify your instructor."

or your content strings contain a comma directly after an escaped quote, as here (after "Restart"):

"41","Help","If the message \"Restart\", does not appear after 10 seconds, manually restart."

it will fail.
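If either of those cases can occur in your data, one alternative is to scan each line character by character instead of splitting on fixed separators. The sketch below is an illustration under stated assumptions, not a tested replacement for a CSV library: the class name BackslashCsv and the method ParseLine are hypothetical, a record is assumed never to span more than one line, and a backslash is treated as special only when it is immediately followed by a quote (so an escaped backslash such as \\ is not handled).

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class BackslashCsv
{
    // Splits one line into fields, treating \" inside a quoted field as a literal quote.
    public static List<string> ParseLine(string line)
    {
        var fields = new List<string>();
        var current = new StringBuilder();
        bool inQuotes = false;   // true between an opening and a closing quote
        bool wasQuoted = false;  // true once the current field has had a quoted part

        for (int i = 0; i < line.Length; i++)
        {
            char c = line[i];
            if (inQuotes)
            {
                if (c == '\\' && i + 1 < line.Length && line[i + 1] == '"')
                {
                    current.Append('"'); // escaped quote: keep the quote, drop the backslash
                    i++;
                }
                else if (c == '"')
                {
                    inQuotes = false;    // closing quote
                }
                else
                {
                    current.Append(c);   // commas inside quotes are ordinary content
                }
            }
            else if (c == '"')
            {
                inQuotes = true;
                wasQuoted = true;
            }
            else if (c == ',')
            {
                fields.Add(Finish(current, wasQuoted));
                current.Clear();
                wasQuoted = false;
            }
            else if (char.IsWhiteSpace(c) && (wasQuoted || current.Length == 0))
            {
                // Skip padding around quoted fields and leading spaces of unquoted ones.
            }
            else
            {
                current.Append(c);
            }
        }
        fields.Add(Finish(current, wasQuoted));
        return fields;
    }

    // Unquoted fields may still carry trailing spaces; quoted fields are kept verbatim.
    private static string Finish(StringBuilder sb, bool wasQuoted)
    {
        return wasQuoted ? sb.ToString() : sb.ToString().TrimEnd();
    }

    public static void Main()
    {
        // Verbatim strings: "" is one quote character, \ is a literal backslash.
        string spaced = @"""40"" , ""Continue"" , ""If the message \""Continue\"" does not appear restart, and notify your instructor.""";
        string comma = @"""41"",""Help"",""If the message \""Restart\"", does not appear after 10 seconds, manually restart.""";

        Console.WriteLine(string.Join(" | ", ParseLine(spaced)));
        Console.WriteLine(string.Join(" | ", ParseLine(comma)));
    }
}
```

Because it works on one line at a time, this can be dropped into the File.ReadLines loop above in place of the Split/Select chain, so the file is still streamed rather than loaded whole.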