C# - Splitting a CSV file by removing bad rows

Date: 2018-09-19 21:21:04

Tags: c# .net csv

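I have a CSV file with 2 million rows, about 2 GB in size. Several free-text columns contain redundant CRLFs, which prevents the file from being loaded into a SQL Server table. I get an error saying that the last column does not end with ".".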

I have the following code, but it throws an OutOfMemoryException when reading from fileName. The offending line is:

var lines = File.ReadAllLines(fileName);

How can I fix this? Ideally, I would like to split the file into good rows and bad rows, or else delete the rows that do not end with "CRLF".

int goodRow = 0;
int badRow = 0;
String badRowFileName = fileName.Substring(0, fileName.Length - 4) + "BadRow.csv";
String goodRowFileName = fileName.Substring(0, fileName.Length - 4) + "GoodRow.csv";
var charGood = "\"\"";
String lineOut = string.Empty;

String str = string.Empty;
var lines = File.ReadAllLines(fileName);

StringBuilder sbGood = new StringBuilder();
StringBuilder sbBad = new StringBuilder();

foreach (string line in lines)
{
    if (line.Contains(charGood))
    {
        goodRow++;
        sbGood.AppendLine(line);
    }
    else
    {
        badRow++;
        sbBad.AppendLine(line);
    }
}

if (badRow > 0)
{
    File.WriteAllText(badRowFileName, sbBad.ToString());
}
if (goodRow > 0)
{
    File.WriteAllText(goodRowFileName, sbGood.ToString());
}

sbGood.Clear();
sbBad.Clear();

msg = msg + "Good Rows - " + goodRow.ToString() + " Bad Rows - " + badRow.ToString() + " Done.";

2 Answers:

Answer 0: (score: 2)

You can rewrite that code to be more efficient, like this:

int goodRow = 0, badRow = 0;
String badRowFileName = fileName.Substring(0, fileName.Length - 4) + "BadRow.csv";
String goodRowFileName = fileName.Substring(0, fileName.Length - 4) + "GoodRow.csv";

var charGood = "\"\"";

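// File.ReadLines enumerates the file lazily, one line at a time.
// It returns IEnumerable<string>, which is not IDisposable, so it cannot be placed in a using statement.
var lines = File.ReadLines(fileName);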
using (var swGood = new StreamWriter(goodRowFileName))
using (var swBad = new StreamWriter(badRowFileName))
{    
    foreach (string line in lines)
    {
        if (line.Contains(charGood))
        {
            goodRow++;
            swGood.WriteLine(line);
        }
        else
        {
            badRow++;
            swBad.WriteLine(line);
        }
    }
}

msg += $"Good Rows: {goodRow,9}   Bad Rows: {badRow,9} Done.";

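However, I would also consider using a real CSV parser. There are plenty of them on NuGet, and some will even let you clean up the data on the fly.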
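To illustrate what a real CSV parser buys you here, the following is a minimal sketch that uses `TextFieldParser` from `Microsoft.VisualBasic.FileIO`. It ships with .NET, so it stands in for the NuGet packages mentioned above rather than being one of them. The class name `CsvRecordReader`, the method `ReadRecords`, and the comma delimiter are assumptions made for the example. With `HasFieldsEnclosedInQuotes` enabled, a quoted field that contains a line break is returned as part of a single record instead of splitting the row.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using Microsoft.VisualBasic.FileIO;

// Hypothetical helper, named for this example only.
public static class CsvRecordReader
{
    // Yields one string[] per logical CSV record.
    // A quoted field may span several physical lines.
    public static IEnumerable<string[]> ReadRecords(TextReader reader)
    {
        using (var parser = new TextFieldParser(reader))
        {
            parser.TextFieldType = FieldType.Delimited;
            parser.SetDelimiters(",");               // assumption: comma-separated
            parser.HasFieldsEnclosedInQuotes = true; // honor "..." fields
            parser.TrimWhiteSpace = false;           // keep field text as-is

            while (!parser.EndOfData)
            {
                // Throws MalformedLineException if a record cannot be parsed.
                yield return parser.ReadFields();
            }
        }
    }
}
```

For a file on disk you would pass `new StreamReader(fileName)`, which keeps memory use bounded because records are read one at a time. Rows that raise `MalformedLineException` are candidates for the "bad rows" file.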

Answer 1: (score: 1)

I would not recommend reading the whole file into memory, processing it, and then writing all of the modified content out to new files.

Use file streams instead:

        using (var rdr = new StreamReader(fileName))
        using (var wrtrGood = new StreamWriter(goodRowFileName))
        using (var wrtrBad = new StreamWriter(badRowFileName))
        {
            string line = null;
            while ((line = rdr.ReadLine()) != null)
            {
                if (line.Contains(charGood))
                {
                    goodRow++;
                    wrtrGood.WriteLine(line);
                }
                else
                {
                    badRow++;
                    wrtrBad.WriteLine(line);
                }

            }
        }
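
The snippet above relies on variables declared in the question (`fileName`, `goodRowFileName`, `badRowFileName`, `charGood`, and the two counters). As a rough sketch of how the pieces might fit together, the version below wraps the same streaming loop in one method. The class name `CsvSplitter`, the method `Split`, and the use of `Path` helpers in place of `Substring` are assumptions made for this example, not part of either answer.

```csharp
using System;
using System.IO;

// Hypothetical helper, named for this example only.
public static class CsvSplitter
{
    // Streams inputPath line by line. Lines containing marker go to
    // "<name>GoodRow.csv", all others to "<name>BadRow.csv", both
    // next to the input file. Returns the two line counts.
    public static (int Good, int Bad) Split(string inputPath, string marker = "\"\"")
    {
        string dir = Path.GetDirectoryName(Path.GetFullPath(inputPath));
        string stem = Path.GetFileNameWithoutExtension(inputPath);
        string goodPath = Path.Combine(dir, stem + "GoodRow.csv");
        string badPath = Path.Combine(dir, stem + "BadRow.csv");

        int good = 0, bad = 0;
        using (var reader = new StreamReader(inputPath))
        using (var goodWriter = new StreamWriter(goodPath))
        using (var badWriter = new StreamWriter(badPath))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                if (line.Contains(marker))
                {
                    good++;
                    goodWriter.WriteLine(line);
                }
                else
                {
                    bad++;
                    badWriter.WriteLine(line);
                }
            }
        }
        return (good, bad);
    }
}
```

A call would look like `var counts = CsvSplitter.Split(fileName);`. Only one line is held in memory at a time, so the 2 GB input no longer has to fit in RAM. One difference from the question's code: both output files are always created, even when one of them ends up empty.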