Splitting a CSV file into chunks of a specific size

Asked: 2016-04-06 08:19:38

Tags: c# file csv header tableheader

Hi all, I have a function that creates multiple CSV files from a DataTable, in smaller chunks whose size is set through an app.config key/value pair.

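Problems with the code below:

  1. I have hard-coded the size unit as 1 KB, so when I pass the value 20 it should create 20 KB CSV files. Currently it creates files of about 5 KB for that value.
  2. No file is created for the last remaining records.
  3. Please help me fix this. Thanks!

    Code: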

    public static void CreateCSVFile(DataTable dt, string CSVFileName)
        {
    
            int size = Int32.Parse(ConfigurationManager.AppSettings["FileSize"]);
            size *= 1024; //1 KB size
            string CSVPath = ConfigurationManager.AppSettings["CSVPath"];
    
            StringBuilder FirstLine = new StringBuilder();
            StringBuilder records = new StringBuilder();
    
            int num = 0;
            int length = 0;
    
            IEnumerable<string> columnNames = dt.Columns.Cast<DataColumn>().Select(column => column.ColumnName);
            FirstLine.AppendLine(string.Join(",", columnNames));
            records.AppendLine(FirstLine.ToString());
    
            length += records.ToString().Length;
    
            foreach (DataRow row in dt.Rows)
            {
                //Putting field values in double quotes
                IEnumerable<string> fields = row.ItemArray.Select(field =>
                    string.Concat("\"", field.ToString().Replace("\"", "\"\""), "\""));
    
                records.AppendLine(string.Join(",", fields));
                length += records.ToString().Length;
    
                if (length > size)
                {
                    //Create a new file
                    num++;
                    File.WriteAllText(CSVPath + CSVFileName + DateTime.Now.ToString("yyyyMMddHHmmss") + num.ToString("_000") + ".csv", records.ToString());
                    records.Clear();
                    length = 0;
                    records.AppendLine(FirstLine.ToString());
                }
    
            }            
        }  
    

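2 Answers:

Answer 0 (score: 2)

Use File.ReadLines. It returns a lazily evaluated sequence (deferred execution), so each line is read from disk only when you enumerate it.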

foreach(var line in File.ReadLines(FilePath))
{
   // logic here.
}

From MSDN:

  

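The ReadLines and ReadAllLines methods differ as follows: When you use ReadLines, you can start enumerating the collection of strings before the whole collection is returned; when you use ReadAllLines, you must wait for the whole array of strings be returned before you can access the array. Therefore, when you are working with very large files, ReadLines can be more efficient.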
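
To make the difference concrete, here is a minimal sketch that reads only the header row of a CSV file. The class and method names (ReadLinesDemo, ReadHeader) are made up for illustration and are not part of the original answer.

```csharp
using System;
using System.IO;
using System.Linq;

public static class ReadLinesDemo
{
    // Hypothetical helper: returns the first line of a file, or null if the file is empty.
    public static string ReadHeader(string path)
    {
        // File.ReadLines is lazy: enumeration stops after the first line,
        // so the rest of the file is never loaded into memory.
        // File.ReadAllLines(path)[0] would read the whole file first.
        return File.ReadLines(path).FirstOrDefault();
    }
}
```

The same lazy sequence is what the foreach loop in this answer consumes, one line at a time.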

Now you can rewrite your method as follows.

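    public static void SplitCSV(string FilePath, string FileName)
    {
        // Read the configured chunk size. This answer treats the value as megabytes;
        // use "size *= 1024" to treat it as kilobytes, as in the question.
        int size = Int32.Parse(ConfigurationManager.AppSettings["FileSize"]);

        size *= 1024 * 1024;  // 1 MB units

        int total = 0;
        int num = 0;
        string FirstLine = null;   // header, repeated at the top of every new file
        // GetFileName is not shown in this answer; it is assumed to return
        // the path of output file number num.
        var writer = new StreamWriter(GetFileName(FileName, num));

        try
        {
            // Loop through all source lines
            foreach (var line in File.ReadLines(FilePath))
            {
                if (string.IsNullOrEmpty(FirstLine)) FirstLine = line;
                // Size of the current line in bytes, including its newline
                int length = writer.Encoding.GetByteCount(line) + Environment.NewLine.Length;

                // See if adding this line would exceed the size threshold
                if (total + length >= size)
                {
                    // Start a new file and repeat the header
                    num++;
                    writer.Dispose();
                    writer = new StreamWriter(GetFileName(FileName, num));
                    writer.WriteLine(FirstLine);
                    // The new file already contains the header and its newline
                    total = writer.Encoding.GetByteCount(FirstLine) + Environment.NewLine.Length;
                }

                // Write the line to the current file
                writer.WriteLine(line);

                // Add the size of the line to the running total
                total += length;
            }
        }
        finally
        {
            // Flush and close the last file; without this its tail can be lost
            writer.Dispose();
        }
    }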
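
The same running-total idea can be applied directly to the DataTable in the question. In the question's loop, `length += records.ToString().Length` adds the length of the whole buffer on every row, so the counter grows far faster than the data and the threshold is crossed early; and nothing is written after the loop, so the trailing rows are dropped. The following is a minimal sketch under stated assumptions, not the asker's final code: the names CsvSplitter, WriteChunks, Flush and the parameters directory, baseName and maxBytes are hypothetical, the file naming is simplified (no timestamp), and sizes are measured as UTF-8 bytes.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.IO;
using System.Linq;
using System.Text;

public static class CsvSplitter
{
    // Hypothetical helper: writes dt as CSV files of at most maxBytes each
    // (a single oversized row still gets a file of its own) and returns the paths.
    public static List<string> WriteChunks(DataTable dt, string directory, string baseName, int maxBytes)
    {
        var encoding = new UTF8Encoding(false); // no BOM, so counted bytes == file size
        string header = string.Join(",", dt.Columns.Cast<DataColumn>().Select(c => c.ColumnName))
                        + Environment.NewLine;
        int headerBytes = encoding.GetByteCount(header);

        var paths = new List<string>();
        var records = new StringBuilder(header);
        int bytes = headerBytes;
        int rowsInChunk = 0;

        foreach (DataRow row in dt.Rows)
        {
            // Quote every field and double any embedded quotes, as in the question
            string line = string.Join(",", row.ItemArray.Select(f =>
                "\"" + f.ToString().Replace("\"", "\"\"") + "\"")) + Environment.NewLine;
            int lineBytes = encoding.GetByteCount(line);

            // Flush first if this row would push the current chunk over the limit
            if (rowsInChunk > 0 && bytes + lineBytes > maxBytes)
            {
                paths.Add(Flush(records, directory, baseName, paths.Count + 1, encoding));
                records.Clear().Append(header);
                bytes = headerBytes;
                rowsInChunk = 0;
            }

            records.Append(line);
            bytes += lineBytes;   // add only this row, not the whole buffer
            rowsInChunk++;
        }

        // Write the rows left over after the last full chunk
        if (rowsInChunk > 0)
            paths.Add(Flush(records, directory, baseName, paths.Count + 1, encoding));

        return paths;
    }

    private static string Flush(StringBuilder records, string directory, string baseName, int num, Encoding encoding)
    {
        string path = Path.Combine(directory, string.Format("{0}_{1:000}.csv", baseName, num));
        File.WriteAllText(path, records.ToString(), encoding);
        return path;
    }
}
```

With the question's settings this could be called as CsvSplitter.WriteChunks(dt, CSVPath, CSVFileName, size), where size is the configured value multiplied by 1024. Checking the limit before appending a row, rather than after, is what keeps each file at or below the requested size.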

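Answer 1 (score: 1)

The solution is quite simple: you don't need to hold all the lines in memory (as in string[] arr = File.ReadAllLines(FilePath);).

Instead, create a StreamReader on the input file and read it line by line into a buffer. When the buffer exceeds the "threshold size", write it to disk as a single CSV file. The code should look something like this: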

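using (var sr = new System.IO.StreamReader(filePath))
{
    var linesBuffer = new List<string>();
    while (sr.Peek() >= 0)
    {
        linesBuffer.Add(sr.ReadLine());
        // yourThreshold is the maximum number of lines per output file
        if (linesBuffer.Count > yourThreshold)
        {
            // TODO: implement function WriteLinesToPartialCsv
            WriteLinesToPartialCsv(linesBuffer);
            // Clear the buffer:
            linesBuffer.Clear();
            // Try forcing C# to reclaim the memory (usually unnecessary):
            GC.Collect();
        }
    }
    // Write the lines left over after the last full buffer
    if (linesBuffer.Count > 0)
    {
        WriteLinesToPartialCsv(linesBuffer);
    }
}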

As you can see, by reading the stream line by line (instead of the whole CSV input file, as your code does), you get much better control over memory.
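
The WriteLinesToPartialCsv function is left as a TODO above. One possible shape for it is sketched below, purely as an assumption: the class name PartialCsvWriter, the static settings OutputDirectory, BaseName and Header, and the part-numbering scheme are all hypothetical and not part of the original answer.

```csharp
using System.Collections.Generic;
using System.IO;

public static class PartialCsvWriter
{
    // Hypothetical settings: where the parts go and how they are named.
    public static string OutputDirectory = Path.GetTempPath();
    public static string BaseName = "part";
    // Optional header line to repeat at the top of every part (null = none).
    public static string Header = null;

    private static int partNumber = 0;

    // Writes the buffered lines to the next numbered file and returns its path.
    public static string WriteLinesToPartialCsv(List<string> lines)
    {
        partNumber++;
        string path = Path.Combine(OutputDirectory,
            string.Format("{0}_{1:000}.csv", BaseName, partNumber));

        // The using block flushes and closes the file even if a write fails.
        using (var writer = new StreamWriter(path))
        {
            if (Header != null) writer.WriteLine(Header);
            foreach (string line in lines) writer.WriteLine(line);
        }
        return path;
    }
}
```

If the input file starts with a header row, that row would need to be captured once (for example into Header) and kept out of the buffer, otherwise only the first part will carry it.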