Hi all, I have a function that creates multiple CSV files from a DataTable, splitting the output into smaller chunks based on a size configured via an app.config key/value pair.

The problem with the code below: when the FileSize value is 20, it should create 20 KB csv files. Currently it creates roughly 5 KB files for the same value. Please help me fix this. Thanks!

Code:
public static void CreateCSVFile(DataTable dt, string CSVFileName)
{
    int size = Int32.Parse(ConfigurationManager.AppSettings["FileSize"]);
    size *= 1024; //1 KB size
    string CSVPath = ConfigurationManager.AppSettings["CSVPath"];
    StringBuilder FirstLine = new StringBuilder();
    StringBuilder records = new StringBuilder();
    int num = 0;
    int length = 0;

    IEnumerable<string> columnNames = dt.Columns.Cast<DataColumn>().Select(column => column.ColumnName);
    FirstLine.AppendLine(string.Join(",", columnNames));
    records.AppendLine(FirstLine.ToString());
    length += records.ToString().Length;

    foreach (DataRow row in dt.Rows)
    {
        //Putting field values in double quotes
        IEnumerable<string> fields = row.ItemArray.Select(field =>
            string.Concat("\"", field.ToString().Replace("\"", "\"\""), "\""));

        records.AppendLine(string.Join(",", fields));
        length += records.ToString().Length;

        if (length > size)
        {
            //Create a new file
            num++;
            File.WriteAllText(CSVPath + CSVFileName + DateTime.Now.ToString("yyyyMMddHHmmss") + num.ToString("_000") + ".csv", records.ToString());
            records.Clear();
            length = 0;
            records.AppendLine(FirstLine.ToString());
        }
    }
}
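A quick way to see why the files come out far smaller than 20 KB: inside the row loop, `length += records.ToString().Length` adds the length of the entire accumulated buffer on every iteration, not just the newly appended row, so the counter grows roughly quadratically and crosses the threshold long before the real buffer does. A minimal sketch of that arithmetic (in Python, purely for illustration; the row count and row size are made up):

```python
# Simulate the buggy counter: it re-counts the WHOLE buffer on every row,
# instead of adding only the length of the row just appended.
rows = ["a" * 100 for _ in range(50)]  # 50 hypothetical 100-char rows

buffer = ""
counted = 0  # mirrors `length += records.ToString().Length` inside the loop
for row in rows:
    buffer += row + "\n"
    counted += len(buffer)  # buggy: whole buffer counted again each pass
correct = len(buffer)       # what the counter should have tracked

print(counted, correct)  # → 128775 5050
```

With 50 rows the buggy counter is already ~25x the real buffer size, which is why a 20 KB threshold trips when only a few KB have actually been written.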
Answer 0 (score: 2)
Use File.ReadLines, which, like LINQ, performs deferred execution:

foreach (var line in File.ReadLines(FilePath))
{
    // logic here.
}
From MSDN:

The ReadLines and ReadAllLines methods differ as follows: When you use ReadLines, you can start enumerating the collection of strings before the whole collection is returned; when you use ReadAllLines, you must wait for the whole array of strings to be returned before you can access the array. Therefore, when you are working with very large files, ReadLines can be more efficient.
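The lazy-versus-eager distinction the quote describes is not specific to C#; here is a small Python illustration, where a generator plays the role of ReadLines and reading the whole file at once plays the role of ReadAllLines:

```python
import io

# A file-like source of many lines (stands in for a large file on disk).
source = io.StringIO("\n".join(f"line{i}" for i in range(1000)))

# Eager, like ReadAllLines: every line is materialized before you see one.
#   all_lines = source.read().splitlines()

# Lazy, like ReadLines: each line is produced only when the loop asks for it.
def read_lines(f):
    for line in f:
        yield line.rstrip("\n")

first = next(read_lines(source))  # only one line has been pulled so far
print(first)  # → line0
```

For a multi-gigabyte CSV, the lazy version keeps only one line in memory at a time, which is exactly why the rewrite below iterates File.ReadLines instead of loading the file up front.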
You can now rewrite your method as follows.
public static void SplitCSV(string FilePath, string FileName)
{
    // Read the configured chunk size
    int size = Int32.Parse(ConfigurationManager.AppSettings["FileSize"]);
    size *= 1024; // FileSize is specified in KB

    int total = 0;
    int num = 0;
    string FirstLine = null; // header, repeated in each new file

    var writer = new StreamWriter(GetFileName(FileName, num));

    // Loop through all source lines
    foreach (var line in File.ReadLines(FilePath))
    {
        if (string.IsNullOrEmpty(FirstLine)) FirstLine = line;

        // Length of current line
        int length = line.Length;

        // See if adding this line would exceed the size threshold
        if (total + length >= size)
        {
            // Create a new file
            num++;
            total = 0;
            writer.Dispose();
            writer = new StreamWriter(GetFileName(FileName, num));
            writer.WriteLine(FirstLine);
            length += FirstLine.Length;
        }

        // Write the line to the current file
        writer.WriteLine(line);

        // Add length of line in bytes to running size
        total += length;
        // Add size of newlines
        total += Environment.NewLine.Length;
    }

    // Flush and close the last chunk
    writer.Dispose();
}
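To make the size-threshold logic concrete, here is a rough Python sketch of the same approach (file names and the 200-byte limit are made up for the demo; this is not the answer's C# code): stream the source line by line, start a new part whenever adding the next line would cross the threshold, and repeat the header in each part.

```python
import os
import tempfile

def split_csv(path, out_prefix, size_limit):
    """Split a CSV into parts of roughly size_limit chars, repeating the header."""
    num = 0
    with open(path) as src:
        header = next(src)
        out = open(f"{out_prefix}_{num:03d}.csv", "w")
        out.write(header)
        total = len(header)
        for line in src:
            if total + len(line) >= size_limit:  # would exceed the threshold
                out.close()
                num += 1
                out = open(f"{out_prefix}_{num:03d}.csv", "w")
                out.write(header)
                total = len(header)
            out.write(line)
            total += len(line)
        out.close()
    return num + 1  # number of parts written

# Hypothetical usage: a small temp file split with a 200-byte limit.
tmp = os.path.join(tempfile.mkdtemp(), "input.csv")
with open(tmp, "w") as f:
    f.write("id,value\n")
    f.writelines(f"{i},{'x' * 20}\n" for i in range(30))
parts = split_csv(tmp, tmp[:-4], 200)
print(parts)
```

Each part stays just under the limit (a single oversized line can still push one part over, same as in the C# version), and every part begins with the header row.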
Answer 1 (score: 1)
The solution is quite simple... you don't need to hold all the lines in memory (as `string[] arr = File.ReadAllLines(FilePath);` would). Instead, create a StreamReader over the input file and read lines into a buffer. When the buffer exceeds the "threshold size", write it to disk as a single csv file. The code should look something like this:
using (var sr = new System.IO.StreamReader(filePath))
{
    var linesBuffer = new List<string>();
    while (sr.Peek() >= 0)
    {
        linesBuffer.Add(sr.ReadLine());
        if (linesBuffer.Count > yourThreshold)
        {
            // TODO: implement function WriteLinesToPartialCsv
            WriteLinesToPartialCsv(linesBuffer);
            // Clear the buffer:
            linesBuffer.Clear();
            // Try forcing c# to clear the memory:
            GC.Collect();
        }
    }
}
As you can see, by reading the stream line by line (instead of the whole CSV input file, as your code does) you have better control over memory.
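The count-based buffering above can be sketched in Python as well. Note that the snippet never flushes a final, partially filled buffer after the loop; the sketch below adds that flush. `write_partial_csv` is a hypothetical stand-in for the answer's TODO `WriteLinesToPartialCsv`:

```python
import io

def write_partial_csv(lines, parts):
    """Hypothetical stand-in for WriteLinesToPartialCsv: record one chunk."""
    parts.append(list(lines))

threshold = 4  # lines per chunk, analogous to yourThreshold
parts = []
buffer = []
reader = io.StringIO("\n".join(f"row{i}" for i in range(10)))  # fake input file

for line in reader:
    buffer.append(line.rstrip("\n"))
    if len(buffer) > threshold:
        write_partial_csv(buffer, parts)
        buffer.clear()  # release buffered lines, as linesBuffer.Clear() does

if buffer:  # flush the trailing chunk the original snippet would drop
    write_partial_csv(buffer, parts)

print(len(parts))  # → 2
```

One design caveat worth noting: a line-count threshold only approximates a size threshold when rows have similar lengths; for a strict KB limit, the running byte total from Answer 0 is the better measure.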