Reading large files in C#

Date: 2016-04-12 11:48:44

Tags: c# .net streamreader

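I have to read large files of 4-10 GB line by line. The problem is that once I have read about 2 GB, the .NET process throws an OutOfMemoryException.

At first I was only trying to count the lines, but I need to access each line individually so that I can remove some data from it.

From what I have seen, every option keeps the previous lines in memory. I only want it to keep the line currently being read (unless someone knows a trick for keeping everything).

This is what I tried, along with similar variations: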

long count = 0;
string line;
// using disposes the reader even if ReadLine throws
using (StreamReader reader = File.OpenText(FilePath))
{
    while ((line = reader.ReadLine()) != null)    // This is where it errors
    {
        count++;
    }
}

The exception is:

Exception of type 'System.OutOfMemoryException' was thrown.
at System.Text.StringBuilder.ExpandByABlock(Int32 minBlockCharCount)
at System.Text.StringBuilder.Append(Char* value, Int32 valueCount)
at System.Text.StringBuilder.Append(Char[] value, Int32 startIndex, Int32 charCount)
at System.IO.StreamReader.ReadLine()
at CSV.Program.NumLines() in C:\Users\ted\Documents\Visual Studio 2015\Projects\vConnect\CSV\CSV\Program.cs:line 100
at CSV.Program.Main(String[] args) in C:\Users\ted\Documents\Visual Studio 2015\Projects\vConnect\CSV\CSV\Program.cs:line 20
at System.AppDomain._nExecuteAssembly(RuntimeAssembly assembly, String[] args)
at System.AppDomain.ExecuteAssembly(String assemblyFile, Evidence assemblySecurity, String[] args)
at Microsoft.VisualStudio.HostingProcess.HostProc.RunUsersAssembly()
at System.Threading.ThreadHelper.ThreadStart_Context(Object state)
at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean preserveSyncCtx)
at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean preserveSyncCtx)
at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state)
at System.Threading.ThreadHelper.ThreadStart()

Thanks

1 answer:

Answer 0 (score: 1)

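You can use the methods of the FileStream class: FileStream.Read and FileStream.Seek should let you do what you need. An example can be found here: http://www.codeproject.com/Questions/543821/ReadplusBytesplusfromplusLargeplusBinaryplusfilepl

You will have to modify it a bit, but basically you can start at position 0, read until you find a newline, process the line, then start again from where you left off and repeat. It will not be very efficient, but it will get the job done.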
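The loop described above could be sketched roughly as follows. This is only an illustration, not the code from the linked example: the names ChunkedLineReader, ForEachLine, bufferSize and maxLineBytes are made up for this sketch, and it assumes UTF-8 (or ASCII) text without a byte order mark, with lines ending in "\n" or "\r\n". Only the current chunk and the line being assembled are held in memory.

```csharp
using System;
using System.IO;
using System.Text;

public static class ChunkedLineReader
{
    // Hypothetical helper: reads the stream in fixed-size chunks and calls
    // onLine once per line. Assumes UTF-8/ASCII, where byte 0x0A only ever
    // means a newline, and no byte order mark at the start of the file.
    public static long ForEachLine(Stream stream, Action<string> onLine,
                                   int bufferSize = 64 * 1024,
                                   int maxLineBytes = 16 * 1024 * 1024)
    {
        var buffer = new byte[bufferSize];
        var pending = new MemoryStream();   // bytes of the line being assembled
        long count = 0;
        int read;

        while ((read = stream.Read(buffer, 0, buffer.Length)) > 0)
        {
            int start = 0;
            for (int i = 0; i < read; i++)
            {
                if (buffer[i] != (byte)'\n') continue;

                // Found the end of a line: finish it and hand it over.
                pending.Write(buffer, start, i - start);
                Emit(pending, onLine);
                count++;
                start = i + 1;
            }

            // Carry the unfinished tail of this chunk over to the next read.
            pending.Write(buffer, start, read - start);

            // Assumed safety limit: fail early instead of growing without bound
            // when the data contains no newline for a very long stretch.
            if (pending.Length > maxLineBytes)
                throw new InvalidDataException("Line exceeds maxLineBytes.");
        }

        // Last line when the file does not end with a newline.
        if (pending.Length > 0)
        {
            Emit(pending, onLine);
            count++;
        }
        return count;
    }

    private static void Emit(MemoryStream pending, Action<string> onLine)
    {
        int length = (int)pending.Length;
        byte[] bytes = pending.GetBuffer();

        // Drop the '\r' of a "\r\n" line ending.
        if (length > 0 && bytes[length - 1] == (byte)'\r') length--;

        onLine(Encoding.UTF8.GetString(bytes, 0, length));
        pending.SetLength(0);               // reuse the buffer for the next line
    }
}
```

It could be called with a FileStream opened via new FileStream(FilePath, FileMode.Open, FileAccess.Read), passing a lambda that processes each line. FileStream.Seek is not needed in this variant, because each Read already continues from where the previous one stopped. The maxLineBytes guard is there because the stack trace in the question fails while ReadLine is still growing a single line, which may mean the file contains extremely long lines or line breaks that are not being recognised; if that is the case, this sketch will throw InvalidDataException rather than run out of memory.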

Hope this helps.