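Using this article on MSDN, I'm trying to search through the files in a directory. The problem is that every time I run the program, I get:
"An unhandled exception of type 'System.OutOfMemoryException' occurred in mscorlib.dll".
I've tried some other options, such as StreamReader, but I couldn't get them to work. The files are large: some of them are 1.5-2 GB each, and there could be 5 or more files per day.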
This code fails:
private static string GetFileText(string name)
{
var fileContents = string.Empty;
// If the file has been deleted since we took
// the snapshot, ignore it and return the empty string.
if (File.Exists(name))
{
fileContents = File.ReadAllText(name);
}
return fileContents;
}
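Any ideas what might be happening, or how to read the files without running out of memory?
The whole code (in case you don't want to open the MSDN article):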
using System;
using System.Collections.Generic;
using System.Linq;

class QueryContents {
public static void Main()
{
// Modify this path as necessary.
string startFolder = @"c:\program files\Microsoft Visual Studio 9.0\";
// Take a snapshot of the file system.
System.IO.DirectoryInfo dir = new System.IO.DirectoryInfo(startFolder);
// This method assumes that the application has discovery permissions
// for all folders under the specified path.
IEnumerable<System.IO.FileInfo> fileList = dir.GetFiles("*.*", System.IO.SearchOption.AllDirectories);
string searchTerm = @"Visual Studio";
// Search the contents of each file.
// A regular expression created with the RegEx class
// could be used instead of the Contains method.
// queryMatchingFiles is an IEnumerable<string>.
var queryMatchingFiles =
from file in fileList
where file.Extension == ".htm"
let fileText = GetFileText(file.FullName)
where fileText.Contains(searchTerm)
select file.FullName;
// Execute the query.
Console.WriteLine("The term \"{0}\" was found in:", searchTerm);
foreach (string filename in queryMatchingFiles)
{
Console.WriteLine(filename);
}
// Keep the console window open in debug mode.
Console.WriteLine("Press any key to exit");
Console.ReadKey();
}
// Read the contents of the file.
static string GetFileText(string name)
{
string fileContents = String.Empty;
// If the file has been deleted since we took
// the snapshot, ignore it and return the empty string.
if (System.IO.File.Exists(name))
{
fileContents = System.IO.File.ReadAllText(name);
}
return fileContents;
}
}
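Answer 0 (score: 3)
The problem you're running into comes from trying to load several gigabytes of text at once. If they are text files, you can stream them and compare one line at a time.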
var queryMatchingFiles =
from file in fileList
where file.Extension == ".htm"
let fileLines = File.ReadLines(file.FullName) // lazy IEnumerable<string>
where fileLines.Any(line => line.Contains(searchTerm))
select file.FullName;
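Below is a self-contained sketch of the same idea, wrapped in a hypothetical `StreamingSearch.FindFilesContaining` helper (the class and method names are illustrative, not from the original). It assumes .NET 4 or later, where `File.ReadLines` and `Directory.EnumerateFiles` exist. Two deliberate changes from the question's code: `Directory.EnumerateFiles` yields paths lazily instead of building the whole array up front, and the extension check is case-insensitive, which the original `file.Extension == ".htm"` is not.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

static class StreamingSearch
{
    // Hypothetical helper: lazily yields the full path of every file under startFolder
    // whose extension matches and which contains searchTerm on at least one line.
    public static IEnumerable<string> FindFilesContaining(
        string startFolder, string extension, string searchTerm)
    {
        // EnumerateFiles streams the directory listing instead of returning an array.
        return from path in Directory.EnumerateFiles(startFolder, "*", SearchOption.AllDirectories)
               where string.Equals(Path.GetExtension(path), extension, StringComparison.OrdinalIgnoreCase)
               // Skip files deleted since the enumeration reached them.
               where File.Exists(path)
               // ReadLines is lazy; Any stops at the first matching line and
               // disposes the enumerator, which closes the file.
               where File.ReadLines(path).Any(line => line.Contains(searchTerm))
               select path;
    }
}
```

Usage would mirror the original loop, e.g. `foreach (string filename in StreamingSearch.FindFilesContaining(startFolder, ".htm", searchTerm)) Console.WriteLine(filename);`. Only one line of one file is held in memory at a time.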
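Answer 1 (score: 0)
I suspect you're getting an out-of-memory error because of the way the query is written: I believe it needs to load the entire text of each file into memory, and none of those objects can be released until the whole file set has been loaded. Could you check for the search term inside the GetFileText function and just return true or false?
If you do that, the file text will at least go out of scope at the end of the function, and the GC can reclaim the memory. If you are dealing with large files or large numbers of files, it would really be better to rewrite it as a streaming function: you can stop reading early once you encounter the search term, and you never need the whole file in memory at once.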
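A minimal sketch of that suggestion, using a hypothetical `QueryHelpers.FileContainsTerm` in place of `GetFileText` (the names are illustrative). The query would then read `where QueryHelpers.FileContainsTerm(file.FullName, searchTerm)` with no `let` clause. Note that this still calls `File.ReadAllText`, so every individual file must fit in a single string; for files of 1.5-2 GB like those in the question it is likely to fail with the same exception.

```csharp
using System.IO;

static class QueryHelpers
{
    // Hypothetical replacement for GetFileText: returns whether the file contains
    // searchTerm instead of returning the text. The string is local to this method,
    // so it is unreachable (and collectable) as soon as the method returns.
    public static bool FileContainsTerm(string name, string searchTerm)
    {
        // If the file has been deleted since we took the snapshot, treat it as no match.
        if (!File.Exists(name))
        {
            return false;
        }

        // Still reads the whole file at once: each file must fit in memory on its own.
        string fileText = File.ReadAllText(name);
        return fileText.Contains(searchTerm);
    }
}
```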
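One way to write such a streaming function is sketched below as a hypothetical `ChunkedSearch.FileContainsTerm` (names and the 64K default buffer are illustrative assumptions). Unlike a line-by-line `File.ReadLines` approach, which still holds one entire line in memory and misses a term split across a line break, this reads fixed-size chunks with a `StreamReader` (default UTF-8 decoding assumed). It carries the last `searchTerm.Length - 1` characters into the next window so a match straddling two reads is still found, and returns as soon as the term is seen.

```csharp
using System;
using System.IO;

static class ChunkedSearch
{
    // Hypothetical helper: streams the file through a bounded char buffer and
    // returns true at the first ordinal match of searchTerm.
    public static bool FileContainsTerm(string path, string searchTerm, int bufferSize = 1 << 16)
    {
        if (string.IsNullOrEmpty(searchTerm))
            throw new ArgumentException("searchTerm must be non-empty", nameof(searchTerm));
        if (bufferSize < 1)
            throw new ArgumentOutOfRangeException(nameof(bufferSize));

        // Characters kept from the previous window so straddling matches are not lost.
        int overlap = searchTerm.Length - 1;
        char[] buffer = new char[overlap + bufferSize];
        int carried = 0;

        using (var reader = new StreamReader(path))
        {
            int read;
            // Read may return fewer chars than requested; 0 means end of file.
            while ((read = reader.Read(buffer, carried, bufferSize)) > 0)
            {
                int length = carried + read;

                // Ordinal comparison, matching what string.Contains(string) does.
                if (new string(buffer, 0, length).IndexOf(searchTerm, StringComparison.Ordinal) >= 0)
                    return true; // early exit: the rest of the file is never read

                // Move the tail of this window to the front for the next read.
                carried = Math.Min(overlap, length);
                Array.Copy(buffer, length - carried, buffer, 0, carried);
            }
        }
        return false;
    }
}
```

Memory use is bounded by `bufferSize` regardless of file size or line length; the trade-off is one small string allocation per chunk for the `IndexOf` call.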
Previous question on finding a term in an HTML file using a stream