I am trying to speed up the recursive calculation of the total size of all files in all folders under a given path.
I chose "E:\" as the folder. I obtain the complete recursive file list via a "SafeFileEnumerator" as an IEnumerable within milliseconds (works like a charm).
Now I want to collect the sum of the sizes of all files in this enumerable. Currently I loop over them with foreach and fetch FileInfo(oFileInfo.FullName).Length for each file.
This works, but it is slow: it takes about 30 seconds. If I look up the space consumption through Windows instead (right-click in Windows Explorer, Properties of all selected folders), I get it in about 6 seconds (roughly 1,600 files with 26 gigabytes of data on an SSD).
So my first idea was to speed up the collection by using threads, but I did not get any speedup that way...
The code without threads looks like this:
public static long fetchFolderSize(string Folder, CancellationTokenSource oCancelToken)
{
    long FolderSize = 0;
    IEnumerable<FileSystemInfo> aFiles = new SafeFileEnumerator(Folder, "*", SearchOption.AllDirectories);
    foreach (FileSystemInfo oFileInfo in aFiles)
    {
        // check if we will cancel now
        if (oCancelToken.Token.IsCancellationRequested)
        {
            throw new OperationCanceledException();
        }
        try
        {
            FolderSize += new FileInfo(oFileInfo.FullName).Length;
        }
        catch (Exception oException)
        {
            Debug.WriteLine(oException.Message);
        }
    }
    return FolderSize;
}
The multithreaded code looks like this:
public static long fetchFolderSize(string Folder, CancellationTokenSource oCancelToken)
{
    long FolderSize = 0;
    int iCountTasks = 0;
    IEnumerable<FileSystemInfo> aFiles = new SafeFileEnumerator(Folder, "*", SearchOption.AllDirectories);
    foreach (FileSystemInfo oFileInfo in aFiles)
    {
        // check if we will cancel now
        if (oCancelToken.Token.IsCancellationRequested)
        {
            throw new OperationCanceledException();
        }
        if (iCountTasks < 10)
        {
            iCountTasks++;
            Thread oThread = new Thread(delegate()
            {
                try
                {
                    FolderSize += new FileInfo(oFileInfo.FullName).Length;
                }
                catch (Exception oException)
                {
                    Debug.WriteLine(oException.Message);
                }
                iCountTasks--;
            });
            oThread.Start();
            continue;
        }
        try
        {
            FolderSize += new FileInfo(oFileInfo.FullName).Length;
        }
        catch (Exception oException)
        {
            Debug.WriteLine(oException.Message);
        }
    }
    return FolderSize;
}
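An aside on the threaded version above: a plain `FolderSize += ...` executed from several threads is not an atomic operation, so concurrent updates can be lost, and with older C# compilers the `foreach` loop variable `oFileInfo` is shared by all the closures, so a thread may read a different file than intended. Below is a minimal, self-contained sketch of the thread-safe pattern; it uses temporary files instead of `SafeFileEnumerator` purely for illustration (the temp directory and file sizes are assumptions, not part of the original code):

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Threading;

class ThreadSumSketch
{
    static void Main()
    {
        // Hypothetical stand-in for the real folder tree: 10 temp files of 100 bytes each.
        string dir = Path.Combine(Path.GetTempPath(), Path.GetRandomFileName());
        Directory.CreateDirectory(dir);
        for (int i = 0; i < 10; i++)
            File.WriteAllBytes(Path.Combine(dir, i + ".bin"), new byte[100]);

        long folderSize = 0;
        var threads = new List<Thread>();
        foreach (FileInfo oFileInfo in new DirectoryInfo(dir).GetFiles())
        {
            // Copy before capturing: older compilers share the loop variable across closures.
            FileInfo oLocal = oFileInfo;
            var oThread = new Thread(() =>
            {
                // Interlocked.Add makes the 64-bit accumulation atomic;
                // a plain "folderSize += ..." from several threads can lose updates.
                Interlocked.Add(ref folderSize, oLocal.Length);
            });
            oThread.Start();
            threads.Add(oThread);
        }
        threads.ForEach(t => t.Join()); // wait for all threads before reading the total

        Console.WriteLine(folderSize); // 10 * 100
        Directory.Delete(dir, true);
    }
}
```

Even with correct synchronization, raw threads rarely help here: the cost is per-file metadata lookups on the disk, which is I/O-bound rather than CPU-bound work.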
Can someone please advise me on how to speed up the folder size calculation?
Kind regards
Edit 1 (Parallel.ForEach suggestion, see comments):
public static long fetchFolderSize(string Folder, CancellationTokenSource oCancelToken)
{
    long FolderSize = 0;
    ParallelOptions oParallelOptions = new ParallelOptions();
    oParallelOptions.CancellationToken = oCancelToken.Token;
    oParallelOptions.MaxDegreeOfParallelism = System.Environment.ProcessorCount;
    IEnumerable<FileSystemInfo> aFiles = new SafeFileEnumerator(Folder, "*", SearchOption.AllDirectories).ToArray();
    Parallel.ForEach(aFiles, oParallelOptions, oFileInfo =>
    {
        try
        {
            FolderSize += new FileInfo(oFileInfo.FullName).Length;
        }
        catch (Exception oException)
        {
            Debug.WriteLine(oException.Message);
        }
    });
    return FolderSize;
}
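Note that the `Parallel.ForEach` attempt above still races on the shared `FolderSize` accumulator. One race-free alternative is to let PLINQ aggregate per-partition partial sums itself. The sketch below assumes the enumerator yields `FileInfo`-like objects; a plain `DirectoryInfo.EnumerateFiles` over a temp directory stands in for `SafeFileEnumerator`:

```csharp
using System;
using System.IO;
using System.Linq;
using System.Threading;

class PlinqSumSketch
{
    static void Main()
    {
        // Hypothetical stand-in for the real folder: 5 temp files of 200 bytes each.
        string dir = Path.Combine(Path.GetTempPath(), Path.GetRandomFileName());
        Directory.CreateDirectory(dir);
        for (int i = 0; i < 5; i++)
            File.WriteAllBytes(Path.Combine(dir, i + ".bin"), new byte[200]);

        var oCancelToken = new CancellationTokenSource();

        // PLINQ's Sum aggregates per-partition partial sums internally,
        // so no shared accumulator (and no data race) is needed.
        long folderSize = new DirectoryInfo(dir)
            .EnumerateFiles("*", SearchOption.AllDirectories)
            .AsParallel()
            .WithCancellation(oCancelToken.Token)
            .Sum(oFileInfo => oFileInfo.Length);

        Console.WriteLine(folderSize); // 5 * 200
        Directory.Delete(dir, true);
    }
}
```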
Answer 0 (score: 0)
A side note on SafeFileEnumerator performance:
Once you obtain an IEnumerable, it does not mean you have fetched the whole collection, because it is a lazy proxy. Try the snippet below; I believe you will see the performance difference (sorry if it does not compile, it is just to illustrate the idea):
var tmp = new SafeFileEnumerator(Folder, "*", SearchOption.AllDirectories).ToArray(); // fetch all records explicitly to populate the array
IEnumerable<FileSystemInfo> aFiles = tmp;
Now time the actual results you want to obtain.
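The laziness this answer describes can be demonstrated without the file system at all. In the sketch below, a deliberately slow generator stands in for `SafeFileEnumerator` (a hypothetical substitute; the real class walks the directory tree):

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using System.Threading;

class LazyEnumSketch
{
    // Deliberately slow generator standing in for SafeFileEnumerator.
    static IEnumerable<int> SlowSource()
    {
        for (int i = 0; i < 3; i++)
        {
            Thread.Sleep(50); // simulate per-entry disk latency
            yield return i;
        }
    }

    static void Main()
    {
        var sw = Stopwatch.StartNew();
        IEnumerable<int> lazy = SlowSource(); // returns instantly: nothing has been enumerated yet
        bool constructionWasCheap = sw.ElapsedMilliseconds < 50;

        int[] eager = SlowSource().ToArray(); // ToArray() pays the full enumeration cost up front

        Console.WriteLine(constructionWasCheap); // True
        Console.WriteLine(eager.Length);         // 3
    }
}
```

This is why the original 30-second cost shows up in the summing loop, not when the enumerator is created: the "milliseconds" figure only measures building the lazy proxy.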
Answer 1 (score: 0)
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading;
using System.Threading.Tasks;
using System.IO;

namespace ConsoleApplication3
{
    class Program
    {
        static void Main(string[] args)
        {
            long size = fetchFolderSize(@"C:\Test", new CancellationTokenSource());
        }

        public static long fetchFolderSize(string Folder, CancellationTokenSource oCancelToken)
        {
            ParallelOptions po = new ParallelOptions();
            po.CancellationToken = oCancelToken.Token;
            po.MaxDegreeOfParallelism = System.Environment.ProcessorCount;
            long folderSize = 0;
            string[] files = Directory.GetFiles(Folder);
            Parallel.ForEach<string, long>(files,
                po,
                () => 0,
                (fileName, loop, fileSize) =>
                {
                    fileSize += new FileInfo(fileName).Length; // accumulate into the thread-local subtotal
                    po.CancellationToken.ThrowIfCancellationRequested();
                    return fileSize;
                },
                (finalResult) => Interlocked.Add(ref folderSize, finalResult)
            );
            string[] subdirEntries = Directory.GetDirectories(Folder);
            Parallel.For<long>(0, subdirEntries.Length, () => 0, (i, loop, subtotal) =>
            {
                // skip reparse points (junctions/symlinks) to avoid cycles
                if ((File.GetAttributes(subdirEntries[i]) & FileAttributes.ReparsePoint) !=
                    FileAttributes.ReparsePoint)
                {
                    subtotal += fetchFolderSize(subdirEntries[i], oCancelToken);
                }
                return subtotal; // keep the subtotal even when a reparse point is skipped
            },
                (finalResult) => Interlocked.Add(ref folderSize, finalResult)
            );
            return folderSize;
        }
    }
}