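Parallel loops for determining folder size

Time: 2016-07-12 17:51:02

Tags: c#

The goal of the whole program is to determine the size of each main folder in a directory. It works fine on small drives but struggles on larger ones. One drive that I absolutely need took more than 3 hours. Here is a copy of the folder-size routine I am using.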

    public double getDirectorySize(string p)
    {
        //get array of all file names
        string[] a = Directory.GetFiles(p, "*.*", SearchOption.AllDirectories);

        //calculate total bytes in loop
        double b = 0;
        foreach (string name in a)
        {
            if (name.Length < 250) // prevents path too long errors
            {
                //use file info to get length of each file
                FileInfo info = new FileInfo(name);
                b += info.Length;
            }
        }

        //return total size
        return b;
    }

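So what I had in mind was a parallel loop, in the form of a parallel foreach loop. Each p represents the name of a main folder. I thought about somehow splitting the path p into its subfolders and using a parallel foreach loop to keep collecting file sizes; however, the subfolders contain an unknown number of subdirectories. This is where I run into trouble when trying to get the folder size back. Thanks for your help.

Update

I call this function from the foreach loop below: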

            DirectoryInfo di = new DirectoryInfo(Browse_Folders_Text_Box.Text);
            FileInfo[] parsedfilename = di.GetFiles("*.*", System.IO.SearchOption.TopDirectoryOnly);
            parsedfoldername = System.IO.Directory.GetDirectories(Browse_Folders_Text_Box.Text, "*.*", System.IO.SearchOption.TopDirectoryOnly);
            //parsedfilename = System.IO.Directory.GetDirectories(textBox1.Text, "*.*", System.IO.SearchOption.AllDirectories);





            // Process the list of folders found in the directory.

            type_label.Text = "Folder Names \n";


            List<string> NameList = new List<string>();
            foreach (string transfer2 in parsedfoldername)
            {

                this.Cursor = Cursors.WaitCursor;
                //Uses the path and takes the name from last folder used
                string dirName = new DirectoryInfo(@transfer2).Name;
                string dirDate = new DirectoryInfo(@transfer2).LastWriteTime.ToString();


                NameList.Add(dirName);
                //Form2 TextTable = new Form2(NameList.ToString());



                //Display_Rich_Text_Box.AppendText(dirName);
                //Display_Rich_Text_Box.AppendText("\n");
                Last_Date_Modified_Text_Box.AppendText(dirDate);
                Last_Date_Modified_Text_Box.AppendText("\n");


                try
                {
                    double b;

                    b = getDirectorySize(transfer2);
                    MetricByte(b);



                }
                catch (Exception)
                {
                    Size_Text_Box.AppendText("N/A \n");                      
                }

            }

            Display_Rich_Text_Box.Text = string.Join(Environment.NewLine, NameList);
            this.Cursor = Cursors.Default;

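So when I think of a parallel foreach loop, the idea is to take the next instance names (the subfolder names), which are all on the same level, and run getDirectorySize() on them at the same time, because I know there are at least 7 subfolders directly under the main folder name.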

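4 answers:

Answer 0 (score: 1)

Accessing the same physical drive in parallel will not speed up the work.

Your main problem is the GetFiles method. It walks through all subfolders and collects every file name before returning. Your loop then passes over the same files a second time.

Use the EnumerateFiles method instead.

Try this code. It should be much faster.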

public long GetDirectorySize(string path)
{
    var dirInfo = new DirectoryInfo(path);
    long totalSize = 0;

    foreach (var fileInfo in dirInfo.EnumerateFiles("*.*", SearchOption.AllDirectories))
    {
        totalSize += fileInfo.Length;
    }
    return totalSize;
}

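From MSDN:

The EnumerateFiles and GetFiles methods differ as follows: when you use EnumerateFiles, you can start enumerating the collection of names before the whole collection is returned; when you use GetFiles, you must wait for the whole array of names to be returned before you can access the array. Therefore, when you are working with many files and directories, EnumerateFiles can be more efficient.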
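One practical caveat with enumerating a whole tree in a single call is that one inaccessible folder can abort the scan with an exception. The following is a minimal sketch, not part of the original answer, that keeps the lazy enumeration but walks one directory at a time so failures can be skipped. The names `FolderSizeSafe` and `GetDirectorySizeSafe` are made up for illustration, and skipping reparse points is an assumption about what the asker wants.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class FolderSizeSafe
{
    // Sums file lengths under root, one directory at a time, so that a
    // single unreadable folder does not abort the whole scan.
    public static long GetDirectorySizeSafe(string root)
    {
        long total = 0;
        var pending = new Stack<string>();
        pending.Push(root);

        while (pending.Count > 0)
        {
            var dir = new DirectoryInfo(pending.Pop());
            try
            {
                // Default search option is TopDirectoryOnly; entries are streamed lazily.
                foreach (FileInfo file in dir.EnumerateFiles())
                {
                    total += file.Length;
                }

                foreach (DirectoryInfo sub in dir.EnumerateDirectories())
                {
                    // Assumption: junctions and symbolic links are skipped to avoid cycles.
                    if ((sub.Attributes & FileAttributes.ReparsePoint) == 0)
                    {
                        pending.Push(sub.FullName);
                    }
                }
            }
            catch (UnauthorizedAccessException)
            {
                // No permission: skip whatever is left of this directory.
            }
            catch (IOException)
            {
                // Deleted meanwhile, path too long, and similar: skip as well.
            }
        }

        return total;
    }
}
```

The result is a lower bound when folders are skipped, so a caller may want to record which directories failed rather than swallow the exceptions silently.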

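Answer 1 (score: 0)

I had to do something similar, although not for folder or file sizes.

I don't have my code handy, but I used the following as a starting point. It executes in parallel if there are enough files in the directory.

Source, from MSDN:

The following example iterates the directories sequentially, but processes the files in parallel. This is probably the best approach when you have a large file-to-directory ratio. It is also possible to parallelize the directory iteration, and access each file sequentially. It is probably not efficient to parallelize both loops unless you are specifically targeting a machine with a large number of processors. However, as in all cases, you should test your application thoroughly to determine the best approach.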

   static void Main()
   {            
      try 
      {
         TraverseTreeParallelForEach(@"C:\Program Files", (f) =>
         {
            // Exceptions are no-ops.
            try {
               // Do nothing with the data except read it.
               byte[] data = File.ReadAllBytes(f);
            }
            catch (FileNotFoundException) {}
            catch (IOException) {}
            catch (UnauthorizedAccessException) {}
            catch (SecurityException) {}
            // Display the filename.
            Console.WriteLine(f);
         });
      }
      catch (ArgumentException) {
         Console.WriteLine(@"The directory 'C:\Program Files' does not exist.");
      }   

      // Keep the console window open.
      Console.ReadKey();
   }

   public static void TraverseTreeParallelForEach(string root, Action<string> action)
   {
      //Count of files traversed and timer for diagnostic output
      int fileCount = 0;
      var sw = Stopwatch.StartNew();

      // Determine whether to parallelize file processing on each folder based on processor count.
      int procCount = System.Environment.ProcessorCount;

      // Data structure to hold names of subfolders to be examined for files.
      Stack<string> dirs = new Stack<string>();

      if (!Directory.Exists(root)) {
             throw new ArgumentException();
      }
      dirs.Push(root);

      while (dirs.Count > 0) {
         string currentDir = dirs.Pop();
         string[] subDirs = {};
         string[] files = {};

         try {
            subDirs = Directory.GetDirectories(currentDir);
         }
         // Thrown if we do not have discovery permission on the directory.
         catch (UnauthorizedAccessException e) {
            Console.WriteLine(e.Message);
            continue;
         }
         // Thrown if another process has deleted the directory after we retrieved its name.
         catch (DirectoryNotFoundException e) {
            Console.WriteLine(e.Message);
            continue;
         }

         try {
            files = Directory.GetFiles(currentDir);
         }
         catch (UnauthorizedAccessException e) {
            Console.WriteLine(e.Message);
            continue;
         }
         catch (DirectoryNotFoundException e) {
            Console.WriteLine(e.Message);
            continue;
         }
         catch (IOException e) {
            Console.WriteLine(e.Message);
            continue;
         }

         // Execute in parallel if there are enough files in the directory.
         // Otherwise, execute sequentially. Files are opened and processed
         // synchronously but this could be modified to perform async I/O.
         try {
            if (files.Length < procCount) {
               foreach (var file in files) {
                  action(file);
                  fileCount++;                            
               }
            }
            else {
               Parallel.ForEach(files, () => 0, (file, loopState, localCount) =>
                                            { action(file);
                                              return (int) ++localCount;
                                            },
                                (c) => {
                                          Interlocked.Add(ref fileCount, c);                          
                                });
            }
         }
         catch (AggregateException ae) {
            ae.Handle((ex) => {
                         if (ex is UnauthorizedAccessException) {
                            // Here we just output a message and go on.
                            Console.WriteLine(ex.Message);
                            return true;
                         }
                         // Handle other exceptions here if necessary...

                         return false;
            });
         }

         // Push the subdirectories onto the stack for traversal.
         // This could also be done before handing the files.
         foreach (string str in subDirs)
            dirs.Push(str);
      }

      // For diagnostic purposes.
      Console.WriteLine("Processed {0} files in {1} milliseconds", fileCount, sw.ElapsedMilliseconds);
   }
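The sample above counts files and reads their contents. For the folder-size question, the same traversal could accumulate lengths instead. Here is a hedged sketch of that adaptation, using the thread-local overload of `Parallel.ForEach` so each worker keeps a private subtotal and touches the shared total only once. `ParallelFolderSize` and `GetDirectorySizeParallel` are hypothetical names, and whether this beats a plain loop on a given drive has to be measured.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Threading;
using System.Threading.Tasks;

public static class ParallelFolderSize
{
    // Walks directories sequentially and sums the files of each
    // directory in parallel, as in the sample above.
    public static long GetDirectorySizeParallel(string root)
    {
        long total = 0;
        var dirs = new Stack<string>();
        dirs.Push(root);

        while (dirs.Count > 0)
        {
            string currentDir = dirs.Pop();
            string[] files;
            string[] subDirs;
            try
            {
                files = Directory.GetFiles(currentDir);
                subDirs = Directory.GetDirectories(currentDir);
            }
            catch (UnauthorizedAccessException) { continue; } // no permission
            catch (IOException) { continue; }                 // deleted, path too long

            Parallel.ForEach(
                files,
                () => 0L,                                     // per-thread subtotal
                (file, loopState, localSum) =>
                {
                    try { return localSum + new FileInfo(file).Length; }
                    catch (IOException) { return localSum; }  // file vanished
                    catch (UnauthorizedAccessException) { return localSum; }
                },
                localSum => Interlocked.Add(ref total, localSum)); // merge once per thread

            foreach (string sub in subDirs)
            {
                dirs.Push(sub);
            }
        }

        return total;
    }
}
```

Reading a file's length is far cheaper than reading its contents, so the parallel part may bring little benefit here; the sketch mainly shows how the thread-local accumulator avoids contention on `total`.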

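Answer 2 (score: 0)

Unfortunately, there is no hidden managed or Win32 API that returns the size of a folder on disk without recursing through it; otherwise Windows Explorer would certainly take advantage of it.

Here is a sample method that parallelizes the work. You can compare it against a standard non-parallel recursive function that achieves the same result: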

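private static long GetFolderSize(string sourceDir)
{
    long size = 0;
    string[] fileEntries = Directory.GetFiles(sourceDir);

    // Sum the files directly inside this folder.
    foreach (string fileName in fileEntries)
    {
        Interlocked.Add(ref size, (new FileInfo(fileName)).Length);
    }

    var subFolders = Directory.EnumerateDirectories(sourceDir);

    // Start one task per subfolder. Reparse points (junctions, symbolic links)
    // are skipped so the traversal cannot loop or count a target twice.
    var tasks = subFolders.Select(folder => Task.Factory.StartNew(() =>
    {
        if ((File.GetAttributes(folder) & FileAttributes.ReparsePoint) != FileAttributes.ReparsePoint)
        {
            // Tasks run concurrently, so the shared total is updated atomically.
            Interlocked.Add(ref size, GetFolderSize(folder));
        }
    }));

    // Block until every subfolder has been added to the total.
    Task.WaitAll(tasks.ToArray());

    return size;
}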

Unless a single folder contains millions of files, this example will not use much memory.
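For the comparison mentioned above, a non-parallel counterpart could look like the following sketch. It is not from the original answer: `FolderSizeBaseline`, `GetFolderSizeSequential`, and the `Time` helper are illustrative names, and the timing helper is just one simple way to measure a run.

```csharp
using System;
using System.Diagnostics;
using System.IO;

public static class FolderSizeBaseline
{
    // Plain recursive version with the same reparse-point rule,
    // intended only as a baseline for timing comparisons.
    public static long GetFolderSizeSequential(string sourceDir)
    {
        long size = 0;

        foreach (string fileName in Directory.GetFiles(sourceDir))
        {
            size += new FileInfo(fileName).Length;
        }

        foreach (string folder in Directory.EnumerateDirectories(sourceDir))
        {
            if ((File.GetAttributes(folder) & FileAttributes.ReparsePoint) != FileAttributes.ReparsePoint)
            {
                size += GetFolderSizeSequential(folder);
            }
        }

        return size;
    }

    // Runs any size function once and reports how long it took.
    public static TimeSpan Time(Func<string, long> measure, string path, out long size)
    {
        var sw = Stopwatch.StartNew();
        size = measure(path);
        sw.Stop();
        return sw.Elapsed;
    }
}
```

When comparing, keep in mind that the operating system caches directory metadata, so whichever variant runs second on the same tree tends to look faster. Alternating the order over several runs gives a fairer picture.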

Answer 3 (score: 0)

Using the Microsoft Scripting Runtime seems to speed this up by about 90%:

var fso = new Scripting.FileSystemObject();
double size = fso.GetFolder(path).Size;

Reference: What is the fastest way to calculate a Windows folders size?