Question

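I'm writing a program that does the following:

Find all files with the correct extension in a given directory
For each of them, find all occurrences of a given string in those files
Print each line

I'd like to write this in a functional way, as a series of generator functions (things that call yield return and lazily return only one item at a time), so my code would read like this: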

IEnumerable<string> allFiles = GetAllFiles();
IEnumerable<string> matchingFiles = GetMatches( "*.txt", allFiles );
IEnumerable<string> contents = GetFileContents( matchingFiles );
IEnumerable<string> matchingLines = GetMatchingLines( contents );

foreach( var lineText in matchingLines )
  Console.WriteLine( "Found: " + lineText );

This is all fine, but what I'd also like to do is print some statistics at the end. Something like this:

Found 233 matches in 150 matching files. Scanned 3,297 total files in 5.72s

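The problem is that, with the code written in the "purely functional" style above, each item is lazily loaded. You only know how many files matched in total once the final foreach loop has completed, and because only one item is ever yielded at a time, the code has nowhere to keep track of how many things it has found previously. If you invoke LINQ's matchingLines.Count() method, it will re-enumerate the collection!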
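The re-enumeration described above can be made visible with a small sketch. Everything here is hypothetical scaffolding: Lines() stands in for GetMatchingLines(), and ItemsProduced is a static counter added only to show how often the generator body runs.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ReEnumerationDemo
{
    // Counts how many items the generator body has produced in total.
    public static int ItemsProduced;

    // A lazy generator standing in for GetMatchingLines() (hypothetical).
    public static IEnumerable<string> Lines()
    {
        foreach (var line in new[] { "a", "b", "c" })
        {
            ItemsProduced++;
            yield return line;
        }
    }

    public static int Run()
    {
        ItemsProduced = 0;
        IEnumerable<string> lines = Lines();

        foreach (var line in lines) { }  // first pass: the "print" loop
        int count = lines.Count();       // second pass: the generator runs again

        return ItemsProduced;            // three items, produced twice
    }
}
```

Because the sequence is not a materialised collection, Count() has to walk it from the start, so every stage of the pipeline would do its work a second time.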

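I can think of many ways to solve this, but all of them seem somewhat ugly. It strikes me as something people are bound to have done before, and I'm sure there is a nice design pattern that shows a best-practice way of doing it.

Any ideas? Cheers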

Answer 1

I would say that you need to encapsulate the process in a 'Matcher' class, in which your methods capture statistics as they progress.

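public class Matcher
{
  private int totalFileCount;
  private int matchedCount;
  private DateTime start;
  private int lineCount;   // never incremented here: line matching is not part of this sketch
  private DateTime stop;

  public IEnumerable<string> Match(string pattern)
  {
     start = DateTime.Now;
     foreach (string file in GetMatchedFiles(pattern))
     {
        yield return file;
     }

     // Reached only after the caller has enumerated the whole sequence.
     stop = DateTime.Now;
     System.Console.WriteLine(string.Format(
       "Found {0} matches in {1} matching files." +
       " {2} total files scanned in {3}.",
       lineCount, matchedCount,
       totalFileCount, (stop - start).ToString()));
  }

  // SomeFileRetrievalMethod() and MatchPattern() are placeholders that this
  // answer leaves undefined; file names are treated as plain strings.
  private IEnumerable<string> GetMatchedFiles(string pattern)
  {
     foreach (string file in SomeFileRetrievalMethod())
     {
        totalFileCount++;
        if (MatchPattern(pattern, file))
        {
          matchedCount++;
          yield return file;
        }
     }
  }
}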

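I'll stop there, since I should be coding work stuff, but the general idea is there. The whole point of a "pure" functional program is that it has no side effects, and this kind of statistics calculation is a side effect.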

Answer 2

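I can think of two ideas:

Pass in a context object and return (string + context) from your enumerators - the purely functional solution

Use thread-local storage for your stats (CallContext); you can get fancy and support a stack of contexts. So you would have code like this: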

using (var stats = DirStats.Create())
{
    IEnumerable<string> allFiles = GetAllFiles();
    IEnumerable<string> matchingFiles = GetMatches( "*.txt", allFiles );
    IEnumerable<string> contents = GetFileContents( matchingFiles );
    stats.Print();
    IEnumerable<string> matchingLines = GetMatchingLines( contents );
    stats.Print();
}
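The first idea, passing a context object through the pipeline, might look roughly like the sketch below. Note the simplification: the context is mutated in place rather than returned alongside each string, so this is not the purely functional variant the answer describes. ScanStats, ContextPipeline, CountAll and the extension-based GetMatches are all hypothetical names invented for illustration.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical context object carrying the running totals.
public class ScanStats
{
    public int TotalFiles;
    public int MatchingFiles;
}

public static class ContextPipeline
{
    // Counts every file name that flows through this stage.
    public static IEnumerable<string> CountAll(IEnumerable<string> files, ScanStats stats)
    {
        foreach (var file in files)
        {
            stats.TotalFiles++;
            yield return file;
        }
    }

    // Keeps only names ending in the given extension and counts them.
    public static IEnumerable<string> GetMatches(
        string extension, IEnumerable<string> files, ScanStats stats)
    {
        foreach (var file in files)
        {
            if (file.EndsWith(extension, StringComparison.OrdinalIgnoreCase))
            {
                stats.MatchingFiles++;
                yield return file;
            }
        }
    }

    public static ScanStats Run(IEnumerable<string> fileNames)
    {
        var stats = new ScanStats();
        var all = CountAll(fileNames, stats);
        var matches = GetMatches(".txt", all, stats);

        foreach (var file in matches) { }  // consume the lazy pipeline once

        return stats;                      // totals are only final after the loop
    }
}
```

The totals are meaningful only after the pipeline has been fully consumed, which is the same constraint the question describes.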

Answer 3

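Similar to the other answers, but taking a slightly more generic approach ...

... why not create a Decorator class that can wrap an existing IEnumerable implementation and calculate the statistic as it passes other items through.

Here's a Counter class I just threw together - but you could create variations for other kinds of aggregation too.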

public class Counter<T> : IEnumerable<T>
{
    public int Count { get; private set; }

    public Counter(IEnumerable<T> source)
    {
        mSource = source;
        Count = 0;
    }

    // Counts each item as it passes through. Count keeps growing if the
    // sequence is enumerated more than once.
    public IEnumerator<T> GetEnumerator()
    {
        foreach (var item in mSource)
        {
            Count++;
            yield return item;
        }
    }

    // Non-generic enumerator (needs using System.Collections); reuses the generic one.
    IEnumerator IEnumerable.GetEnumerator()
    {
        return GetEnumerator();
    }

    private IEnumerable<T> mSource;
}

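You could create three instances of Counter:

One to wrap GetAllFiles(), counting the total number of files;
One to wrap GetMatches(), counting the number of matching files; and
One to wrap GetMatchingLines(), counting the number of matching lines.

The key with this approach is that you're not layering multiple responsibilities onto your existing classes/methods - the GetMatchingLines() method only handles the matching, and you're not asking it to track stats as well.

Clarification in response to a comment by Mitcham:

The final code would look something like this: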

var files = new Counter<string>( GetAllFiles());
var matchingFiles = new Counter<string>(GetMatches( "*.txt", files ));
var contents = GetFileContents( matchingFiles );
var linesFound = new Counter<string>(GetMatchingLines( contents ));

foreach( var lineText in linesFound )
    Console.WriteLine( "Found: " + lineText );

string message 
    = String.Format( 
        "Found {0} matches in {1} matching files. Scanned {2} files",
        linesFound.Count,
        matchingFiles.Count,
        files.Count);
Console.WriteLine(message);

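Note that this is still a functional approach - the variables used are immutable (more like bindings than variables), and the overall function has no side effects.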
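The answer mentions that variations could be written for other kinds of aggregation. One possible generalisation is sketched here; Aggregator<T, TAcc> and its members are hypothetical and not part of the original answer. It folds each passing item into an accumulator, in the same decorator style as Counter<T>.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hypothetical generalisation of Counter<T>: folds each item into an accumulator.
public class Aggregator<T, TAcc> : IEnumerable<T>
{
    private readonly IEnumerable<T> source;
    private readonly TAcc seed;
    private readonly Func<TAcc, T, TAcc> step;

    // Current value of the aggregate; final once enumeration has completed.
    public TAcc Value { get; private set; }

    public Aggregator(IEnumerable<T> source, TAcc seed, Func<TAcc, T, TAcc> step)
    {
        this.source = source;
        this.seed = seed;
        this.step = step;
        Value = seed;
    }

    public IEnumerator<T> GetEnumerator()
    {
        Value = seed;  // restart so a second enumeration does not double-count
        foreach (var item in source)
        {
            Value = step(Value, item);
            yield return item;
        }
    }

    IEnumerator IEnumerable.GetEnumerator()
    {
        return GetEnumerator();
    }
}
```

For example, new Aggregator<string, int>(lines, 0, (acc, s) => acc + s.Length) would total the characters seen while leaving the lines themselves untouched for the next stage.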

Answer 4

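If you're happy to turn your code upside down, you might be interested in Push LINQ. The basic idea is to reverse the "pull" model of IEnumerable<T> and turn it into a "push" model with observers - each part of the pipeline effectively pushes its data past any number of observers (using event handlers), which typically form new parts of the pipeline. This gives a really easy way to hook up multiple aggregates to the same data.

See this blog entry for some more details. I recently gave a talk on it in London - my page of talks has a few links for sample code, the slide deck, video etc.

It's a fun little project, but it does take a bit of getting your head around.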
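To illustrate the push idea without relying on Push LINQ itself, here is a minimal sketch built on a plain event. PushSource<T> and PushDemo are hypothetical names and do not reflect the actual Push LINQ API; the point is only that several aggregates can observe a single pass over the data.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical push source: NOT the Push LINQ API, only an illustration.
public class PushSource<T>
{
    // Observers subscribe here and are called once per item.
    public event Action<T> DataProduced;

    public void Push(IEnumerable<T> items)
    {
        foreach (var item in items)
        {
            var handler = DataProduced;
            if (handler != null) handler(item);
        }
    }
}

public static class PushDemo
{
    // Returns { total lines, lines containing needle } from one pass.
    public static int[] Run(IEnumerable<string> lines, string needle)
    {
        var source = new PushSource<string>();
        int total = 0;
        int matches = 0;

        // Two independent aggregates attached to the same data stream.
        source.DataProduced += line => total++;
        source.DataProduced += line => { if (line.Contains(needle)) matches++; };

        source.Push(lines);
        return new[] { total, matches };
    }
}
```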

Answer 5

I took Bevan's code and refactored it around until I was content. Fun stuff.

public class Counter
{
    public int Count { get; set; }
}

public static class CounterExtensions
{
    public static IEnumerable<T> ObserveCount<T>
      (this IEnumerable<T> source, Counter count)
    {
        foreach (T t in source)
        {
            count.Count++;
            yield return t;
        }
    }

    public static IEnumerable<T> ObserveCount<T>
      (this IEnumerable<T> source, IList<Counter> counters)
    {
        Counter c = new Counter();
        counters.Add(c);
        return source.ObserveCount(c);
    }
}


public static class CounterTest
{
    public static void Test1()
    {
        IList<Counter> counters = new List<Counter>();
  //
        IEnumerable<int> step1 =
            Enumerable.Range(0, 100).ObserveCount(counters);
  //
        IEnumerable<int> step2 =
            step1.Where(i => i % 10 == 0).ObserveCount(counters);
  //
        IEnumerable<int> step3 =
            step2.Take(3).ObserveCount(counters);
  //
        step3.ToList();
        foreach (Counter c in counters)
        {
            Console.WriteLine(c.Count);
        }
    }
}

Outputs as expected: 21, 3, 3

Answer 6

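Assuming those functions are your own, the only thing I can think of is the Visitor pattern: passing in an abstract visitor that calls you back when each thing happens. For example: pass an ILineVisitor into GetFileContents (which I'm assuming breaks up the file into lines). ILineVisitor would have a method like OnVisitLine(String line); you could then implement ILineVisitor and make it keep the appropriate stats. Rinse and repeat with an ILineMatchVisitor, IFileVisitor etc. Or you could use a single IVisitor with an OnVisit() method that has different semantics in each case.

Your functions would each need to take a visitor and call its OnVisit() at the appropriate time, which may seem annoying, but at least the visitor could be used to do lots of interesting things other than just what you're doing here. In fact, you could actually avoid writing GetMatchingLines by passing a visitor that checks for the match in OnVisitLine(String line) into GetFileContents.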
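A minimal sketch of that visitor idea follows. ILineVisitor and OnVisitLine come from the description above; MatchCountingVisitor, VisitorPipeline, and the choice to feed GetFileContents already-loaded file texts instead of file paths are assumptions made to keep the example self-contained.

```csharp
using System;
using System.Collections.Generic;

public interface ILineVisitor
{
    void OnVisitLine(string line);
}

// Hypothetical visitor: counts lines seen and lines containing a search string.
public class MatchCountingVisitor : ILineVisitor
{
    private readonly string needle;

    public int LinesVisited { get; private set; }
    public int LinesMatched { get; private set; }

    public MatchCountingVisitor(string needle)
    {
        this.needle = needle;
    }

    public void OnVisitLine(string line)
    {
        LinesVisited++;
        if (line.Contains(needle)) LinesMatched++;
    }
}

public static class VisitorPipeline
{
    // Stand-in for GetFileContents: takes file texts rather than paths (an assumption),
    // splits them into lines, and notifies the visitor before yielding each line.
    public static IEnumerable<string> GetFileContents(
        IEnumerable<string> fileTexts, ILineVisitor visitor)
    {
        foreach (var text in fileTexts)
        {
            foreach (var line in text.Split('\n'))
            {
                visitor.OnVisitLine(line);
                yield return line;
            }
        }
    }
}
```

Since the visitor already checks each line for a match, a separate GetMatchingLines stage is not needed just to gather the match count.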

Is this one of the ugly things you'd already considered?

Design pattern for aggregating lazy lists
