Question

Scenario

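I need to read more than 5 million items from the database and process them one by one, without having to hold the whole collection in memory. Let me write an oversimplified, C#-inspired pseudocode to clarify (please note the question is about LINQ usage, grouping, counting, etc.):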

Let's say the table has the following fields: Id, Name, Age.

IList<string> resultList = new List<string>();
using (IDataReader reader = command.ExecuteReader())
{
    while (reader.Read()) // Read only one row at a time; no need to load everything
    {
        var name = (string)reader["Name"];
        if (AggregateFunction(resultList, name, Convert.ToInt32(reader["Age"])))
            resultList.Add(name);
    }
}

Problem: if I use IDataReader, I don't have to keep all 5 million items in memory. I can loop through them, and my memory requirement is only one row at a time.

But if I use the Repository pattern with IEnumerable and the like, I'm forced to load all 5 million items into memory before I can process them. The code would look like:

IEnumerable<...> tableData = repository.GetAll(); // Here we load everything into memory
foreach (var row in tableData)
{
    // Do whatever...
}

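Should I skip the Repository pattern and do this the old-fashioned way? Or is there a way to get the benefits of the Repository pattern without loading everything into memory?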

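Note: the solution I thought of is to create a repository.GetAggregatedResult(Func aggregateFunction), but that doesn't feel clean. Besides, the real question here is: how do I iterate over a repository one item at a time, without holding the entire result set in memory?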

Answer 1

I don't see why you couldn't implement a method like this:

public interface IPersonRepository
{
     IEnumerable<string> GetFilteredNames(Func<Person, bool> predicate);
}

Along with a domain object like this:

public class Person
{
    public Guid Id { get; set; }
    public string Name { get; set; }
    public byte Age { get; set; } 
    // byte should be fine unless you would be 
    // working with turtles instead of persons ;)
}

...and implement it with a raw IDataReader:

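public IEnumerable<string> GetFilteredNames(Func<Person, bool> predicate)
{
    List<string> result = new List<string>();

    // Dispose the reader once all rows have been read
    using (IDataReader dataReader = ...) // Who knows how you get it!
    {
        while (dataReader.Read())
        {
            Person person = new Person
            {
                Id = (Guid)dataReader["Id"],             // Person.Id is a Guid, so unbox as Guid
                Name = (string)dataReader["Name"],
                Age = Convert.ToByte(dataReader["Age"])  // converts any integral column type to byte
            };

            // Only the matching names are kept, not the whole table
            if (predicate(person))
                result.Add(person.Name);
        }
    }

    return result;
}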
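The List-based version above still buffers every matching name before returning. If the caller only needs to walk the matches, a C# iterator method (`yield return`) can stream them instead: nothing runs until the caller enumerates, and each name is handed over as soon as its row is read. This is a swapped-in alternative, not the answer's code; the reader is passed as a parameter (an assumption so the snippet is self-contained), and `Person` is repeated so it compiles on its own.

```csharp
using System;
using System.Collections.Generic;
using System.Data;

public class Person
{
    public Guid Id { get; set; }
    public string Name { get; set; }
    public byte Age { get; set; }
}

public static class StreamingPersonQueries
{
    // Iterator method: the body runs lazily during enumeration, and each
    // matching name is yielded immediately, so no result list is built here.
    public static IEnumerable<string> GetFilteredNames(IDataReader dataReader, Func<Person, bool> predicate)
    {
        while (dataReader.Read())
        {
            var person = new Person
            {
                Id = (Guid)dataReader["Id"],
                Name = (string)dataReader["Name"],
                Age = Convert.ToByte(dataReader["Age"])
            };

            if (predicate(person))
                yield return person.Name;
        }
    }
}
```

Because the method is lazy, the caller must finish enumerating before disposing the reader. A repository that opens the reader itself could wrap the loop in `using` inside the iterator; the reader is then disposed when the caller's `foreach` completes or breaks out early.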

If you want it to be completely provider-agnostic, you can use dependency injection to inject an IDataReader factory into the repository!
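A minimal sketch of that idea, assuming a hypothetical `IDataReaderFactory` interface (not a framework type) with a single `CreatePersonReader()` method. The repository depends only on the interface, so a test can pass an in-memory implementation built on `DataTable.CreateDataReader()`, which returns a `DataTableReader` (an `IDataReader`). `DataTableReaderFactory` is likewise a hypothetical name used for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Data;

public class Person
{
    public Guid Id { get; set; }
    public string Name { get; set; }
    public byte Age { get; set; }
}

public interface IPersonRepository
{
    IEnumerable<string> GetFilteredNames(Func<Person, bool> predicate);
}

// Hypothetical abstraction: hands out a reader over the Person rows
// without the repository knowing which provider created it.
public interface IDataReaderFactory
{
    IDataReader CreatePersonReader();
}

public class PersonRepository : IPersonRepository
{
    private readonly IDataReaderFactory _readerFactory;

    // The factory is injected, so the repository never references a concrete provider.
    public PersonRepository(IDataReaderFactory readerFactory)
    {
        _readerFactory = readerFactory ?? throw new ArgumentNullException(nameof(readerFactory));
    }

    public IEnumerable<string> GetFilteredNames(Func<Person, bool> predicate)
    {
        var result = new List<string>();

        // The repository asked for this reader, so it is responsible for disposing it.
        using (IDataReader dataReader = _readerFactory.CreatePersonReader())
        {
            while (dataReader.Read())
            {
                var person = new Person
                {
                    Id = (Guid)dataReader["Id"],
                    Name = (string)dataReader["Name"],
                    Age = Convert.ToByte(dataReader["Age"])
                };

                if (predicate(person))
                    result.Add(person.Name);
            }
        }

        return result;
    }
}

// In-memory factory for tests: DataTable.CreateDataReader() returns a
// DataTableReader, which implements IDataReader.
public class DataTableReaderFactory : IDataReaderFactory
{
    private readonly DataTable _table;

    public DataTableReaderFactory(DataTable table) => _table = table;

    public IDataReader CreatePersonReader() => _table.CreateDataReader();
}
```

A production factory could instead open a connection and return `command.ExecuteReader(CommandBehavior.CloseConnection)`, so that disposing the reader also closes the connection; the repository code stays the same.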

Now you can carry on in the wonderful world of the Repository pattern:

var result = repoImpl.GetFilteredNames(person => AggregateFunction(person.Id, person.Name, person.Age));

How to implement a repository without storing query results in memory?