Question

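I have a large number of data records in a binary file, and I want to search them for some content. Is there a way to run LINQ queries over the file's data without putting all of it into memory (for example in a List<T>)?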

This is my current approach, which uses a List<Book>:

private Book Read(long position)
{
    Book book;
    using (Stream st = File.Open(HttpContext.Current.Server.MapPath("/") + "library.majid", FileMode.OpenOrCreate, FileAccess.Read))
    {
        st.Position = position;
        using (BinaryReader reader = new BinaryReader(st))
        {
            if (!reader.ReadBoolean())
                return null;
            book = new Book()
            {
                Id = reader.ReadInt32(),
                Name = reader.ReadString(),
                Dewey = reader.ReadString()
            };
            try
            {
                book.Subject = reader.ReadString();
                book.RegDate = reader.ReadInt32();
                book.PubDate = reader.ReadInt32();
            }
            catch (EndOfStreamException) { }
        }
    }
    return book;
}

private List<Book> getAll(int recordLength = 100) // results are sorted by Id
{
    long Len;
    using (Stream st = File.Open(HttpContext.Current.Server.MapPath("/") + "library.majid", FileMode.OpenOrCreate, FileAccess.Read))
    {
        Len = st.Length;
    }
    List<Book> res = new List<Book>();
    Book ReadedBook = null;
    // Use recordLength instead of a hard-coded 100; long avoids overflow on large files.
    for (long i = 0; i < Len / recordLength; i++)
    {
        ReadedBook = Read(i * recordLength);
        if (ReadedBook != null)
            res.Add(ReadedBook);
    }
    res.Sort((x, y) => x.Id.CompareTo(y.Id));
    return res;
}

Answer 1

If it were a text file, you could use File.ReadLines(filename), which returns an IEnumerable<string> without loading the whole file into memory.

See http://msdn.microsoft.com/en-us/library/dd383503.aspx

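The ReadLines and ReadAllLines methods differ as follows: when you use ReadLines, you can start enumerating the collection of strings before the whole collection is returned; when you use ReadAllLines, you must wait for the whole array of strings to be returned before you can access the array. Therefore, when you are working with very large files, ReadLines can be more efficient.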

For example:

var count = File.ReadLines(somefile)
                .Where(line => line.StartsWith("something"))
                .Count();

Edit

What if it is a binary file?

Then you can write a similar method:

public static IEnumerable<Book> ReadBooks(string filename)
{
    using (var f = File.Open(filename, FileMode.Open))
    {
        using (BinaryReader rdr = new BinaryReader(f))
        {
            Book b = new Book();
            //.....
            yield return b;
        }
    }
}
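
The skeleton above could be fleshed out roughly as follows. This is only a sketch under assumptions: it takes the layout implied by the question's Read method (fixed 100-byte slots, each starting with a Boolean "in use" flag, then Id, Name and Dewey) and leaves out the optional trailing fields for brevity. The class name BookFile and the recordLength parameter are hypothetical names chosen for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public class Book
{
    public int Id { get; set; }
    public string Name { get; set; }
    public string Dewey { get; set; }
}

public static class BookFile
{
    // Lazily yields one Book at a time, so only the current record is in memory.
    // Assumption: fixed-size slots of recordLength bytes, each beginning with a
    // Boolean "in use" flag, as in the question's Read method.
    public static IEnumerable<Book> ReadBooks(string filename, int recordLength = 100)
    {
        using (var stream = File.Open(filename, FileMode.Open, FileAccess.Read))
        using (var reader = new BinaryReader(stream))
        {
            long count = stream.Length / recordLength;
            for (long i = 0; i < count; i++)
            {
                // Jump to the start of slot i.
                stream.Position = i * recordLength;
                if (!reader.ReadBoolean())
                    continue; // empty or deleted slot

                yield return new Book
                {
                    Id = reader.ReadInt32(),
                    Name = reader.ReadString(),
                    Dewey = reader.ReadString()
                };
            }
        }
    }
}
```

A query such as `BookFile.ReadBooks(path).Where(b => b.Name.Contains("LINQ")).Take(10)` then pulls records from disk on demand. Note that buffering operators such as OrderBy or ToList still materialize every record they receive, and that the file stays open until the enumeration finishes or the enumerator is disposed.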

Answer 2

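If you only want to search for certain data, you can keep an implementation similar to your getAll method, pass in some parameters that drive the search, and return a List (or IEnumerable<T>). That way only the matching items are kept in memory.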

Your Read method does not keep elements in memory (they live only within the method's scope).

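By the way, you can pass the stream reader into the Read method so that you do not create a new reader on every iteration. The stream "cursor" will stay at the position where the last block of data was read.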
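
One way to combine these suggestions is sketched below, under assumptions. The file is opened once, the same BinaryReader is handed to Read for every record, and only the records that satisfy a caller-supplied predicate are kept. The 100-byte slot layout is taken from the question; the names BookSearch and Find and the predicate parameter are hypothetical, and the optional trailing fields are omitted for brevity.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public class Book
{
    public int Id { get; set; }
    public string Name { get; set; }
    public string Dewey { get; set; }
}

public static class BookSearch
{
    // Reads the record at the given position through a reader the caller owns.
    // Returns null for an empty slot, mirroring the question's Read method.
    public static Book Read(BinaryReader reader, long position)
    {
        reader.BaseStream.Position = position;
        if (!reader.ReadBoolean())
            return null;

        return new Book
        {
            Id = reader.ReadInt32(),
            Name = reader.ReadString(),
            Dewey = reader.ReadString()
        };
    }

    // Scans the file once; only the matching records end up in the list.
    public static List<Book> Find(string filename, Func<Book, bool> predicate, int recordLength = 100)
    {
        var results = new List<Book>();
        using (var stream = File.Open(filename, FileMode.Open, FileAccess.Read))
        using (var reader = new BinaryReader(stream))
        {
            long count = stream.Length / recordLength;
            for (long i = 0; i < count; i++)
            {
                Book book = Read(reader, i * recordLength);
                if (book != null && predicate(book))
                    results.Add(book);
            }
        }
        return results;
    }
}
```

A call such as `BookSearch.Find(path, b => b.Dewey.StartsWith("005"))` would then open the file a single time instead of once per record.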

Using LINQ without putting the data into memory
