Question

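I need to read a large text file and search each line for a string. Lines are separated by newline characters, and I need to minimize both I/O and RAM usage.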

My idea is to split the file into chunks, which gives me two approaches:

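1) Split the FileStream with something like the code below. The risk is that a block boundary can cut a line of text in half, which could complicate things: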

using System.IO;

using (FileStream fsSource = new FileStream("InputFiles\\1.txt", FileMode.Open, FileAccess.Read))
{
    // Read the source file one block of bytes at a time.
    const int bufferSize = 1024; // your amount to read at a time
    byte[] bytes = new byte[bufferSize];

    int n;
    // Read may return anything from 0 to bufferSize; 0 means end of file.
    while ((n = fsSource.Read(bytes, 0, bufferSize)) > 0)
    {
        // Only bytes[0..n) are valid in this iteration. A line (or even a
        // multi-byte character) may be split across two consecutive blocks.
        // Do something with the block here.
    }
}
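One way to deal with lines cut in half is to carry the unterminated tail of each block over into the next block. The sketch below is an illustration, not part of the original question; the class and method names are hypothetical. It assumes UTF-8 input and treats `\n` and `\r\n` as line endings. It reads characters through a `StreamReader` rather than raw bytes, so that a multi-byte character split across a block boundary is decoded correctly:

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text;

public static class BlockLineSplitter
{
    // Hypothetical helper: reads `path` in blocks of `blockSize` chars and
    // yields complete lines. A line cut at a block boundary is kept in
    // `pending` until the rest of it arrives in a later block.
    public static IEnumerable<string> ReadLinesInBlocks(string path, int blockSize = 1024)
    {
        var buffer = new char[blockSize];
        var pending = new StringBuilder();
        using (var reader = new StreamReader(path, Encoding.UTF8))
        {
            int n;
            while ((n = reader.Read(buffer, 0, buffer.Length)) > 0)
            {
                int start = 0;
                for (int i = 0; i < n; i++)
                {
                    if (buffer[i] != '\n') continue;
                    // Complete the pending line with the text before '\n'.
                    pending.Append(buffer, start, i - start);
                    yield return TrimCarriageReturn(pending.ToString());
                    pending.Clear();
                    start = i + 1;
                }
                // Keep the unterminated tail of this block for the next one.
                pending.Append(buffer, start, n - start);
            }
        }
        // The last line may not end with a newline.
        if (pending.Length > 0)
            yield return TrimCarriageReturn(pending.ToString());
    }

    // Drops the '\r' left over from a "\r\n" line ending.
    private static string TrimCarriageReturn(string s) =>
        s.Length > 0 && s[s.Length - 1] == '\r' ? s.Substring(0, s.Length - 1) : s;
}
```

This is essentially the bookkeeping that `StreamReader.ReadLine` already does internally, which is why the answers below recommend letting the framework handle it.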

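2) Create an extension method that splits the list of lines into smaller lists of lines, then search each line for the word. However, I'm not sure how this approach affects I/O and RAM!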

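using System.Collections.Generic;
using System.Linq;

public static class EnumerableExtensions
{
    // Splits a sequence into consecutive chunks of at most chunkSize items.
    // Each chunk is materialized with ToList() before it is yielded.
    public static IEnumerable<IEnumerable<TValue>> Chunk<TValue>(this IEnumerable<TValue> values, int chunkSize)
    {
        using (var enumerator = values.GetEnumerator())
        {
            while (enumerator.MoveNext())
            {
                yield return GetChunk(enumerator, chunkSize).ToList();
            }
        }
    }

    // Yields the current item plus up to chunkSize - 1 following items.
    private static IEnumerable<T> GetChunk<T>(IEnumerator<T> enumerator, int chunkSize)
    {
        do
        {
            yield return enumerator.Current;
        } while (--chunkSize > 0 && enumerator.MoveNext());
    }
}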
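On the I/O and RAM question for approach 2: the chunking itself holds only one chunk of lines at a time, so memory use depends mainly on the source sequence. If the lines come from `File.ReadAllLines`, the whole file is already in memory before chunking starts; if they come from `File.ReadLines`, lines are streamed from disk as the chunks are consumed. The sketch below batches lines by hand over `File.ReadLines` (the class and method names are hypothetical) so that it works without the custom extension:

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class ChunkedSearch
{
    // Hypothetical helper: counts lines containing `term`, processing the
    // file in batches of `batchSize` lines. File.ReadLines streams the file,
    // so roughly one batch of lines is held in memory at a time.
    public static int CountMatches(string path, string term, int batchSize)
    {
        int count = 0;
        var batch = new List<string>(batchSize);
        foreach (string line in File.ReadLines(path))
        {
            batch.Add(line);
            if (batch.Count == batchSize)
            {
                count += batch.Count(l => l.Contains(term));
                batch.Clear();
            }
        }
        // Process the final, possibly partial, batch.
        count += batch.Count(l => l.Contains(term));
        return count;
    }
}
```

For a search that looks at one line at a time, batching adds little over a plain `foreach`. Also note that .NET 6 and later ship a built-in `Enumerable.Chunk` in `System.Linq`, so a custom extension with the same name may lead to ambiguous-call errors there.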

Are there any ideas or other approaches I could use?

Thanks in advance.

Answer 1

I think you are overcomplicating this. The .NET Framework offers plenty of ways to read a text file.

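If you need to process a large text file, nothing beats the File.ReadLines method: it does not load the whole file into memory, but lets you work through it line by line.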

From the MSDN documentation:

When you use ReadLines, you can start enumerating the collection of strings before the whole collection is returned.

foreach(string line in File.ReadLines(@"InputFiles\1.txt"))
{
    // Process your line here....
}
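Applied to the original task, a minimal sketch of the search loop might look like this. It assumes an ordinal (case-sensitive) match is what is wanted; the class and method names are hypothetical. Results are yielded lazily together with their 1-based line numbers, so memory use is bounded by the longest line rather than by the file size:

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class ReadLinesSearch
{
    // Hypothetical helper: lazily yields (line number, text) for every line
    // that contains `term`, reading the file one line at a time.
    public static IEnumerable<(long LineNumber, string Text)> Find(string path, string term)
    {
        long lineNumber = 0;
        foreach (string line in File.ReadLines(path))
        {
            lineNumber++;
            // Ordinal comparison: case-sensitive and culture-independent.
            if (line.IndexOf(term, StringComparison.Ordinal) >= 0)
                yield return (lineNumber, line);
        }
    }
}
```

Usage: `foreach (var hit in ReadLinesSearch.Find(@"InputFiles\1.txt", "word")) Console.WriteLine($"{hit.LineNumber}: {hit.Text}");`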

Answer 2

Use the File.ReadLines method: it reads one line at a time into memory, and you can run your logic on that line.
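Because `File.ReadLines` is lazy, it also helps when you only need to know whether the string occurs at all: a LINQ operator such as `Any` stops enumerating at the first match, and disposing the enumerator closes the file. Reading may still have gone slightly past the match because the underlying reader buffers its input. `File.ReadAllLines`, by contrast, loads every line before the search can begin. A minimal sketch (names hypothetical):

```csharp
using System.IO;
using System.Linq;

public static class EarlyExit
{
    // Hypothetical helper: true as soon as one line contains `term`.
    // Any() stops pulling lines from File.ReadLines at the first match.
    public static bool FileContains(string path, string term) =>
        File.ReadLines(path).Any(line => line.Contains(term));
}
```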


Best way to read a text file in chunks in C#
