Using FileStream.Seek

Date: 2011-03-05 03:17:30

Tags: c# file search filestream seek

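I am trying to use FileStream.Seek to jump quickly to a line and read it.

However, I am not getting the right results. I have stared at this for a while and cannot work out what I am doing wrong.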

Environment:
OS: Windows 7
Framework: .NET 4.0
IDE: Visual C# Express 2010

Sample data in the file C:\Temp\Temp.txt:

0001|100!2500
0002|100!2500
0003|100!2500
0004|100!2500
0005|100!2500
0006|100!2500
0007|100!2500
0008|100!2500
0009|100!2500
0010|100!2500

Code:

class PaddedFileSearch
{
    private int LineLength { get; set; }
    private string FileName { get; set; }

    public PaddedFileSearch()
    {
        FileName = @"C:\Temp\Temp.txt";     // This is a padded file.  All lines are of the same length.

        FindLineLength();
        Debug.Print("File Line length: {0}", LineLength);

        // TODO: This purely for testing.  Move this code out.
        SeekMethod(new int[] { 5, 3, 4 });
        /*  Expected Results:
         *  Line No     Position        Line
         *  -------     --------        -----------------
         *  3           30              0003|100!2500
         *  4           15              0004|100!2500
         *  5           15              0005|100!2500 -- This was updated after the initial request.
         */

        /* THIS DOES NOT GIVE THE EXPECTED RESULTS */
        SeekMethod(new int[] { 5, 3 });
        /*  Expected Results:
         *  Line No     Position        Line
         *  -------     --------        -----------------
         *  3           30              0003|100!2500
         *  5           30              0005|100!2500
         */
    }

    private void FindLineLength()
    {
        string line;

        // Add check for FileExists

        using (StreamReader reader = new StreamReader(FileName))
        {
            if ((line = reader.ReadLine()) != null)
            {
                LineLength = line.Length + 2;
                // The 2 is for NewLine(\r\n)
            }
        }

    }

    public void SeekMethod(int[] lineNos)
    {
        long position = 0;
        string line = null;

        Array.Sort(lineNos);

        Debug.Print("");
        Debug.Print("Line No\t\tPosition\t\tLine");
        Debug.Print("-------\t\t--------\t\t-----------------");

        using (FileStream fs = new FileStream(FileName, FileMode.Open, FileAccess.Read, FileShare.None))
        {
            using (StreamReader reader = new StreamReader(fs))
            {
                foreach (int lineNo in lineNos)
                {
                    position = (lineNo - 1) * LineLength - position;
                    fs.Seek(position, SeekOrigin.Current);

                    if ((line = reader.ReadLine()) != null)
                    {
                        Debug.Print("{0}\t\t\t{1}\t\t\t\t{2}", lineNo, position, line);
                    }
                }
            }
        }
    }
}

The output I get:

File Line length: 15

Line No     Position        Line
-------     --------        -----------------
3           30              0003|100!2500
4           15              0004|100!2500
5           45              0005|100!2500

Line No     Position        Line
-------     --------        -----------------
3           30              0003|100!2500
5           30              0004|100!2500

My problem is with the following output:

Line No     Position        Line
-------     --------        -----------------
5           30              0004|100!2500

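The Line column should read: 0005|100!2500

I do not understand why this happens.

Am I doing something wrong? Is there a workaround? Also, is there a faster way to do this using something like Seek? (I am looking for code-based options and NOT Oracle or SQL Server. For the sake of argument, let's also say the file size is 1 GB.)

Any help is much appreciated.

Thanks.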

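Update
I found 4 great answers here. Thanks a lot.

Sample timings:
Based on a few runs, the methods below are ordered from best to good. Even the good one is very close to the best. In a file containing 10K lines (2.28 MB), I searched for the same 5000 random lines using all the options.

  1. Seek4: Time elapsed: 00:00:00.0398530 ms - Ritch Melton
  2. Seek3: Time elapsed: 00:00:00.0446072 ms - Valentin Kuzub
  3. Seek1: Time elapsed: 00:00:00.0538210 ms - Jake
  4. Seek2: Time elapsed: 00:00:00.0889589 ms - bitxwise

The code is shown below. After you save it, you can call it simply by typing TestPaddedFileSeek.CallPaddedFileSeek();. You will also have to specify the namespace and the "using" references.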

    `

    /// <summary>
    /// This class offers multiple options for reading a line by line number in a padded file (all lines are the same length).
    /// The idea is to jump quickly to a line in the file.
    /// Details of the discussion are available at: http://stackoverflow.com/questions/5201414/having-a-problem-while-using-filestream-seek-in-c-solved
    /// </summary>
    class PaddedFileSeek
    {
        public FileInfo File {get; private set;}
        public int LineLength { get; private set; }
    
        #region Private methods
        private static int FindLineLength(FileInfo fileInfo)
        {
            using (StreamReader reader = new StreamReader(fileInfo.FullName))
            {
                string line;
                if ((line = reader.ReadLine()) != null)
                {
                    int length = line.Length + 2;   // The 2 is for NewLine(\r\n)
                    return length;
                }
            }
            return 0;
        }
    
        private static void PrintHeader()
        {
           /*
            Debug.Print("");
            Debug.Print("Line No\t\tLine");
            Debug.Print("-------\t\t--------------------------");
           */ 
        }
    
        private static void PrintLine(int lineNo, string line)
        {
            //Debug.Print("{0}\t\t\t{1}", lineNo, line);
        }
    
        private static void PrintElapsedTime(TimeSpan elapsed)
        {
            Debug.WriteLine("Time elapsed: {0} ms", elapsed);
        }
        #endregion
    
        public PaddedFileSeek(FileInfo fileInfo)
        {
            // Possibly might have to check for FileExists
            int length = FindLineLength(fileInfo);
            //if (length == 0) throw new PaddedProgramException();
            LineLength = length;
            File = fileInfo;
        }
    
        public void CallAll(int[] lineNoArray, List<int> lineNoList)
        {
            Stopwatch sw = new Stopwatch();
    
            #region Seek1
            // Create new stopwatch
            sw.Start();
    
            Debug.Write("Seek1: ");
            // Print Header
            PrintHeader();
    
            Seek1(lineNoArray);
    
            // Stop timing
            sw.Stop();
    
            // Print Elapsed Time
            PrintElapsedTime(sw.Elapsed);
    
            sw.Reset();
            #endregion
    
            #region Seek2
            // Create new stopwatch
            sw.Start();
    
            Debug.Write("Seek2: ");
            // Print Header
            PrintHeader();
    
            Seek2(lineNoArray);
    
            // Stop timing
            sw.Stop();
    
            // Print Elapsed Time
            PrintElapsedTime(sw.Elapsed);
    
            sw.Reset();
            #endregion
    
            #region Seek3
            // Create new stopwatch
            sw.Start();
    
            Debug.Write("Seek3: ");
            // Print Header
            PrintHeader();
    
            Seek3(lineNoArray);
    
            // Stop timing
            sw.Stop();
    
            // Print Elapsed Time
            PrintElapsedTime(sw.Elapsed);
    
            sw.Reset();
            #endregion
    
            #region Seek4
            // Create new stopwatch
            sw.Start();
    
            Debug.Write("Seek4: ");
    
            // Print Header
            PrintHeader();
    
            Seek4(lineNoList);
    
            // Stop timing
            sw.Stop();
    
            // Print Elapsed Time
            PrintElapsedTime(sw.Elapsed);
    
            sw.Reset();
            #endregion
    
        }
    
        /// <summary>
        /// Option by Jake
        /// </summary>
        /// <param name="lineNoArray"></param>
        public void Seek1(int[] lineNoArray)
        {
            long position = 0;
            string line = null;
    
            Array.Sort(lineNoArray);
    
            using (FileStream fs = new FileStream(File.FullName, FileMode.Open, FileAccess.Read, FileShare.None))
            {
                using (StreamReader reader = new StreamReader(fs))
                {
                    foreach (int lineNo in lineNoArray)
                    {
                        position = (lineNo - 1) * LineLength;
                        fs.Seek(position, SeekOrigin.Begin);
    
                        if ((line = reader.ReadLine()) != null)
                        {
                            PrintLine(lineNo, line);
                        }
    
                        reader.DiscardBufferedData();
                    }
                }
            }
    
        }
    
        /// <summary>
        /// option by bitxwise
        /// </summary>
        public void Seek2(int[] lineNoArray)
        {
            string line = null;
            long step = 0;
    
            Array.Sort(lineNoArray);
    
            using (FileStream fs = new FileStream(File.FullName, FileMode.Open, FileAccess.Read, FileShare.None))
            {
                // using (StreamReader reader = new StreamReader(fs))
                // If you put "using" here you will get WRONG results.
                // I would like to understand why this is.
                {
                    foreach (int lineNo in lineNoArray)
                    {
                        StreamReader reader = new StreamReader(fs);
                        step = (lineNo - 1) * LineLength - fs.Position;
                        fs.Position += step;
    
                        if ((line = reader.ReadLine()) != null)
                        {
                            PrintLine(lineNo, line);
                        }
                    }
                }
            }
        }
    
        /// <summary>
        /// Option by Valentin Kuzub
        /// </summary>
        /// <param name="lineNoArray"></param>
        #region Seek3
        public void Seek3(int[] lineNoArray)
        {
            long position = 0; // totalPosition = 0;
            string line = null;
            int oldLineNo = 0;
    
            Array.Sort(lineNoArray);
    
            using (FileStream fs = new FileStream(File.FullName, FileMode.Open, FileAccess.Read, FileShare.None))
            {
                using (StreamReader reader = new StreamReader(fs))
                {
                    foreach (int lineNo in lineNoArray)
                    {
                        position = (lineNo - oldLineNo - 1) * LineLength;
                        fs.Seek(position, SeekOrigin.Current);
                        line = ReadLine(fs, LineLength);
                        PrintLine(lineNo, line);
                        oldLineNo = lineNo;
    
                    }
                }
            }
    
        }
    
        #region Required Private methods
        /// <summary>
        /// Currently only used by Seek3
        /// </summary>
        /// <param name="stream"></param>
        /// <param name="length"></param>
        /// <returns></returns>
        private static string ReadLine(FileStream stream, int length)
        {
            byte[] bytes = new byte[length];
            stream.Read(bytes, 0, length);
            return new string(Encoding.UTF8.GetChars(bytes));
        }
        #endregion
        #endregion
    
        /// <summary>
        /// Option by Ritch Melton
        /// </summary>
        /// <param name="lineNoList"></param>
        #region Seek4
        public void Seek4(List<int> lineNoList)
        {
            lineNoList.Sort();
    
            using (var fs = new FileStream(File.FullName, FileMode.Open))
            {
                lineNoList.ForEach(ln => OutputData(fs, ln));
            }
    
        }
    
        #region Required Private methods
        private void OutputData(FileStream fs, int lineNumber)
        {
            var offset = (lineNumber - 1) * LineLength;
    
            fs.Seek(offset, SeekOrigin.Begin);
    
            var data = new byte[LineLength];
            fs.Read(data, 0, LineLength);
    
            var text = DecodeData(data);
            PrintLine(lineNumber, text);
        }
    
        private static string DecodeData(byte[] data)
        {
            var encoding = new UTF8Encoding();
            return encoding.GetString(data);
        }
    
        #endregion
    
        #endregion
    }
    
    
    
    static class TestPaddedFileSeek
    {
        public static void CallPaddedFileSeek()
        {
            const int arrayLenght = 5000;
            int[] lineNoArray = new int[arrayLenght];
            List<int> lineNoList = new List<int>();
            Random random = new Random();
            int lineNo;
            string fileName;
    
    
            fileName = @"C:\Temp\Temp.txt";
    
            PaddedFileSeek seeker = new PaddedFileSeek(new FileInfo(fileName));
    
            for (int n = 0; n < 25; n++)
            {
                Debug.Print("Loop no: {0}", n + 1);
    
                for (int i = 0; i < arrayLenght; i++)
                {
                    lineNo = random.Next(1, arrayLenght);
    
                    lineNoArray[i] = lineNo;
                    lineNoList.Add(lineNo);
                }
    
                seeker.CallAll(lineNoArray, lineNoList);
    
                lineNoList.Clear();
    
                Debug.Print("");
            }
        }
    }
    

    `

5 answers:

Answer 0 (score: 3)

Put this inside the loop in SeekMethod(int[] lineNos):

position = (lineNo - 1) * LineLength;
fs.Seek(position, SeekOrigin.Begin);
reader.DiscardBufferedData();

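The problem is that your position variable changes based on its previous value, and StreamReader maintains a buffer, so you need to discard the buffered data whenever you change the stream position.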
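As a rough illustration of the fix above, here is a minimal self-contained sketch. The class name DiscardBufferDemo and the helper WriteSampleFile are made up for this example and are not part of the original code; it also assumes an ASCII file whose lines all occupy exactly 15 bytes, including the trailing \r\n.

```csharp
using System;
using System.IO;
using System.Text;

static class DiscardBufferDemo
{
    // Reads the given 1-based line numbers from a file whose lines all occupy
    // lineLength bytes (including the trailing \r\n), reusing one StreamReader.
    public static string[] ReadLines(string path, int lineLength, int[] lineNos)
    {
        var result = new string[lineNos.Length];
        using (var fs = new FileStream(path, FileMode.Open, FileAccess.Read))
        using (var reader = new StreamReader(fs, Encoding.ASCII))
        {
            for (int i = 0; i < lineNos.Length; i++)
            {
                // Absolute offset, so nothing depends on the previous iteration.
                long position = (long)(lineNos[i] - 1) * lineLength;
                fs.Seek(position, SeekOrigin.Begin);

                // Drop whatever the reader buffered at the old position.
                reader.DiscardBufferedData();

                result[i] = reader.ReadLine();
            }
        }
        return result;
    }

    // Hypothetical helper: writes the question's ten sample lines to a temp file.
    public static string WriteSampleFile()
    {
        string path = Path.GetTempFileName();
        var sb = new StringBuilder();
        for (int i = 1; i <= 10; i++)
            sb.Append(i.ToString("D4")).Append("|100!2500\r\n");
        File.WriteAllText(path, sb.ToString(), Encoding.ASCII);
        return path;
    }

    static void Main()
    {
        string path = WriteSampleFile();
        foreach (string line in ReadLines(path, 15, new[] { 3, 5, 9 }))
            Console.WriteLine(line);
        File.Delete(path);
    }
}
```

Without the DiscardBufferedData call, the reader would keep serving lines from the block it buffered on the first read, which matches the symptom in the question.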

Answer 1 (score: 2)

I am confused by your expected positions: line 5 at positions 30 and 45, line 4 at 15, and line 3 at 30?

Here is the core of the read logic:

    var offset = (lineNumber - 1) * LineLength;

    fs.Seek(offset, SeekOrigin.Begin);

    var data = new byte[LineLength];
    fs.Read(data, 0, LineLength);

    var text = DecodeData(data);
    Debug.Print("{0,-12}{1,-16}{2}", lineNumber, offset, text);

The full sample is here:

class PaddedFileSearch
{
    public int LineLength { get; private set; }
    public FileInfo File { get; private set; }

    public PaddedFileSearch(FileInfo fileInfo)
    {
        var length = FindLineLength(fileInfo);
        //if (length == 0) throw new PaddedProgramException();
        LineLength = length;
        File = fileInfo;
    }

    private static int FindLineLength(FileInfo fileInfo)
    {
        using (var reader = new StreamReader(fileInfo.FullName))
        {
            string line;
            if ((line = reader.ReadLine()) != null)
            {
                var length = line.Length + 2;
                return length;
            }
        }

        return 0;
    }

    public void SeekMethod(List<int> lineNumbers)
    {

        Debug.Print("");
        Debug.Print("Line No\t\tPosition\t\tLine");
        Debug.Print("-------\t\t--------\t\t-----------------");

        lineNumbers.Sort();

        using (var fs = new FileStream(File.FullName, FileMode.Open))
        {
            lineNumbers.ForEach(ln => OutputData(fs, ln));
        }
    }

    private void OutputData(FileStream fs, int lineNumber)
    {
        var offset = (lineNumber - 1) * LineLength;

        fs.Seek(offset, SeekOrigin.Begin);

        var data = new byte[LineLength];
        fs.Read(data, 0, LineLength);

        var text = DecodeData(data);
        Debug.Print("{0,-12}{1,-16}{2}", lineNumber, offset, text);
    }

    private static string DecodeData(byte[] data)
    {
        var encoding = new UTF8Encoding();
        return encoding.GetString(data);
    }
}

class Program
{
    static void Main(string[] args)
    {
        var seeker = new PaddedFileSearch(new FileInfo(@"D:\Desktop\Test.txt"));

        Debug.Print("File Line length: {0}", seeker.LineLength);

        seeker.SeekMethod(new List<int> { 5, 3, 4 });
        seeker.SeekMethod(new List<int> { 5, 3 });
    }
}

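Answer 2 (score: 1)

Your position calculation is odd: it is effectively absolute for the first line number and relative for the later ones.

Look closely at this code and at the actual results: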

position = (lineNo - 1) * LineLength - position;
fs.Seek(position, SeekOrigin.Current);

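For the values 3, 4, 5 you get the numbers 30, 15, 45, whereas with relative positions it should obviously be 30, 15, 15, since the line length is 15, or 30, 0, 0 if your read method performs a seek as a side effect, as FileStream.Read does. Your test output only happens to be correct (and only for the string values, not the positions). You should test with a non-consecutive sequence and look at the position values more closely; you will see that the displayed strings have no connection to the position values.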

In fact your StreamReader ignores the further fs.Seek calls and simply reads line by line =)

Here is the result for the input 3 5 9 :)

Line No         Position                Line
-------         --------                -----------------
3                       30                              0003|100!2500
5                       30                              0004|100!2500
9                       90                              0005|100!2500
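The position values quoted in this answer can be reproduced with plain arithmetic, no file needed. The sketch below is only an illustration: the class and method names (SeekOffsets, QuestionFormula, RelativeAfterRead) are invented here. The first method replays the question's formula; the second replays the relative formula `(lineNo - oldLine - 1) * LineLength`, which assumes every read advances the stream by exactly one line.

```csharp
using System;

static class SeekOffsets
{
    // Offsets the question's loop passes to fs.Seek(..., SeekOrigin.Current):
    // each value is the absolute offset minus the PREVIOUS offset argument,
    // not minus the real stream position.
    public static long[] QuestionFormula(int[] lineNos, int lineLength)
    {
        var offsets = new long[lineNos.Length];
        long position = 0;
        for (int i = 0; i < lineNos.Length; i++)
        {
            position = (lineNos[i] - 1) * lineLength - position;
            offsets[i] = position;
        }
        return offsets;
    }

    // Relative offsets that are consistent when each read consumes one full line.
    public static long[] RelativeAfterRead(int[] lineNos, int lineLength)
    {
        var offsets = new long[lineNos.Length];
        int oldLine = 0;
        for (int i = 0; i < lineNos.Length; i++)
        {
            offsets[i] = (long)(lineNos[i] - oldLine - 1) * lineLength;
            oldLine = lineNos[i];
        }
        return offsets;
    }

    static void Main()
    {
        Console.WriteLine(string.Join(", ", QuestionFormula(new[] { 3, 4, 5 }, 15)));   // 30, 15, 45
        Console.WriteLine(string.Join(", ", QuestionFormula(new[] { 3, 5, 9 }, 15)));   // 30, 30, 90
        Console.WriteLine(string.Join(", ", RelativeAfterRead(new[] { 3, 4, 5 }, 15))); // 30, 0, 0
        Console.WriteLine(string.Join(", ", RelativeAfterRead(new[] { 3, 5 }, 15)));    // 30, 15
    }
}
```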

I believe the following is the closest to what you are trying to achieve. This is a new function:

private static string ReadLine(FileStream stream, int length)
        {
             byte[] bytes= new byte[length];
             stream.Read(bytes, 0, length);
             return new string(Encoding.UTF8.GetChars(bytes));  
        }

The new loop code:

int oldLine = 0;
    using (FileStream fs = new FileStream(FileName, FileMode.Open, FileAccess.Read, FileShare.None))
    {
            foreach (int lineNo in lineNos)
            {
                position = (lineNo - oldLine -1) * LineLength;
                fs.Seek(position, SeekOrigin.Current);
                line = ReadLine(fs, LineLength);
                Console.WriteLine("{0}\t\t\t{1}\t\t\t\t{2}", lineNo, position, line);
                oldLine = lineNo;
            }
    }

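Note that stream.Read now acts as an additional stream.Seek(length), because reading advances the stream position by the number of bytes read.

Correct output, with position changes that make sense: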

Line No         Position                Line
-------         --------                -----------------
3                       30                              0003|100!2500    
4                       0                               0004|100!2500    
5                       0                               0005|100!2500

Line No         Position                Line
-------         --------                -----------------
3                       30                              0003|100!2500  
5                       15                              0005|100!2500

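PS: It is odd that you treat line 0001 as the 1st line rather than the 0th. If you counted the way programmers do, the whole -1 could be dropped =)

Answer 3 (score: 1)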

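I would not say the problem is your effort to manage the position value manually, but rather that StreamReader.ReadLine changes the stream's Position value. If you step through your code and watch the local values, you will see the stream's position change after each ReadLine call (it is 148 after the first call).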
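To see this read-ahead without a debugger, a small sketch like the one below can help. ReadAheadDemo and its methods are names invented for this example. It writes ten 15-byte lines to a temp file, so the exact number printed may differ from the 148 mentioned above, depending on whether the last line ends with \r\n.

```csharp
using System;
using System.IO;
using System.Text;

static class ReadAheadDemo
{
    // Returns the FileStream position observed right after the first ReadLine.
    public static long PositionAfterFirstReadLine(string path)
    {
        using (var fs = new FileStream(path, FileMode.Open, FileAccess.Read))
        using (var reader = new StreamReader(fs))
        {
            reader.ReadLine();
            // The reader has pulled a whole buffer from the stream,
            // so this is usually far past the end of the first line.
            return fs.Position;
        }
    }

    // Hypothetical helper: writes the question's ten sample lines to a temp file.
    public static string WriteSampleFile()
    {
        string path = Path.GetTempFileName();
        var sb = new StringBuilder();
        for (int i = 1; i <= 10; i++)
            sb.Append(i.ToString("D4")).Append("|100!2500\r\n");
        File.WriteAllText(path, sb.ToString(), Encoding.ASCII);
        return path;
    }

    static void Main()
    {
        string path = WriteSampleFile();
        Console.WriteLine("Line length: 15, position after first ReadLine: {0}",
            PositionAfterFirstReadLine(path));
        File.Delete(path);
    }
}
```

For a file this small the reader typically consumes the whole file on the first call, which is why later relative seeks land somewhere unexpected.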

Edit

It is probably better to change the stream's position directly instead of using Seek:

public void SeekMethod(int[] lineNos)
{
    string line = null;
    long step;

    Array.Sort(lineNos);

    Debug.Print("");
    Debug.Print("Line No\t\tPosition\t\tLine");
    Debug.Print("-------\t\t--------\t\t-----------------");

    using (FileStream fs = new FileStream(FileName, FileMode.Open, FileAccess.Read, FileShare.None))
    {
        foreach (int lineNo in lineNos)
        {
            StreamReader reader = new StreamReader(fs);
            step = (lineNo - 1) * LineLength - fs.Position;
            fs.Position += step;

            if ((line = reader.ReadLine()) != null) {
                Debug.Print("{0}\t\t\t{1}\t\t\t\t{2}", lineNo, step, line);
            }
        }
    }
}

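Answer 4 (score: 0)

The problem is that you track the position manually but do not account for the fact that, after you read a line, the actual file position is one line further along. So you need to subtract that extra read, but only if it actually happened.

If you really want to do it this way, then do not keep your own position; get the actual file position instead. Alternatively, compute the absolute file position from the given line number and seek directly there rather than relative to the current file offset.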
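The second suggestion might look roughly like the sketch below. FixedWidthReader and ReadLineAt are hypothetical names, and the code assumes a single-byte encoding (ASCII here) so that one character equals one byte. It also loops on Read, since Stream.Read is allowed to return fewer bytes than requested.

```csharp
using System;
using System.IO;
using System.Text;

static class FixedWidthReader
{
    // Reads the 1-based line lineNo from a file whose lines all occupy
    // lineLength bytes, including the trailing \r\n.
    public static string ReadLineAt(FileStream fs, int lineNo, int lineLength)
    {
        // long arithmetic keeps the offset valid for files larger than 2 GB.
        long offset = (long)(lineNo - 1) * lineLength;
        fs.Seek(offset, SeekOrigin.Begin);

        var buffer = new byte[lineLength];
        int total = 0;
        while (total < lineLength)
        {
            int read = fs.Read(buffer, total, lineLength - total);
            if (read == 0) break;   // end of file
            total += read;
        }

        // Strip the line terminator so callers get just the record text.
        return Encoding.ASCII.GetString(buffer, 0, total).TrimEnd('\r', '\n');
    }

    static void Main()
    {
        // Build the question's ten-line sample in a temp file (setup for the demo only).
        string path = Path.GetTempFileName();
        var sb = new StringBuilder();
        for (int i = 1; i <= 10; i++)
            sb.Append(i.ToString("D4")).Append("|100!2500\r\n");
        File.WriteAllText(path, sb.ToString(), Encoding.ASCII);

        using (var fs = new FileStream(path, FileMode.Open, FileAccess.Read))
        {
            foreach (int lineNo in new[] { 5, 3 })
                Console.WriteLine("{0}: {1}", lineNo, ReadLineAt(fs, lineNo, 15));
        }
        File.Delete(path);
    }
}
```

Because every offset is absolute, the line numbers do not need to be sorted for correctness, although sorted access may still be friendlier to the disk cache.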