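Reading a large text file for parsing

Time: 2011-06-23 15:03:14

Tags: visual-studio-2010 buffer

I am working with some text files that are 1-2 GB in size. I can't use a traditional StreamReader to read them in one go, so I decided to read them in chunks and do my work on each chunk. The problem is that I can't tell when I am getting close to the end of the file, since the program has been working on a single file for a long time, and I am also not sure how large a chunk I can read through the buffer. Here is the code: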

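Dim Buffer_Size As Integer = 30000
Dim bufferread = New [Char](Buffer_Size - 1) {}
Dim bytesread As Integer = 0
Dim totalbytesread As Long = 0 ' Long, so the running total cannot overflow on multi-GB files
Dim sb As New StringBuilder
Do
   bytesread = inputfile.Read(bufferread, 0, Buffer_Size)
   ' Append only the characters actually read; the last chunk is usually shorter than the buffer
   sb.Append(bufferread, 0, bytesread)
   totalbytesread = bytesread + totalbytesread
   If sb.Length > 9999999 Then
       data = sb.ToString
       If Not data Is Nothing Then
               parsingtools.load(data)
       End If
       sb.Clear() ' start a fresh batch so the same text is not parsed again
   End If
   If totalbytesread > 1000000000 Then
       logs.constructlog("File almost done")
   End If
Loop Until inputfile.EndOfStream
' Parse whatever is left over after the last full batch
If sb.Length > 0 Then
   parsingtools.load(sb.ToString)
End If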

Is there any control or code that can check how much of the file is left?

1 answer:

Answer 0: (score: 1)

Have you looked at BufferedStream?

http://msdn.microsoft.com/en-us/library/system.io.bufferedstream%28v=VS.100%29.aspx

You can wrap your stream with it. Also, I would set the buffer size in megabytes rather than something as small as 30,000.
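A minimal sketch of that wrapping, under some assumptions: the class and method names (`ChunkedReader`, `ReadInChunks`) and the `onChunk` callback are made up for illustration, and the file is assumed to be UTF-8. It puts a `BufferedStream` between the `FileStream` and the `StreamReader`, and it stops when `Read` returns 0, which is how the end of the file shows up.

```csharp
using System;
using System.IO;
using System.Text;

static class ChunkedReader
{
    // Hypothetical helper: reads a text file chunk by chunk and passes each
    // chunk (the buffer plus the number of valid characters) to a callback.
    // Returns the total number of characters read.
    public static long ReadInChunks(string path, int bufferSize, Action<char[], int> onChunk)
    {
        long totalChars = 0;
        var buffer = new char[bufferSize];

        using (var file = new FileStream(path, FileMode.Open, FileAccess.Read))
        using (var buffered = new BufferedStream(file, bufferSize))
        using (var reader = new StreamReader(buffered, Encoding.UTF8)) // assumption: UTF-8 input
        {
            int charsRead;
            // Read returns 0 only once the end of the stream has been reached.
            while ((charsRead = reader.Read(buffer, 0, buffer.Length)) > 0)
            {
                // Only the first charsRead entries of the buffer are valid.
                onChunk(buffer, charsRead);
                totalChars += charsRead;
            }
        }
        return totalChars;
    }
}
```

A call such as `ChunkedReader.ReadInChunks(path, 4 * 1024 * 1024, (buf, n) => sb.Append(buf, 0, n))` would use a 4 MB buffer; that size is only an example and is worth measuring on real files.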

As for how much is left: can't you just ask the stream for its length?
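A rough sketch of that idea, assuming a seekable stream such as a `FileStream`; the names `StreamProgress`, `BytesRemaining`, and `PercentDone` are invented for this example. It compares `Position` with `Length` to estimate how much of the file remains.

```csharp
using System;
using System.IO;

static class StreamProgress
{
    // Hypothetical helper: bytes between the current position and the end of the stream.
    public static long BytesRemaining(Stream stream)
    {
        if (stream == null)
            throw new ArgumentNullException("stream");
        // Length and Position are only available on seekable streams.
        if (!stream.CanSeek)
            throw new NotSupportedException("Stream does not support Length/Position.");
        return stream.Length - stream.Position;
    }

    // Hypothetical helper: share of the stream already consumed, in percent.
    public static double PercentDone(Stream stream)
    {
        if (stream == null)
            throw new ArgumentNullException("stream");
        if (!stream.CanSeek)
            throw new NotSupportedException("Stream does not support Length/Position.");
        if (stream.Length == 0)
            return 100.0; // nothing to read, so treat an empty stream as finished
        return 100.0 * stream.Position / stream.Length;
    }
}
```

When reading through a `StreamReader`, the underlying stream is available as `reader.BaseStream`. Its position counts bytes already pulled from the file, which can run somewhat ahead of the characters handed to your code because the reader buffers internally, so the result is an estimate that is good enough for progress logging.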

Below is a code snippet I use for wrapping a stream in a buffered stream. (Sorry, it's C#.)

    private static void CopyTo(AzureBlobStore azureBlobStore,Stream src, Stream dest, string description)
    {
        if (src == null)
            throw new ArgumentNullException("src");
        if (dest == null)
            throw new ArgumentNullException("dest");

        const int bufferSize = (AzureBlobStore.BufferSizeForStreamTransfers);
        // Buffering happens internally; this loop is just to avoid the 4 GB boundary and to have something to show.
        int readCount;
        //long bytesTransfered = 0;
        var buffer = new byte[bufferSize];
        //string totalBytes = FormatBytes(src.Length);
        while ((readCount = src.Read(buffer, 0, buffer.Length)) != 0)
        {
            if (azureBlobStore.CancelProcessing)
            {
                break;
            }
            dest.Write(buffer, 0, readCount);
            //bytesTransfered += readCount;
            //Console.WriteLine("AzureBlobStore:CopyTo:{0}:{1}  {2}", FormatBytes(bytesTransfered), totalBytes,description);
        }
    }

Hope this helps.