Question

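I have a large (> 1 GB) compressed file in Google Cloud Storage, and I want to push its contents to Google BigQuery. Right now I can download the file to disk, decompress it, and push each line as a new record. I'm wondering whether there is a way to do this without saving the file to disk. Can we decompress the file in memory while it is downloading, without loading the whole file into memory?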

I know this has something to do with streams and pipes, but I don't know how to do it.

The following code downloads the file to disk as a stream.

static async Task Download(string bucketName, string blobName)
{
    var storage = await GetClient();

    using (var writeStream = new FileStream(WorkDir + blobName, FileMode.Create))
    {
        await storage.DownloadObjectAsync(bucketName, blobName, writeStream);
    }
}

With the following, I can also decompress the file as a stream without using much memory.

static void DecompressAndWriteLines(string inputPath)
{
    using (var gzInput = new GZipInputStream(new FileStream(inputPath, FileMode.Open)))
    {
        using (var reader = new StreamReader(gzInput, Encoding.UTF8))
        {
            string line = null;
            while ((line = reader.ReadLine()) != null)
            {               
                Console.WriteLine(line);
            }
        }
    }
}

I tried the following to achieve this, but it downloads the whole file into a MemoryStream!

static async Task DownloadAndDecompress(string bucketName, string blobName)
{
    var storage = await GetClient();

    using (var memoryStream = new MemoryStream())
    {
        await storage.DownloadObjectAsync(bucketName, blobName, memoryStream);                

        memoryStream.Seek(0, SeekOrigin.Begin);

        using (var gZipInputStream = new GZipInputStream(memoryStream))

        using (var reader = new StreamReader(gZipInputStream, Encoding.UTF8))
        {
            string line = null;
            while ((line = reader.ReadLine()) != null)
            {
                Console.WriteLine(line);
            }
        }
    }
}
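One possible direction is to replace the MemoryStream with a bounded in-process pipe, so that the download writes into one end while the decompressor reads from the other. The following is only a sketch under assumptions: it uses System.IO.Pipelines (which may need to be added as a NuGet package), it swaps in the built-in GZipStream for GZipInputStream, and it assumes the storage client writes to the destination stream incrementally as data arrives. StreamingGzip, ReadLinesAsync, produce and onLine are hypothetical names introduced for illustration.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.IO.Pipelines;
using System.Text;
using System.Threading.Tasks;

static class StreamingGzip
{
    // produce: hypothetical delegate that writes the compressed bytes into the
    // stream it is given (for example, the storage download).
    // onLine: called once for every decompressed line.
    public static async Task ReadLinesAsync(Func<Stream, Task> produce, Action<string> onLine)
    {
        // With default options the pipe pauses the writer once a modest amount
        // of unread data is buffered, so memory use stays bounded.
        var pipe = new Pipe();

        // Producer: runs concurrently and feeds compressed bytes into the pipe.
        Task producer = Task.Run(async () =>
        {
            try
            {
                using (Stream sink = pipe.Writer.AsStream(leaveOpen: true))
                {
                    await produce(sink);
                }
                // Signal end of data so the reader sees end-of-stream.
                await pipe.Writer.CompleteAsync();
            }
            catch (Exception ex)
            {
                // Hand the failure to the reader side instead of leaving it waiting.
                await pipe.Writer.CompleteAsync(ex);
            }
        });

        // Consumer: decompresses and splits into lines while data is still arriving.
        using (var gzip = new GZipStream(pipe.Reader.AsStream(), CompressionMode.Decompress))
        using (var reader = new StreamReader(gzip, Encoding.UTF8))
        {
            string line;
            while ((line = await reader.ReadLineAsync()) != null)
            {
                onLine(line);
            }
        }

        await producer;
    }
}
```

With the names from the code above, a call might look like `await StreamingGzip.ReadLinesAsync(s => storage.DownloadObjectAsync(bucketName, blobName, s), Console.WriteLine);`. Whether memory really stays flat depends on how the storage client chunks its writes, so this would need to be measured against a real object.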

Thanks

Download and decompress in memory using C#

0 answers: