Overwriting the contents of an existing blob in Azure Blob storage

Asked: 2018-01-18 09:01:49

Tags: azure azure-storage-blobs

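I am using block blobs to append time-series data to Azure Blob storage with the Azure Storage client. I now want to update the contents of an existing blob. The file size can be up to 800 MB.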

Is there a way to download the blob in chunks based on blockId, change the contents, and upload the contents for that blockId?

2 answers:

Answer 0: (score: 1)

  

"Is there a way to download the blob in chunks based on blockId, change the contents, and upload the contents for that blockId?"

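AFAIK, this isn't possible with the existing API, which only gives you each block's ID and size. To make it work, you would need to store the blocks' metadata (such as block ID and start/end byte range) somewhere.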
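
If the committed block list comes back in blob order (which is how the Get Block List documentation describes it), the start offset of each block could in principle be derived from the sizes alone, by summing the lengths of the preceding blocks. The following is only a sketch of that arithmetic: `BlockRange`, `BlockRanges.Compute` and the block IDs are hypothetical names for illustration, not part of the Azure Storage SDK, and the input stands in for the (name, length) pairs you would read from the block list.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper type; not part of the Azure Storage SDK.
public sealed class BlockRange
{
    public BlockRange(string name, long offset, long length)
    {
        Name = name;
        Offset = offset;
        Length = length;
    }

    public string Name { get; }
    public long Offset { get; }   // first byte of the block within the blob
    public long Length { get; }   // size of the block in bytes
}

public static class BlockRanges
{
    // Assumption: 'blocks' lists the committed blocks in blob order
    // as (block ID, size in bytes) pairs.
    public static List<BlockRange> Compute(IEnumerable<KeyValuePair<string, long>> blocks)
    {
        var ranges = new List<BlockRange>();
        long offset = 0;
        foreach (var block in blocks)
        {
            ranges.Add(new BlockRange(block.Key, offset, block.Value));
            offset += block.Value; // the next block starts where this one ends
        }
        return ranges;
    }
}

public static class BlockRangesDemo
{
    public static void Main()
    {
        // Made-up block IDs and sizes, for illustration only.
        var blocks = new[]
        {
            new KeyValuePair<string, long>("AAAA", 4),
            new KeyValuePair<string, long>("AAAB", 6),
            new KeyValuePair<string, long>("AAAC", 3),
        };

        foreach (var r in BlockRanges.Compute(blocks))
        {
            Console.WriteLine(r.Name + ": offset " + r.Offset + ", length " + r.Length);
        }
        // Prints offsets 0, 4 and 10 for the three blocks.
    }
}
```

The resulting offset and length are the two values a ranged download needs. Whether this holds for a given blob depends on how it was written, so treat it as something to verify rather than rely on.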

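One possible solution (just thinking out loud) is to use the blob's own metadata to store this block metadata. You could read the metadata, get the byte range to download, download the data, modify it, and upload it again. After re-uploading, you would need to adjust the stored block metadata. Then again, the metadata size is limited (8 KB).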
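
To make the metadata idea a little more concrete, here is a rough sketch of packing the block map into a single string and checking it against the 8 KB limit before storing it. The `id:length;id:length` format, the `BlockMapMetadata` class and the `blockmap` key are all assumptions made up for this example; the sketch also assumes block IDs are Base64 strings, which contain neither `:` nor `;`.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;
using System.Text;

// Hypothetical helper; the encoding format is an assumption, not an Azure convention.
public static class BlockMapMetadata
{
    // Metadata size limit mentioned in the answer above (8 KB).
    public const int MaxMetadataBytes = 8 * 1024;

    // Encodes (block ID, length) pairs as "id:length;id:length;...".
    public static string Encode(IEnumerable<KeyValuePair<string, long>> blocks)
    {
        return string.Join(";", blocks.Select(
            b => b.Key + ":" + b.Value.ToString(CultureInfo.InvariantCulture)));
    }

    // Reverses Encode. Splits on the last ':' of each entry.
    public static List<KeyValuePair<string, long>> Decode(string encoded)
    {
        return encoded
            .Split(new[] { ';' }, StringSplitOptions.RemoveEmptyEntries)
            .Select(entry =>
            {
                int sep = entry.LastIndexOf(':');
                return new KeyValuePair<string, long>(
                    entry.Substring(0, sep),
                    long.Parse(entry.Substring(sep + 1), CultureInfo.InvariantCulture));
            })
            .ToList();
    }

    // True if this single key/value pair alone stays within the limit.
    public static bool FitsInMetadata(string key, string value)
    {
        return Encoding.UTF8.GetByteCount(key) + Encoding.UTF8.GetByteCount(value)
               <= MaxMetadataBytes;
    }
}

public static class BlockMapDemo
{
    public static void Main()
    {
        // Made-up block IDs and sizes, for illustration only.
        var blocks = new[]
        {
            new KeyValuePair<string, long>("AAAA", 4),
            new KeyValuePair<string, long>("AAAB", 6),
        };

        string encoded = BlockMapMetadata.Encode(blocks);
        Console.WriteLine(encoded);                                            // AAAA:4;AAAB:6
        Console.WriteLine(BlockMapMetadata.FitsInMetadata("blockmap", encoded)); // True
    }
}
```

With the storage client, the encoded string would presumably go into `blockBlob.Metadata["blockmap"]` followed by `SetMetadataAsync()`. If `FitsInMetadata` returns false, the map has to live somewhere else, which is the limitation noted above.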

Answer 1: (score: 0)

You can do this with the .NET libraries Microsoft.WindowsAzure.Storage and Microsoft.WindowsAzure.Storage.Blob.

Say you want to remove the header row from a very large CSV file while downloading and uploading only the first block:

using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Blob;
using System;
using System.IO;
using System.Linq;
using System.Threading.Tasks;

namespace RemoveHeaderRow
{
    class Program
    {
        static async Task Main(string[] args)
        {
            var storageAccount = CloudStorageAccount.Parse("DefaultEndpointsProtocol=https;AccountName=...;AccountKey=...;EndpointSuffix=core.windows.net");
            var client = storageAccount.CreateCloudBlobClient();
            var container = client.GetContainerReference("containerName");
            var blockBlob = container.GetBlockBlobReference("blobName.csv");

            var blockList = await blockBlob.DownloadBlockListAsync();
            if (!blockList.Any())
            {
                // not every blob has a block list, here's why: https://stackoverflow.com/questions/14652172/azure-blobs-block-list-is-empty-but-blob-is-not-empty-how-can-this-be
                return; // cannot proceed
            }
            var firstBlock = blockList.First();

            //  download block
            var contents = await GetBlockBlobContents(blockBlob, firstBlock);

            //  remove first line (assumes the whole header row lies within the first block)
            var noHeaderContents = string.Join("\n", contents.Split('\n').Skip(1));

            //  upload block back to azure
            await UpdateBlockBlobContent(blockBlob, firstBlock, noHeaderContents);

            //  commit the blocks, all blocks need to be committed, not just the updated one
            await blockBlob.PutBlockListAsync(blockList.Select(b => b.Name));
        }

        public static async Task<string> GetBlockBlobContents(CloudBlockBlob blockBlob, ListBlockItem blockItem)
        {
            using (var memStream = new MemoryStream())
            using (var streamReader = new StreamReader(memStream))
            {
                // the range starts at offset 0, so this only reads the right bytes for the first block
                await blockBlob.DownloadRangeToStreamAsync(memStream, 0, blockItem.Length);
                memStream.Position = 0;
                return await streamReader.ReadToEndAsync();
            }
        }

        public static async Task UpdateBlockBlobContent(CloudBlockBlob blockBlob, ListBlockItem blockItem, string contents)
        {
            using (var stream = new MemoryStream())
            using (var writer = new StreamWriter(stream))
            {
                writer.Write(contents);
                writer.Flush();
                stream.Position = 0;
                await blockBlob.PutBlockAsync(blockItem.Name, stream, null);
            }
        }
    }
}
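
One caveat with the approach above: a block is just a slice of bytes, so it can end in the middle of a multi-byte character, and decoding it to a string and re-encoding it may change bytes that were never meant to be touched. A possible alternative is to drop the header at the byte level. This is a minimal sketch assuming the file is ASCII or UTF-8 (where the byte `0x0A` only ever means a newline); `FirstLine.Remove` is a hypothetical helper, not an SDK method.

```csharp
using System;
using System.Text;

// Hypothetical helper; works on raw bytes so no text decoding is involved.
public static class FirstLine
{
    // Returns a copy of 'block' without its first line, that is, without
    // everything up to and including the first '\n' byte.
    // If the block contains no newline, it is returned unchanged, because
    // the header then extends past this block and needs separate handling.
    public static byte[] Remove(byte[] block)
    {
        int newline = Array.IndexOf(block, (byte)'\n');
        if (newline < 0)
        {
            return block;
        }

        var rest = new byte[block.Length - newline - 1];
        Buffer.BlockCopy(block, newline + 1, rest, 0, rest.Length);
        return rest;
    }
}

public static class FirstLineDemo
{
    public static void Main()
    {
        // Made-up CSV content standing in for the downloaded first block.
        byte[] block = Encoding.UTF8.GetBytes("id,value\n1,2\n");
        byte[] trimmed = FirstLine.Remove(block);
        Console.Write(Encoding.UTF8.GetString(trimmed)); // prints "1,2" followed by a newline
    }
}
```

The trimmed bytes could then be wrapped in a `MemoryStream` and passed to `PutBlockAsync` in place of the `StreamWriter` round trip.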