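I am appending time-series data to Azure Blob Storage as block blobs, using the Azure storage client. I now want to update the contents of an existing blob. The file can be as large as 800 MB.
Is there a way to download the blob in chunks based on blockId, change the contents, and upload the contents for that blockId?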
Answer 0: (score: 1)
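"Is there a way to download the blob in chunks based on blockId, change the contents, and upload the contents for that blockId?"
AFAIK, this is not possible with the existing API. The current API only gives you the block ID and the size of each block. To do this, you would need to store metadata about the blocks (block ID, start/end byte range) somewhere.
One possible solution (just thinking out loud) is to use the blob's own metadata to store this block metadata. You could read the metadata, get the byte range you want to download, download the data, modify it, and upload it again. When uploading, you would also need to adjust the stored block metadata. Keep in mind, though, that blob metadata is limited in size (8K bytes).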
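The bookkeeping described above could look roughly like the sketch below. It is only an illustration: `BlockIndex`, `Encode`, `DecodeRanges`, and `FitsInMetadata` are hypothetical names, not part of any Azure SDK. It assumes the block sizes are recorded in the same order in which the blocks were committed, and that the 8K limit mentioned above applies to the UTF-8 size of the stored value. Only the sizes are stored; start/end byte ranges are rebuilt with a running sum.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper (not an Azure SDK type): packs per-block sizes into a
// single string that could be stored as one blob metadata value.
public static class BlockIndex
{
    // Limit quoted in the answer above; how exactly names and values count
    // toward it is an assumption of this sketch.
    public const int MetadataBudgetBytes = 8 * 1024;

    // Encode block sizes, in committed order, as "size,size,...".
    public static string Encode(IEnumerable<long> blockSizes)
    {
        return string.Join(",", blockSizes);
    }

    // Rebuild (start, end) byte ranges from the encoded sizes; End is inclusive.
    public static List<(long Start, long End)> DecodeRanges(string encoded)
    {
        var ranges = new List<(long Start, long End)>();
        if (string.IsNullOrEmpty(encoded))
        {
            return ranges;
        }

        long offset = 0;
        foreach (var part in encoded.Split(','))
        {
            long size = long.Parse(part);
            ranges.Add((offset, offset + size - 1));
            offset += size;
        }
        return ranges;
    }

    // True if the encoded value stays within the assumed metadata budget.
    public static bool FitsInMetadata(string encoded)
    {
        return Encoding.UTF8.GetByteCount(encoded) <= MetadataBudgetBytes;
    }
}
```

For example, `BlockIndex.DecodeRanges(BlockIndex.Encode(new long[] { 100, 50, 25 }))` yields the ranges 0-99, 100-149, and 150-174. After modifying a block, the size list would have to be re-encoded and written back together with the new block list.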
Answer 1: (score: 0)
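You can use the .NET libraries Microsoft.WindowsAzure.Storage and Microsoft.WindowsAzure.Storage.Blob.
Say you want to remove the header row from a very large CSV file, but only download and upload the first block: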
    using Microsoft.WindowsAzure.Storage;
    using Microsoft.WindowsAzure.Storage.Blob;
    using System;
    using System.IO;
    using System.Linq;
    using System.Threading.Tasks;

    namespace RemoveHeaderRow
    {
        class Program
        {
            static async Task Main(string[] args)
            {
                var storageAccount = CloudStorageAccount.Parse("DefaultEndpointsProtocol=https;AccountName=...;AccountKey=...;EndpointSuffix=core.windows.net");
                var client = storageAccount.CreateCloudBlobClient();
                var container = client.GetContainerReference("containerName");
                var blockBlob = container.GetBlockBlobReference("blobName.csv");

                // Fetch the committed block list once; it is reused for the final commit.
                var blockList = (await blockBlob.DownloadBlockListAsync()).ToList();
                if (blockList.Count == 0)
                {
                    // Not every blob has a block list, here's why: https://stackoverflow.com/questions/14652172/azure-blobs-block-list-is-empty-but-blob-is-not-empty-how-can-this-be
                    return; // cannot proceed
                }

                var firstBlock = blockList[0];

                // Download only the first block.
                var contents = await GetBlockBlobContents(blockBlob, firstBlock);

                // Remove the first line. This assumes the whole header row lies inside
                // the first block and that the block still has content after it.
                var newlineIndex = contents.IndexOf('\n');
                if (newlineIndex < 0 || newlineIndex == contents.Length - 1)
                {
                    return; // header fills (or exceeds) the first block; not handled here
                }
                var noHeaderContents = contents.Substring(newlineIndex + 1);

                // Upload the modified block back to Azure under the same block ID.
                await UpdateBlockBlobContent(blockBlob, firstBlock, noHeaderContents);

                // Commit the blocks. All blocks need to be committed, not just the updated one.
                await blockBlob.PutBlockListAsync(blockList.Select(b => b.Name));
            }

            // Downloads the bytes of the given block and decodes them as text.
            // Offset 0 is only correct for the first block in the list.
            // Decoding assumes the block does not end in the middle of a multi-byte character.
            public static async Task<string> GetBlockBlobContents(CloudBlockBlob blockBlob, ListBlockItem blockItem)
            {
                using (var memStream = new MemoryStream())
                using (var streamReader = new StreamReader(memStream))
                {
                    await blockBlob.DownloadRangeToStreamAsync(memStream, 0, blockItem.Length);
                    memStream.Position = 0;
                    return await streamReader.ReadToEndAsync();
                }
            }

            // Uploads new contents as an uncommitted block that reuses the existing block ID.
            public static async Task UpdateBlockBlobContent(CloudBlockBlob blockBlob, ListBlockItem blockItem, string contents)
            {
                using (var stream = new MemoryStream())
                using (var writer = new StreamWriter(stream))
                {
                    writer.Write(contents);
                    writer.Flush();
                    stream.Position = 0;
                    await blockBlob.PutBlockAsync(blockItem.Name, stream, null);
                }
            }
        }
    }
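The sample above always downloads from offset 0, which only works for the first block. To read a later block, the starting offset would be the sum of the lengths of all blocks before it. A minimal sketch of that calculation follows; `BlockOffsets` and `OffsetOf` are hypothetical names, and the sketch assumes the lengths are listed in committed order.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not an Azure SDK type).
public static class BlockOffsets
{
    // Returns the starting byte offset of the block at `index`, given the
    // lengths of all blocks in committed order.
    public static long OffsetOf(IReadOnlyList<long> blockLengths, int index)
    {
        if (index < 0 || index >= blockLengths.Count)
        {
            throw new ArgumentOutOfRangeException(nameof(index));
        }

        long offset = 0;
        for (int i = 0; i < index; i++)
        {
            offset += blockLengths[i];
        }
        return offset;
    }
}
```

With the block list from the sample, the lengths could be taken as `blockList.Select(b => b.Length).ToList()`, and the result passed as the offset argument of `DownloadRangeToStreamAsync` in place of 0.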