Azure Storage incremental copy by modified date

Time: 2016-04-08 09:32:16

Tags: powershell azure azure-storage azure-automation

I need to copy one storage account to another. I created a Runbook and scheduled it to run daily as an incremental copy.

What I am doing is:

  1. List the blobs in the source storage container
  2. Check for each blob in the destination storage container
  3. If the blob does not exist in the destination container, Start-AzureStorageBlobCopy

While this works for small containers, it takes a very long time, and it is clearly inefficient for a container with 10 million blobs, because every run of the task has to walk through all 10 million of them.
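The flow above can be sketched in PowerShell roughly like this (a sketch only; `$sourceContext`, `$destContext` and `$container` are assumed to be created elsewhere with New-AzureStorageContext):

```powershell
# Naive incremental copy: list every source blob, then copy the ones
# the destination container does not have yet
$srcBlobs = Get-AzureStorageBlob -Context $sourceContext -Container $container
foreach ($b in $srcBlobs) {
    $dest = Get-AzureStorageBlob -Context $destContext -Container $container `
        -Blob $b.Name -ErrorAction SilentlyContinue
    if ($null -eq $dest) {
        Start-AzureStorageBlobCopy -SrcContainer $container -SrcBlob $b.Name `
            -DestContainer $container -DestBlob $b.Name `
            -SrcContext $sourceContext -DestContext $destContext
    }
}
```

Every run re-enumerates the full source listing, which is the part that does not scale.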

I don't see it in the documentation, but is there any way to use a conditional header such as DateModifiedSince in PowerShell, e.g. Get-AzureStorageBlob -DateModifiedSince date?

I haven't tried it, but I can see that DateModifiedSince can be used in the nodejs library.

Is there any way I can do this with PowerShell, so that I can use Runbooks?

Edit:

I made a copy of a storage account holding 7 million blobs with AzCopy, uploaded a few new blobs, and started AzCopy again. It still takes a significant amount of time just to copy the few newly uploaded files.

    AzCopy /Source:$sourceUri /Dest:$destUri /SourceKey:$sourceStorageKey /DestKey:$destStorageAccountKey /S /XO /XN /Y

Filtering blobs by blob name is instantaneous.

For example, Get-AzureStorageBlob -Blob returns a blob immediately from among 7 million records.

It should be possible to filter blobs by other properties as well..
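For comparison, these are the filters the cmdlet does support; name and prefix are evaluated by the service itself, which is why they return instantly (the names used here are hypothetical):

```powershell
# Exact name: a direct lookup, immediate even among millions of blobs
Get-AzureStorageBlob -Context $sourceContext -Container $container -Blob "somefile.txt"

# Prefix: also evaluated server-side by the List Blobs operation
Get-AzureStorageBlob -Context $sourceContext -Container $container -Prefix "logs/2016-04"
```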

2 answers:

Answer 0 (score: 2)

I am not sure if this would be the actual correct answer but I have resorted to this solution for now.

AzCopy is a bit faster, but since it is an executable I have no way to use it in Automation.

I wrote my own runbook (it can be modified into a workflow) which implements the following AzCopy command:

AzCopy /Source:$sourceUri /Dest:$destUri /SourceKey:$sourceStorageKey /DestKey:$destStorageAccountKey /S /XO /Y

  1. Looking at List Blobs, we can only filter blobs by blob prefix, so I cannot pull blobs filtered by modified date. This leaves me pulling the whole blob list.
  2. I pull 20,000 blobs at a time from the source and the destination with Get-AzureStorageBlob and a ContinuationToken
  3. Loop through the 20,000 pulled source blobs and check whether they are missing from the destination or have been modified in the source
  4. If step 3 is true, I write those blobs to the destination
  5. It takes around 3-4 hours to go through 7 million blobs. The task takes longer depending on how many blobs have to be written to the destination.

A code snippet

    # Build a hash of destination blob names -> LastModified first (paged
    # with its own continuation token), then loop through the source
    # container and copy every blob that is missing from, or newer than,
    # the destination
    $MaxReturn = 20000
    $Total = 0
    $Token = $null
    $FilesTransferred = 0;
    $FilesTransferSuccess = 0;
    $FilesTransferFail = 0;
    $sw = [Diagnostics.Stopwatch]::StartNew();

    # The destination listing needs its own continuation token; reusing the
    # source token would fetch mismatched pages
    $DestBlobsHash = @{}
    $DestToken = $null
    DO
    {
        $DestBlobs = Get-AzureStorageBlob -Context $destContext -Container $container -MaxCount $MaxReturn -ContinuationToken $DestToken
        $DestBlobs | ForEach-Object { $DestBlobsHash[$_.Name] = $_.LastModified.UtcDateTime }
        if($DestBlobs.Count -le 0) { Break }
        $DestToken = $DestBlobs[$DestBlobs.Count - 1].ContinuationToken
    }
    While ($DestToken -ne $null)

    DO
    {
        $SrcBlobs = Get-AzureStorageBlob -Context $sourceContext -Container $container -MaxCount $MaxReturn -ContinuationToken $Token |
            Select-Object -Property Name, LastModified, ContinuationToken

        $Total += $SrcBlobs.Count

        if($SrcBlobs.Count -le 0) {
            Break;
        }
        $Token = $SrcBlobs[$SrcBlobs.Count - 1].ContinuationToken;

        ForEach ($SrcBlob in $SrcBlobs){
            # copy the blob if it is missing from the destination or has been modified in the source
            $CopyThisBlob = $false

            if($DestBlobsHash.Count -eq 0){
                $CopyThisBlob = $true
            } elseif(!$DestBlobsHash.ContainsKey($SrcBlob.Name)){
                $CopyThisBlob = $true
            } elseif($SrcBlob.LastModified.UtcDateTime -gt $DestBlobsHash.Item($SrcBlob.Name)){
                $CopyThisBlob = $true
            }

            if($CopyThisBlob){
                #Start copying the blobs to container
                $blobToCopy = $SrcBlob.Name
                "Copying blob: $blobToCopy to destination"
                $FilesTransferred++
                try {
                    # -ErrorAction Stop makes cmdlet failures terminating so the catch block actually fires
                    $c = Start-AzureStorageBlobCopy -SrcContainer $container -SrcBlob $blobToCopy  -DestContainer $container -DestBlob $blobToCopy -SrcContext $sourceContext -DestContext $destContext -Force -ErrorAction Stop
                    $FilesTransferSuccess++
                } catch {
                    Write-Error "$blobToCopy transfer failed"
                    $FilesTransferFail++
                }
            }           
        }
    }
    While ($Token -ne $Null)
    $sw.Stop()
    "Total blobs in container $container : $Total"
    "Total files transferred: $FilesTransferred"
    "Transfer successfully: $FilesTransferSuccess"
    "Transfer failed: $FilesTransferFail"
    "Elapsed time: $($sw.Elapsed) `n"

Answer 1 (score: 0)

LastModified is stored on the ICloudBlob object; you can access it with PowerShell like this:

$blob = Get-AzureStorageBlob -Context $Context  -Container $container
$blob[1].ICloudBlob.Properties.LastModified

Which will give you:

    DateTime      : 31/03/2016 17:03:07
    UtcDateTime   : 31/03/2016 17:03:07
    LocalDateTime : 31/03/2016 18:03:07
    Date          : 31/03/2016 00:00:00
    Day           : 31
    DayOfWeek     : Thursday
    DayOfYear     : 91
    Hour          : 17
    Millisecond   : 0
    Minute        : 3
    Month         : 3
    Offset        : 00:00:00
    Second        : 7
    Ticks         : 635950405870000000
    UtcTicks      : 635950405870000000
    TimeOfDay     : 17:03:07
    Year          : 2016

Having read through the API, I don't think it is possible to search a container by any parameter other than name. I can only imagine the nodejs library still retrieves all the blobs and then filters them.

I'll dig into this a bit more.
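Until then, a date filter can only be applied client-side, after the full listing has been retrieved, along these lines (a sketch; the one-day cutoff is illustrative):

```powershell
# The listing still transfers every entry over the wire; only the
# Where-Object filter narrows it down, and that happens on the client
$cutoff = (Get-Date).ToUniversalTime().AddDays(-1)
Get-AzureStorageBlob -Context $Context -Container $container |
    Where-Object { $_.ICloudBlob.Properties.LastModified.UtcDateTime -gt $cutoff }
```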