Windows Azure - Cleaning up WADLogsTable

Posted: 2011-08-01 22:27:06

Tags: azure-storage azure-table-storage wadslogtable

I have read conflicting information about whether the WADLogsTable used by the DiagnosticMonitor in Windows Azure automatically prunes old log entries.

I'm guessing it doesn't, and that it will instead grow forever - costing me money. :)

If that is the case, does anyone have a good code sample showing how to manually clear old log entries from this table? Perhaps based on timestamp? I would run this code periodically from a worker role.

7 Answers:

Answer 0 (Score: 9)

The data in the tables created by Windows Azure Diagnostics is not deleted automatically.

However, the Windows Azure PowerShell Cmdlets include a cmdlet specifically for this scenario:

PS D:\> help Clear-WindowsAzureLog

NAME
    Clear-WindowsAzureLog

SYNOPSIS
    Removes Windows Azure trace log data from a storage account.

SYNTAX
    Clear-WindowsAzureLog [-DeploymentId <String>] [-FromUtc <DateTime>] [-ToUtc <DateTime>]
    [-StorageAccountName <String>] [-StorageAccountKey <String>] [-UseDevelopmentStorage]
    [-StorageAccountCredentials <StorageCredentialsAccountAndKey>] [<CommonParameters>]

You need to specify the -ToUtc parameter, and all logs before that date will be deleted.

If the cleanup task needs to run inside a worker role on Azure, you can reuse the C# code of the cmdlets. The PowerShell Cmdlets are released under the permissive MS Public License.

Basically, only 3 files are needed, with no other external dependencies: DiagnosticsOperationException.cs, WadTableExtensions.cs, WadTableServiceEntity.cs.

Answer 1 (Score: 5)

An updated version of Chriseyre2000's function. This gives much better performance when you need to delete many thousands of records: it searches by PartitionKey and works step by step in chunks. And keep in mind that the best choice is to run it close to the storage (in a cloud service).

// Requires the Microsoft.WindowsAzure.Storage and Microsoft.WindowsAzure.Storage.Table namespaces.
public static void TruncateDiagnostics(CloudStorageAccount storageAccount,
    DateTime startDateTime, DateTime finishDateTime, Func<DateTime, DateTime> stepFunction)
{
    var cloudTable = storageAccount.CreateCloudTableClient().GetTableReference("WADLogsTable");

    var query = new TableQuery();
    var dt = startDateTime;
    while (true)
    {
        dt = stepFunction(dt);
        if (dt > finishDateTime)
            break;

        // WADLogsTable partition keys are "0" followed by the event time in ticks,
        // so this selects everything older than the current step.
        var l = dt.Ticks;
        string partitionKey = "0" + l;
        query.FilterString = TableQuery.GenerateFilterCondition("PartitionKey", QueryComparisons.LessThan, partitionKey);

        var items = cloudTable.ExecuteQuery(query).ToList();

        // A single table batch may contain at most 100 operations.
        const int chunkSize = 100;
        var chunkedList = new List<List<DynamicTableEntity>>();
        int index = 0;
        while (index < items.Count)
        {
            var count = items.Count - index > chunkSize ? chunkSize : items.Count - index;
            chunkedList.Add(items.GetRange(index, count));
            index += chunkSize;
        }

        foreach (var chunk in chunkedList)
        {
            // Entities in a batch must share the same partition key, so group the deletes per partition.
            var batches = new Dictionary<string, TableBatchOperation>();
            foreach (var entity in chunk)
            {
                var tableOperation = TableOperation.Delete(entity);
                if (batches.ContainsKey(entity.PartitionKey))
                    batches[entity.PartitionKey].Add(tableOperation);
                else
                    batches.Add(entity.PartitionKey, new TableBatchOperation { tableOperation });
            }

            foreach (var batch in batches.Values)
                cloudTable.ExecuteBatch(batch);
        }
    }
}
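
For reference, a possible way to call it (the date range and the one-hour step are arbitrary examples, and the connection-string setting name is the one used in the worker-role example further down):

// Sweep forward in one-hour steps, starting 30 days back, deleting everything
// older than each step until the 7-day threshold is reached.
var storageAccount = CloudStorageAccount.Parse(
    CloudConfigurationManager.GetSetting("CloudStorageConnectionString"));
TruncateDiagnostics(
    storageAccount,
    DateTime.UtcNow.AddDays(-30),
    DateTime.UtcNow.AddDays(-7),
    dt => dt.AddHours(1));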

Answer 2 (Score: 3)

You could do this based on the Timestamp, but it would be very inefficient because the whole table has to be scanned. Here is a code sample that may help generate a partition key to prevent a "full" table scan. http://blogs.msdn.com/b/avkashchauhan/archive/2011/06/24/linq-code-to-query-windows-azure-wadlogstable-to-get-rows-which-are-stored-after-a-specific-datetime.aspx
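
For reference, a minimal sketch of that idea (the helper name below is made up for illustration): WADLogsTable partition keys follow the "0" + ticks convention also used in the answers below, so a DateTime cutoff can be turned into a PartitionKey range filter instead of filtering on Timestamp.

using System;
using Microsoft.WindowsAzure.Storage.Table;

static class WadPartitionKeyFilter
{
    // WADLogsTable partition keys are "0" followed by the event time in ticks,
    // so a PartitionKey range filter can replace a full Timestamp scan.
    public static string OlderThan(DateTime cutoffUtc)
    {
        string partitionKey = "0" + cutoffUtc.Ticks;
        return TableQuery.GenerateFilterCondition("PartitionKey", QueryComparisons.LessThan, partitionKey);
    }
}

// Usage: var query = new TableQuery { FilterString = WadPartitionKeyFilter.OlderThan(DateTime.UtcNow.AddDays(-7)) };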

Answer 3 (Score: 2)

Here is a solution that truncates based on a timestamp. (Tested against SDK 2.0.)

It does use a table scan to get the data, but if it is run, say, once per day it is not too painful:

    /// <summary>
    /// TruncateDiagnostics(storageAccount, DateTime.Now.AddHours(-1));
    /// </summary>
    /// <param name="storageAccount"></param>
    /// <param name="keepThreshold">Entities older than this are deleted.</param>
    public void TruncateDiagnostics(CloudStorageAccount storageAccount, DateTime keepThreshold)
    {
        try
        {
            CloudTableClient tableClient = storageAccount.CreateCloudTableClient();

            CloudTable cloudTable = tableClient.GetTableReference("WADLogsTable");

            TableQuery query = new TableQuery();
            query.FilterString = string.Format("Timestamp lt datetime'{0:yyyy-MM-ddTHH:mm:ss}'", keepThreshold);
            var items = cloudTable.ExecuteQuery(query).ToList();

            // Group deletes per partition key; a batch may only contain entities
            // from a single partition and at most 100 operations.
            Dictionary<string, TableBatchOperation> batches = new Dictionary<string, TableBatchOperation>();
            foreach (var entity in items)
            {
                TableOperation tableOperation = TableOperation.Delete(entity);

                if (!batches.ContainsKey(entity.PartitionKey))
                {
                    batches.Add(entity.PartitionKey, new TableBatchOperation());
                }

                batches[entity.PartitionKey].Add(tableOperation);

                // Flush a batch as soon as it reaches the 100-operation limit.
                if (batches[entity.PartitionKey].Count == 100)
                {
                    cloudTable.ExecuteBatch(batches[entity.PartitionKey]);
                    batches[entity.PartitionKey] = new TableBatchOperation();
                }
            }

            foreach (var batch in batches.Values)
            {
                if (batch.Count > 0)
                {
                    cloudTable.ExecuteBatch(batch);
                }
            }
        }
        catch (Exception ex)
        {
            Trace.TraceError("Truncate WADLogsTable exception {0}", ex);
        }
    }

Answer 4 (Score: 2)

Here is my slightly different version of @Chriseyre2000's solution, using asynchronous operations and querying by PartitionKey. It is designed to run continuously within a worker role, in my case. This one may be a bit easier on memory if you have a lot of entries to clean up.

static class LogHelper
{
    /// <summary>
    /// Periodically run a cleanup task for log data, asynchronously
    /// </summary>
    public static async void TruncateDiagnosticsAsync()
    {
        while ( true )
        {
            try
            {
                // Retrieve storage account from connection-string
                CloudStorageAccount storageAccount = CloudStorageAccount.Parse(
                    CloudConfigurationManager.GetSetting( "CloudStorageConnectionString" ) );

                CloudTableClient tableClient = storageAccount.CreateCloudTableClient();

                CloudTable cloudTable = tableClient.GetTableReference( "WADLogsTable" );

                // keep a weeks worth of logs
                DateTime keepThreshold = DateTime.UtcNow.AddDays( -7 );

                // do this until we run out of items
                while ( true )
                {
                    TableQuery query = new TableQuery();
                    query.FilterString = string.Format( "PartitionKey lt '0{0}'", keepThreshold.Ticks );
                    // Materialize up to 1000 entities so the deferred query is not executed twice below.
                    var items = cloudTable.ExecuteQuery( query ).Take( 1000 ).ToList();

                    if ( items.Count == 0 )
                        break;

                    Dictionary<string, TableBatchOperation> batches = new Dictionary<string, TableBatchOperation>();
                    foreach ( var entity in items )
                    {
                        TableOperation tableOperation = TableOperation.Delete( entity );

                        // need a new batch?
                        if ( !batches.ContainsKey( entity.PartitionKey ) )
                            batches.Add( entity.PartitionKey, new TableBatchOperation() );

                        // a batch can contain at most 100 operations; anything beyond that
                        // in this partition is picked up on the next pass of the outer loop
                        if ( batches[entity.PartitionKey].Count < 100 )
                            batches[entity.PartitionKey].Add( tableOperation );
                    }

                    // execute!
                    foreach ( var batch in batches.Values )
                        await cloudTable.ExecuteBatchAsync( batch );

                    Trace.TraceInformation( "WADLogsTable truncated: " + query.FilterString );
                }
            }
            catch ( Exception ex )
            {
                Trace.TraceError( "Truncate WADLogsTable exception {0}", ex.Message );
            }

            // run this once per day
            await Task.Delay( TimeSpan.FromDays( 1 ) );
        }
    }
}

To start the process, just call this method from the OnStart method of your worker role.

// start the periodic cleanup
LogHelper.TruncateDiagnosticsAsync();

Answer 5 (Score: 1)

If you don't care about any of the contents, just delete the table. Azure Diagnostics will recreate it.
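
A minimal sketch of that approach, using the same storage SDK types and connection-string setting name as the other answers:

var storageAccount = CloudStorageAccount.Parse(
    CloudConfigurationManager.GetSetting("CloudStorageConnectionString"));

// Deleting the table drops all of its entities in one call;
// as noted above, Azure Diagnostics will recreate the table later on.
var wadLogsTable = storageAccount.CreateCloudTableClient().GetTableReference("WADLogsTable");
wadLogsTable.DeleteIfExists();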

Answer 6 (Score: 0)

Slightly updated Chriseyre2000's code:

  • Uses ExecuteQuerySegmented instead of ExecuteQuery

  • Respects the TableBatchOperation limit of 100 operations

  • Clears all of the Azure diagnostics tables

    public static void TruncateAllAzureTables(CloudStorageAccount storageAccount, DateTime keepThreshold)
    {
       TruncateAzureTable(storageAccount, "WADLogsTable", keepThreshold);
       TruncateAzureTable(storageAccount, "WADCrashDump", keepThreshold);
       TruncateAzureTable(storageAccount, "WADDiagnosticInfrastructureLogsTable", keepThreshold);
       TruncateAzureTable(storageAccount, "WADPerformanceCountersTable", keepThreshold);
       TruncateAzureTable(storageAccount, "WADWindowsEventLogsTable", keepThreshold);
    }
    
    public static void TruncateAzureTable(CloudStorageAccount storageAccount, string aTableName, DateTime keepThreshold)
    {
       const int maxOperationsInBatch = 100;
       var tableClient = storageAccount.CreateCloudTableClient();
    
       var cloudTable = tableClient.GetTableReference(aTableName);
    
       var query = new TableQuery { FilterString = $"Timestamp lt datetime'{keepThreshold:yyyy-MM-ddTHH:mm:ss}'" };
       TableContinuationToken continuationToken = null;
       do
       {
          var queryResult = cloudTable.ExecuteQuerySegmented(query, continuationToken);
          continuationToken = queryResult.ContinuationToken;
    
          var items = queryResult.ToList();
          var batches = new Dictionary<string, List<TableBatchOperation>>();
          foreach (var entity in items)
          {
             var tableOperation = TableOperation.Delete(entity);
    
             if (!batches.TryGetValue(entity.PartitionKey, out var batchOperationList))
             {
                batchOperationList = new List<TableBatchOperation>();
                batches.Add(entity.PartitionKey, batchOperationList);
             }
    
             var batchOperation = batchOperationList.FirstOrDefault(bo => bo.Count < maxOperationsInBatch);
             if (batchOperation == null)
             {
                batchOperation = new TableBatchOperation();
                batchOperationList.Add(batchOperation);
             }
             batchOperation.Add(tableOperation);
          }
    
          foreach (var batch in batches.Values.SelectMany(l => l))
          {
             cloudTable.ExecuteBatch(batch);
          }
       } while (continuationToken != null);
    }
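
As a usage sketch (the one-week retention period is an arbitrary choice), the cleanup could then be triggered periodically, for example:

    // Keep one week of data in all of the WAD* diagnostics tables.
    TruncateAllAzureTables(storageAccount, DateTime.UtcNow.AddDays(-7));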