Quickly calculating averages over a range of dates (200k records)

Time: 2017-12-29 13:20:42

Tags: c# sql azure bigdata azure-cosmosdb

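I am currently working with a CosmosDB that holds big data. The data consists of documents, each containing an object with a date and a double value.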

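What I am trying to do is fetch one month of data (about 220K records) and calculate the average of the double values for each day of that month. My current code is probably very inefficient. I am looking for a formula or more efficient code that solves this. I could not come up with a solution myself, and searching Google did not turn one up either.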

My current calculation is too slow. Ideally it would complete in under one second.

This is a C# Web API connected to Azure CosmosDB. Here is my current code:

// GET api/values/{databasename}/<Guid>
[HttpGet("{database}/{id}")]
public IActionResult GetRange(string database, string id, [FromQuery] int option,
    [FromQuery] string from, [FromQuery] string to)
{
    try
    {
        // Reject unparsable dates as well as an empty or inverted range.
        if (!DateTime.TryParse(from, out var dateFrom) ||
            !DateTime.TryParse(to, out var dateTo) ||
            dateFrom >= dateTo)
        {
            return BadRequest("Given dates are incorrect.");
        }

        // Get the timespan for the given option.
        var timespan = TimeSpanSelectionHelper.GetTimeSpanForOption(option);

        var dataRange = _cosmosDbService.GetRange(database, id, from, to).ToList();
        var result = from kvp in dataRange
            let key = RoundToNearest(kvp.EventEnqueuedUtcTime, timespan)
            group kvp by key
            into grouping
            select new {
                grouping.Key,
                // Note: the (int) cast truncates the average toward zero.
                Avg = (int) grouping.Average(x => x.Temperature),
                Min = grouping.Min(x => x.Temperature),
                Max = grouping.Max(x => x.Temperature) 
            };

        var enumerable = result.ToArray();
        var min = enumerable.Select(x => new CosmosDbRangeSelectionKeyValueJsonModel(x.Key, x.Min)).ToArray();
        var avg = enumerable.Select(x => new CosmosDbRangeSelectionKeyValueJsonModel(x.Key, x.Avg)).ToArray();
        var max = enumerable.Select(x => new CosmosDbRangeSelectionKeyValueJsonModel(x.Key, x.Max)).ToArray();

        var model = new CosmosDbRangeSelectionJsonModel(min, avg, max);
        return Ok(model);
    }
    catch (DocumentClientException de)
    {
        var baseException = de.GetBaseException();
        var message = string.Format("{0} error occurred: {1}, Message: {2}", de.StatusCode, de.Message,
                                baseException.Message);
        return NotFound(message);
    }
    catch (Exception e)
    {
        var baseException = e.GetBaseException();
        var message = string.Format("Error: {0}, Message: {1}", e.Message, baseException.Message);
        return NotFound(message);
    }
}
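
The grouping above walks every group three times (once each for Average, Min and Max) after materializing the whole month with ToList(). As a minimal sketch, assuming calendar-day buckets are what is wanted, the same three statistics can be accumulated in a single pass. The names DailyStats, Bucket and Aggregate are hypothetical and not part of the original code, and the readings are modeled as plain (time, value) tuples rather than the real document type. This only reduces the in-memory work; fetching 220K documents from CosmosDB may well dominate the total time.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of the original code.
public static class DailyStats
{
    // Running aggregate for a single calendar day.
    public sealed class Bucket
    {
        public double Sum;
        public int Count;
        public double Min = double.PositiveInfinity;
        public double Max = double.NegativeInfinity;
        public double Average => Sum / Count;
    }

    // Visits each reading exactly once and keeps one Bucket per day.
    public static SortedDictionary<DateTime, Bucket> Aggregate(
        IEnumerable<(DateTime Time, double Value)> readings)
    {
        var buckets = new SortedDictionary<DateTime, Bucket>();
        foreach (var (time, value) in readings)
        {
            var day = time.Date; // midnight of the reading's day
            if (!buckets.TryGetValue(day, out var bucket))
            {
                bucket = new Bucket();
                buckets[day] = bucket;
            }

            bucket.Sum += value;
            bucket.Count++;
            if (value < bucket.Min) bucket.Min = value;
            if (value > bucket.Max) bucket.Max = value;
        }

        return buckets;
    }
}
```

Because Aggregate takes an IEnumerable, it could consume the query results as they stream in instead of waiting for ToList(), provided the service exposes them lazily.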

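/// <summary>
/// Rounds a timestamp to the nearest multiple of the given interval.
/// </summary>
/// <param name="dt">The timestamp to round.</param>
/// <param name="d">The interval to round to; must be greater than zero.</param>
/// <returns>The rounded timestamp, with the same <see cref="DateTime.Kind"/> as <paramref name="dt"/>.</returns>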
private static DateTime RoundToNearest(DateTime dt, TimeSpan d)
{
    var delta = dt.Ticks % d.Ticks;
    // Do we round up?
    var roundUp = delta > d.Ticks / 2;
    var offset = roundUp ? d.Ticks : 0;

    return new DateTime(dt.Ticks + offset - delta, dt.Kind);
}
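
One thing worth checking about RoundToNearest: with a one-day interval, any reading taken after noon is keyed to the following day, so each bucket runs from noon to noon rather than midnight to midnight. If calendar-day buckets are intended, rounding down is one possible alternative. The sketch below copies RoundToNearest from the question next to a hypothetical FloorTo helper (an assumption, not part of the original code) so the two can be compared.

```csharp
using System;

// Hypothetical container class for the comparison.
public static class BucketKeys
{
    // Same logic as RoundToNearest in the question.
    public static DateTime RoundToNearest(DateTime dt, TimeSpan d)
    {
        var delta = dt.Ticks % d.Ticks;
        var roundUp = delta > d.Ticks / 2;
        var offset = roundUp ? d.Ticks : 0;
        return new DateTime(dt.Ticks + offset - delta, dt.Kind);
    }

    // Hypothetical alternative: always round down to the start of the interval.
    public static DateTime FloorTo(DateTime dt, TimeSpan d)
    {
        return new DateTime(dt.Ticks - dt.Ticks % d.Ticks, dt.Kind);
    }
}
```

For a reading at 2017-12-29 13:20:42 UTC and an interval of TimeSpan.FromDays(1), RoundToNearest returns 2017-12-30 00:00:00 while FloorTo returns 2017-12-29 00:00:00.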

Edit 03-01-2018

We have to keep a close eye on this:

https://feedback.azure.com/forums/263030-azure-cosmos-db/suggestions/18561901-add-group-by-support-for-aggregate-functions
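
Until GROUP BY is available, one possible workaround is to let the database compute a plain aggregate once per day, so that only about thirty numbers travel over the wire instead of 220K documents. The sketch below only builds the day windows and the query text; it does not call the SDK. It assumes that EventEnqueuedUtcTime is stored as an ISO 8601 string that sorts lexicographically and matches the round-trip ("o") format, which should be verified against the real documents. DailyQueries, AverageQuery and DayWindows are hypothetical names, and MIN and MAX would need analogous queries.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of the original code.
public static class DailyQueries
{
    // Aggregate query for one window; @from and @to are bound per day.
    // Assumes timestamps are stored as sortable ISO 8601 strings.
    public const string AverageQuery =
        "SELECT VALUE AVG(c.Temperature) FROM c " +
        "WHERE c.EventEnqueuedUtcTime >= @from AND c.EventEnqueuedUtcTime < @to";

    // Splits [fromUtc, toUtc) into windows that break at midnight.
    public static IEnumerable<(string From, string To)> DayWindows(
        DateTime fromUtc, DateTime toUtc)
    {
        for (var day = fromUtc.Date; day < toUtc; day = day.AddDays(1))
        {
            var start = day < fromUtc ? fromUtc : day; // clip the first window
            var next = day.AddDays(1);
            var end = next < toUtc ? next : toUtc;     // clip the last window
            yield return (start.ToString("o"), end.ToString("o"));
        }
    }
}
```

Each window would then be run as its own parameterized query, ideally in parallel. Whether this beats a single large fetch depends on indexing, partitioning and the provisioned throughput, so it needs to be measured.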

0 answers:

No answers