Map/Reduce in RavenDB, update 1

Date: 2012-06-01 16:08:39

Tags: c# mapreduce ravendb

Update 1, based on Ayende's answer

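This is my first foray into RavenDB. While experimenting with it I wrote a small map/reduce, but unfortunately the result is empty.

I have around 1.6 million documents loaded into RavenDB.

A document: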

public class Tick
{
    public DateTime Time;
    public decimal Ask;
    public decimal Bid;
    public double AskVolume;
    public double BidVolume;
}

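I want to get the minimum and maximum Ask within a specific time period.

The set of documents for a time period is queried like this: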

var ticks = session.Query<Tick>().Where(x => x.Time > new DateTime(2012, 4, 23) && x.Time < new DateTime(2012, 4, 24, 00, 0, 0)).ToList();

This gives me 90,280 documents; so far so good.

But then the map/reduce:

Map = rows => from row in rows 
                          select new
                          {
                              Max = row.Bid,
                              Min = row.Bid, 
                              Time = row.Time,
                              Count = 1
                          };

Reduce = results => from result in results
                                group result by new{ result.MaxBid, result.Count} into g
                                select new
                                {
                                    Max = g.Key.MaxBid,
                                    Min = g.Min(x => x.MaxBid),
                                    Time = g.Key.Time,
                                    Count = g.Sum(x => x.Count)

                                };

...

private class TickAggregationResult
{
    public decimal MaxBid { get; set; }
    public decimal MinBid { get; set; }
    public int Count { get; set; }
}

Then I create the index and try to query it:

Raven.Client.Indexes.IndexCreation.CreateIndexes(typeof(TickAggregation).Assembly, documentStore);

var session = documentStore.OpenSession();

var g1 = session.Query<TickAggregationResult>(typeof(TickAggregation).Name);

var group = session.Query<Tick, TickAggregation>()
    .Where(x => x.Time > new DateTime(2012, 4, 23) &&
                x.Time < new DateTime(2012, 4, 24, 00, 0, 0))
    .Customize(x => x.WaitForNonStaleResults())
    .AsProjection<TickAggregationResult>();

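But the group is just empty :(

As you can see, I have tried two different queries. I'm not sure what the difference between them is; can someone explain?

Now I get an error message (the screenshot is not available here).

The group is still empty :(

Let me explain what I am trying to accomplish in plain SQL: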

select min(Ask), count(*) as TickCount from Ticks
where Time between '2012-04-23' and '2012-04-24'

1 answer:

Answer 0 (score: 3)

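Unfortunately, Map/Reduce doesn't work that way. Well, at least the Reduce part of it doesn't. In order to reduce your set, you would have to predefine specific time ranges to group by, for example daily, weekly, or monthly. If you reduced by day, you could then get the min/max/count per day.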
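To make the daily case concrete, here is a minimal in-memory sketch of the result a per-day reduce would produce. It is plain LINQ to Objects, not a RavenDB index, and the names `DailySummary` and `DailyAggregation.Summarize` are made up for illustration. In an index, the same `Time.Date` key would presumably have to appear both in the Map output and in the Reduce `group by`.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Tick
{
    public DateTime Time { get; set; }
    public decimal Bid { get; set; }
}

// Hypothetical result shape: one row per calendar day.
public class DailySummary
{
    public DateTime Day { get; set; }
    public decimal Min { get; set; }
    public decimal Max { get; set; }
    public int Count { get; set; }
}

public static class DailyAggregation
{
    // Groups ticks by the date part of Time and computes min/max/count per day.
    public static List<DailySummary> Summarize(IEnumerable<Tick> ticks)
    {
        return ticks
            .GroupBy(t => t.Time.Date)
            .Select(g => new DailySummary
            {
                Day = g.Key,
                Min = g.Min(t => t.Bid),
                Max = g.Max(t => t.Bid),
                Count = g.Count()
            })
            .OrderBy(s => s.Day)
            .ToList();
    }
}
```

With per-day rows like these, a query can filter on `Day`, but only at whole-day granularity.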

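There is a way to get what you want, but it has some performance considerations. Basically, you don't reduce at all; you index by time and then do the aggregation when transforming the results. This is similar to running your first query to filter and then aggregating in your client code. The only benefit is that the aggregation is done server-side, so you don't have to transmit all of that data to the client.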
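For comparison, the client-side version of that idea looks roughly like the sketch below: filter by time, then compute min/max/count over whatever is left. `TickSummary` and `ClientSideAggregation` are hypothetical names, and the in-memory sequence stands in for the documents a filtered query returned.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Tick
{
    public DateTime Time { get; set; }
    public decimal Bid { get; set; }
}

// Hypothetical result shape, mirroring the Result classes in the answer.
public class TickSummary
{
    public decimal Min { get; set; }
    public decimal Max { get; set; }
    public int Count { get; set; }
}

public static class ClientSideAggregation
{
    // Aggregates the ticks whose Time lies in [from, to].
    // Returns null when the range is empty, because Min/Max would throw on an empty sequence.
    public static TickSummary Summarize(IEnumerable<Tick> ticks, DateTime from, DateTime to)
    {
        var inRange = ticks.Where(t => t.Time >= from && t.Time <= to).ToList();
        if (inRange.Count == 0)
            return null;

        return new TickSummary
        {
            Min = inRange.Min(t => t.Bid),
            Max = inRange.Max(t => t.Bid),
            Count = inRange.Count
        };
    }
}
```

Every tick in the range has to be materialized before it can be aggregated, which is the cost the server-side variant avoids on the wire but not on the server.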

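The performance concern here is how big the time range you are filtering on is, or more precisely, how many items fall inside the filter range. If it is relatively small, you can use this approach. If it is too large, you will be waiting while the server goes through the result set.

Here is a sample program that illustrates this technique: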

using System;
using System.Linq;
using Raven.Client.Document;
using Raven.Client.Indexes;
using Raven.Client.Linq;

namespace ConsoleApplication1
{
  public class Tick
  {
    public string Id { get; set; }
    public DateTime Time { get; set; }
    public decimal Bid { get; set; }
  }

  /// <summary>
  /// This index is a true map/reduce, but its totals are for all time.
  /// You can't filter it by time range.
  /// </summary>
  class Ticks_Aggregate : AbstractIndexCreationTask<Tick, Ticks_Aggregate.Result>
  {
    public class Result
    {
      public decimal Min { get; set; }
      public decimal Max { get; set; }
      public int Count { get; set; }
    }

    public Ticks_Aggregate()
    {
      Map = ticks => from tick in ticks
               select new
                    {
                      Min = tick.Bid,
                      Max = tick.Bid,
                      Count = 1
                    };

      Reduce = results => from result in results
                group result by 0
                  into g
                  select new
                         {
                           Min = g.Min(x => x.Min),
                           Max = g.Max(x => x.Max),
                           Count = g.Sum(x => x.Count)
                         };
    }
  }

  /// <summary>
  /// This index can be filtered by time range, but it does not reduce anything
  /// so it will not be performant if there are many items inside the filter.
  /// </summary>
  class Ticks_ByTime : AbstractIndexCreationTask<Tick>
  {
    public class Result
    {
      public decimal Min { get; set; }
      public decimal Max { get; set; }
      public int Count { get; set; }
    }

    public Ticks_ByTime()
    {
      Map = ticks => from tick in ticks
               select new {tick.Time};

      TransformResults = (database, ticks) =>
                 from tick in ticks
                 group tick by 0
                 into g
                 select new
                      {
                        Min = g.Min(x => x.Bid),
                        Max = g.Max(x => x.Bid),
                        Count = g.Count()
                      };
    }
  }

  class Program
  {
    private static void Main()
    {
      var documentStore = new DocumentStore { Url = "http://localhost:8080" };
      documentStore.Initialize();
      IndexCreation.CreateIndexes(typeof(Program).Assembly, documentStore);


      var today = DateTime.Today;
      var rnd = new Random();

      using (var session = documentStore.OpenSession())
      {
        // Generate 100 random ticks
        for (var i = 0; i < 100; i++)
        {
          var tick = new Tick { Time = today.AddMinutes(i), Bid = rnd.Next(100, 1000) / 100m };
          session.Store(tick);
        }

        session.SaveChanges();
      }


      using (var session = documentStore.OpenSession())
      {
        // Query items with a filter.  This will create a dynamic index.
        var fromTime = today.AddMinutes(20);
        var toTime = today.AddMinutes(80);
        var ticks = session.Query<Tick>()
          .Where(x => x.Time >= fromTime && x.Time <= toTime)
          .OrderBy(x => x.Time);

        // Output the results of the above query
        foreach (var tick in ticks)
          Console.WriteLine("{0} {1}", tick.Time, tick.Bid);

        // Get the aggregates for all time
        var total = session.Query<Tick, Ticks_Aggregate>()
          .As<Ticks_Aggregate.Result>()
          .Single();
        Console.WriteLine();
        Console.WriteLine("Totals");
        Console.WriteLine("Min: {0}", total.Min);
        Console.WriteLine("Max: {0}", total.Max);
        Console.WriteLine("Count: {0}", total.Count);

        // Get the aggregates with a filter
        var filtered = session.Query<Tick, Ticks_ByTime>()
          .Where(x => x.Time >= fromTime && x.Time <= toTime)
          .As<Ticks_ByTime.Result>()
          .Take(1024)  // max you can take at once
          .ToList()    // required!
          .Single();
        Console.WriteLine();
        Console.WriteLine("Filtered");
        Console.WriteLine("Min: {0}", filtered.Min);
        Console.WriteLine("Max: {0}", filtered.Max);
        Console.WriteLine("Count: {0}", filtered.Count);
      }

      Console.ReadLine();
    }
  }
}

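I can envision a solution to the problem of aggregating with a time filter that has a potentially large range. The reduce would have to break things down into progressively smaller units of time at different levels. The code for this is a bit complex, but I am working on it for my own purposes. When it is complete, I will post it in the knowledge base at www.ravendb.net.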
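One way to read that idea, purely as an illustration and not the implementation mentioned above: if pre-reduced totals existed per hour and per day, a query range could be covered with whole-day buckets in the middle and hour buckets at the edges, so only a handful of totals would need combining. The sketch below computes such a cover. The `Bucket` type, the choice of two levels, and the requirement that both bounds sit on an hour boundary are all assumptions.

```csharp
using System;
using System.Collections.Generic;

public enum BucketSize { Hour, Day }

// Hypothetical description of one pre-aggregated time bucket.
public sealed class Bucket
{
    public BucketSize Size { get; set; }
    public DateTime Start { get; set; }
}

public static class TimeBuckets
{
    // Covers [from, to) with day buckets where a whole day fits, hour buckets otherwise.
    // Assumes from and to both fall exactly on an hour boundary.
    public static List<Bucket> Cover(DateTime from, DateTime to)
    {
        var buckets = new List<Bucket>();
        var cursor = from;
        while (cursor < to)
        {
            var atMidnight = cursor.TimeOfDay == TimeSpan.Zero;
            if (atMidnight && cursor.AddDays(1) <= to)
            {
                // A full day fits: use the coarse bucket.
                buckets.Add(new Bucket { Size = BucketSize.Day, Start = cursor });
                cursor = cursor.AddDays(1);
            }
            else
            {
                // Fall back to the fine bucket at the edges of the range.
                buckets.Add(new Bucket { Size = BucketSize.Hour, Start = cursor });
                cursor = cursor.AddHours(1);
            }
        }
        return buckets;
    }
}
```

Min, max, and count combine cleanly across buckets (min of mins, max of maxes, sum of counts), which is what makes this kind of cover usable for these particular aggregates.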


Update

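I was playing with this some more and noticed two things in the last query.

  1. You must call ToList() before calling Single() in order to get the full result set.
  2. Even though this runs on the server, the maximum number of items in the result range is 1024, and you have to specify Take(1024) or you get the default of 128. Since this runs on the server, I didn't expect that. But I guess it is because you don't normally do aggregations in the TransformResults section.

I have updated the code accordingly. However, unless you can guarantee that the range is small enough for this to work, I would wait for the better full map/reduce that I spoke of. I'm working on it. :)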