RavenDB: Why do I get null values for a field in this multi-map/reduce index?

Time: 2017-01-10 01:27:18

Tags: mapreduce ravendb

Inspired by Ayende's article https://ayende.com/blog/89089/ravendb-multi-maps-reduce-indexes, I have the following index, which works as such:

public class Posts_WithViewCountByUser : AbstractMultiMapIndexCreationTask<Posts_WithViewCountByUser.Result>
{
    public Posts_WithViewCountByUser()
    {
        AddMap<Post>(posts => from p in posts
            select new
            {
                ViewedByUserId = (string) null,
                ViewCount = 0,

                Id = p.Id,
                PostTitle = p.PostTitle,
            });

        AddMap<PostView>(postViews => from postView in postViews
            select new
            {
                ViewedByUserId = postView.ViewedByUserId,
                ViewCount = 1,

                Id = (string) postView.PostId,
                PostTitle = (string) null,
            });

        Reduce = results => from result in results
            group result by new
            {
                result.Id,
                result.ViewedByUserId
            }
            into g
            select new Result
            {
                ViewCount = g.Sum(x => x.ViewCount),
                Id = g.Key.Id,
                ViewedByUserId = g.Key.ViewedByUserId,
                PostTitle = g.Select(x => x.PostTitle).Where(x => x != null).FirstOrDefault(),
            };

        Store(x => x.PostTitle, FieldStorage.Yes);
    }

    public class Result
    {
        public string Id { get; set; }
        public string ViewedByUserId { get; set; }
        public int ViewCount { get; set; }
        public string PostTitle { get; set; }
    }
}

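I want to query this index like this:

Return all posts, including, for a given user, an integer count of how many times that user has viewed the post. The "views" are stored in a separate document type, PostView. Note that my real document types have been renamed here to match the example in the article (I certainly would not implement "most viewed" this way).

The result I get from the query is correct, i.e. I always get all the Post documents with the correct view count for the user. But my problem is that the PostTitle field is always null in the result set (all Post documents have a non-null value in the data set).

I group on the combination of userId and (post) Id as my "uniqueness". The way I understand it (please correct me if I'm wrong) is that, at this point in the reduce, I have a bunch of pseudo-documents with a userId / postId combination, some of which come from the Post map and others from the PostView map. Now I simply pick any single pseudo-document that actually has a value for PostTitle, i.e. one that originates from the Post map. These all obviously have the same value, since it is the same post, just "outer-joined". The .Select(....).Where(....).FirstOrDefault() chain is taken from the example I used as a base. I then set the ViewCount value for my final document, which I project into the Result.

My question is: how do I get a non-null value for the PostTitle field in the results?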

2 Answers:

Answer 0: (score: 2)

The problem is that you have:

       ViewedByUserId = (string) null,

        group result by new
        {
            result.Id,
            result.ViewedByUserId
        }
        into g

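In other words, you are actually grouping by null, which I assume is not your intent.

It would be much simpler to have a map/reduce index on PostView only and get the PostTitle from an include or a transformer.

Your understanding of what is going on is correct, in that you are creating index results keyed by userId / postId.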

But what you are actually doing is creating results from PostView with userId / postId and from Post with null / postId.

That is why you do not get the matches you want.
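The suggestion above (reduce over PostView only, then fetch the title from the Post document) can be sketched with plain LINQ to Objects. This is an in-memory analogy and not RavenDB API: the names ViewCountDemo, CountViews and PostsForUser are hypothetical, the dictionary lookup stands in for what an include or transformer would do on the server, and "Post-2" is made-up data added only to show a post with no views.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Post { public string Id { get; set; } public string PostTitle { get; set; } }
public class PostView { public string ViewedByUserId { get; set; } public string PostId { get; set; } }

public static class ViewCountDemo
{
    // "Reduce" over PostView only: one count per (PostId, UserId) pair.
    // No Post rows take part, so no group ever has a null user id.
    public static Dictionary<(string PostId, string UserId), int> CountViews(IEnumerable<PostView> views) =>
        views.GroupBy(v => (PostId: v.PostId, UserId: v.ViewedByUserId))
             .ToDictionary(g => g.Key, g => g.Count());

    // Stand-in for the include/transformer step: the title comes straight from
    // the Post document, and posts the user never viewed get a count of 0.
    public static List<(string PostId, string PostTitle, int ViewCount)> PostsForUser(
        IEnumerable<Post> posts, IEnumerable<PostView> views, string userId)
    {
        var counts = CountViews(views);
        return posts
            .Select(p => (PostId: p.Id,
                          PostTitle: p.PostTitle,
                          ViewCount: counts.TryGetValue((p.Id, userId), out var n) ? n : 0))
            .ToList();
    }

    public static void Main()
    {
        var posts = new[]
        {
            new Post { Id = "Post-1", PostTitle = "Post Title" },
            new Post { Id = "Post-2", PostTitle = "Unviewed Post" }, // hypothetical extra post
        };
        var views = new[]
        {
            new PostView { ViewedByUserId = "User-1", PostId = "Post-1" },
            new PostView { ViewedByUserId = "User-1", PostId = "Post-1" },
            new PostView { ViewedByUserId = "User-2", PostId = "Post-1" },
        };

        foreach (var row in PostsForUser(posts, views, "User-1"))
            Console.WriteLine($"{row.PostId} | {row.PostTitle} | {row.ViewCount}");
    }
}
```

Because the title is read from the Post document itself rather than carried through the reduce, it cannot end up null, and every post is returned regardless of whether the user has viewed it.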

Answer 1: (score: 2)

The grouping in the index is incorrect. With the following sample data:

new Post { Id = "Post-1", PostTitle = "Post Title", AuthorId = "Author-1" }
new PostView { ViewedByUserId = "User-1", PostId = "Post-1" }
new PostView { ViewedByUserId = "User-1", PostId = "Post-1" }
new PostView { ViewedByUserId = "User-2", PostId = "Post-1" }

The index results look like this:

ViewCount | Id     | ViewedByUserId | PostTitle
--------- | ------ | -------------- | ----------
 0        | Post-1 | null           | Post Title
 2        | Post-1 | User-1         | null
 1        | Post-1 | User-2         | null

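The map operations in the index simply produce one row of a common shape for every source document. Thus, the Post-1 document produces one row, the two PostView documents for Post-1 and User-1 produce two rows (which are later reduced to a single row with ViewCount == 2), and the PostView document for Post-1 and User-2 produces the last row.

The reduce operation groups all the mapped rows and produces the resulting documents in the index. In this case, the row that comes from the Post source document is kept separate from the rows that come from the PostView source documents, because its null ViewedByUserId value does not group with any document from the PostView collection.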
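The grouping behaviour described above can be reproduced without a server. The following is a minimal LINQ to Objects sketch, not RavenDB code: Row, GroupingDemo, SampleRows and Reduce are hypothetical names, and the rows are written out by hand to mirror what the two maps in the question would emit for the sample data.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Common shape emitted by both maps, mirroring the index's Result class.
public class Row
{
    public string Id { get; set; }
    public string ViewedByUserId { get; set; }
    public int ViewCount { get; set; }
    public string PostTitle { get; set; }
}

public static class GroupingDemo
{
    // Hand-written map output for one Post and three PostView documents.
    public static List<Row> SampleRows() => new List<Row>
    {
        new Row { Id = "Post-1", ViewedByUserId = null,     ViewCount = 0, PostTitle = "Post Title" }, // Post map
        new Row { Id = "Post-1", ViewedByUserId = "User-1", ViewCount = 1, PostTitle = null },         // PostView map
        new Row { Id = "Post-1", ViewedByUserId = "User-1", ViewCount = 1, PostTitle = null },
        new Row { Id = "Post-1", ViewedByUserId = "User-2", ViewCount = 1, PostTitle = null },
    };

    // Same grouping key and projection as the Reduce in the question.
    public static List<Row> Reduce(IEnumerable<Row> mapped) =>
        (from r in mapped
         group r by new { r.Id, r.ViewedByUserId } into g
         select new Row
         {
             Id = g.Key.Id,
             ViewedByUserId = g.Key.ViewedByUserId,
             ViewCount = g.Sum(x => x.ViewCount),
             // Only rows inside this group are searched, so the title is found
             // only in the group whose ViewedByUserId is null.
             PostTitle = g.Select(x => x.PostTitle).Where(x => x != null).FirstOrDefault(),
         }).ToList();

    public static void Main()
    {
        foreach (var row in Reduce(SampleRows()))
            Console.WriteLine(
                $"{row.ViewCount} | {row.Id} | {row.ViewedByUserId ?? "null"} | {row.PostTitle ?? "null"}");
    }
}
```

The three groups correspond to the three rows of the table above: the only group that contains a non-null PostTitle is the one keyed by a null user, so the per-user groups have nothing to pick the title from.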

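If you can change the way you store the data, you can solve this by storing the view count directly in PostView. That greatly reduces the duplicated data in the database, while updating the view count costs almost the same.

Complete test (requires the xunit and RavenDB.Tests.Helpers NuGet packages):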

using Raven.Abstractions.Indexing;
using Raven.Client;
using Raven.Client.Indexes;
using Raven.Tests.Helpers;
using System.Linq;
using Xunit;

namespace SO41559770Answer
{
    public class SO41559770 : RavenTestBase
    {
        [Fact]
        public void SO41559770Test()
        {
            using (var server = GetNewServer())
            using (var store = NewRemoteDocumentStore(ravenDbServer: server))
            {
                new PostViewsIndex().Execute(store);

                using (IDocumentSession session = store.OpenSession())
                {
                    session.Store(new Post { Id = "Post-1", PostTitle = "Post Title", AuthorId = "Author-1" });
                    session.Store(new PostView { Id = "Views-1-1", ViewedByUserId = "User-1", PostId = "Post-1", ViewCount = 2 });
                    session.Store(new PostView { Id = "Views-1-2", ViewedByUserId = "User-2", PostId = "Post-1", ViewCount = 1 });
                    session.SaveChanges();
                }

                WaitForAllRequestsToComplete(server);
                WaitForIndexing(store);

                using (IDocumentSession session = store.OpenSession())
                {
                    var resultsForId1 = session
                        .Query<PostViewsIndex.Result, PostViewsIndex>()
                        .ProjectFromIndexFieldsInto<PostViewsIndex.Result>()
                        .Where(x => x.PostId == "Post-1" && x.UserId == "User-1");
                    Assert.Equal(2, resultsForId1.First().ViewCount);
                    Assert.Equal("Post Title", resultsForId1.First().PostTitle);
                    var resultsForId2 = session
                        .Query<PostViewsIndex.Result, PostViewsIndex>()
                        .ProjectFromIndexFieldsInto<PostViewsIndex.Result>()
                        .Where(x => x.PostId == "Post-1" && x.UserId == "User-2");
                    Assert.Equal(1, resultsForId2.First().ViewCount);
                    Assert.Equal("Post Title", resultsForId2.First().PostTitle);
                }
            }
        }
    }

    public class PostViewsIndex : AbstractIndexCreationTask<PostView, PostViewsIndex.Result>
    {
        public PostViewsIndex()
        {
            Map = postViews => from postView in postViews
                               let post = LoadDocument<Post>(postView.PostId)
                               select new
                               {
                                   Id = postView.Id,
                                   PostId = post.Id,
                                   PostTitle = post.PostTitle,
                                   UserId = postView.ViewedByUserId,
                                   ViewCount = postView.ViewCount,
                               };
            StoreAllFields(FieldStorage.Yes);
        }


        public class Result
        {
            public string Id { get; set; }
            public string PostId { get; set; }
            public string PostTitle { get; set; }
            public string UserId { get; set; }
            public int ViewCount { get; set; }
        }
    }

    public class Post
    {
        public string Id { get; set; }
        public string PostTitle { get; set; }
        public string AuthorId { get; set; }
    }

    public class PostView
    {
        public string Id { get; set; }
        public string ViewedByUserId { get; set; }
        public string PostId { get; set; }
        public int ViewCount { get; set; }
    }
}