Denormalization vs. parent/child vs. nested

Posted: 2017-07-18 18:26:30

Tags: mysql elasticsearch filtering normalization nosql

We are designing an Elasticsearch model for events, the schedules on which they take place, and venues. The design is as follows:

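[Image: model]

Examples of queries we may need:

  1. Find concert events between January 7 and July 7, 2017
  2. Find artists performing in London where the event is a theatre play
  3. Find events that are movies with a score > 70%
  4. Find users attending the event AwesomeEvent
  5. Find venues located in London that have any event scheduled from today onward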

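I have read the Elastic docs and some articles (such as this one), as well as several Stack Overflow questions, but I'm still not sure about our model because it is quite specific.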

Possible approaches:

1) Using the nested pattern

{
  "title": "Event",
  "body":  "This great event is going to be...",
  "Schedules": [ 
    {
      "name":    "Schedule 1",
      "start":   "7.7.2017",
      "end":     "8.7.2017"
    },
    {
      "name":    "Schedule 2",
      "start":   "10.7.2017",
      "end":   "11.7.2017"
    }
  ],
  "Performers": [ 
    {
      "name":    "Performer 1",
      "genre":   "Rock"
    },
    {
      "name":    "Performer 2",
      "genre":   "Pop"
    }
  ],
  ...
}
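To make option 1 concrete, here is a minimal sketch of how the nested design could be mapped and queried, written as plain Python dicts that would be sent as request bodies. It assumes Elasticsearch 7 or later (typeless mappings); the `category` field for the event type is a hypothetical addition, since the example document has no such field. Declaring `Schedules` and `Performers` as `nested` is what lets a query match conditions within a single schedule or performer.

```python
import json

# Hypothetical mapping for the nested design (Elasticsearch 7+ typeless syntax).
# "category" is an assumed field for the event type (concert, movie, play...).
EVENT_MAPPING = {
    "mappings": {
        "properties": {
            "title": {"type": "text"},
            "body": {"type": "text"},
            "category": {"type": "keyword"},  # assumption, not in the question
            "Schedules": {
                "type": "nested",
                "properties": {
                    "name": {"type": "keyword"},
                    # Parses dates written like "7.7.2017" (day.month.year).
                    "start": {"type": "date", "format": "d.M.yyyy"},
                    "end": {"type": "date", "format": "d.M.yyyy"},
                },
            },
            "Performers": {
                "type": "nested",
                "properties": {
                    "name": {"type": "keyword"},
                    "genre": {"type": "keyword"},
                },
            },
        }
    }
}


def concerts_between(start: str, end: str) -> dict:
    """Query body: concert events with at least one schedule starting in [start, end]."""
    return {
        "query": {
            "bool": {
                "filter": [
                    {"term": {"category": "concert"}},
                    {
                        # The range is evaluated per schedule, not across all of them.
                        "nested": {
                            "path": "Schedules",
                            "query": {
                                "range": {
                                    "Schedules.start": {
                                        "gte": start,
                                        "lte": end,
                                        "format": "d.M.yyyy",
                                    }
                                }
                            },
                        }
                    },
                ]
            }
        }
    }


print(json.dumps(concerts_between("7.1.2017", "7.7.2017"), indent=2))
```

The `d.M.yyyy` format is an assumption based on the dates used in the example; storing ISO dates would avoid needing a custom format at all.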

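Pros:

  1. Flatter model, which should fit the "key: value" approach
  2. Each entity carries all of its information itself

Cons:

  1. Lots of redundant data
  2. More complex entities

2) Parent/child relationship between the following entities (simplified)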

{
  "title": "Event",
  "body":  "This great event is going to be..."
}

{
  "title": "Schedule",
  "start":   "7.7.2017",
  "end":     "8.7.2017"
}

{
  "name":    "Performer",
  "genre":   "Rock"
}
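Option 2 could look roughly like this with the `join` field type, introduced in Elasticsearch 6.0 (the 5.x versions current when this question was written used a different `_parent` mapping syntax). All three kinds of documents live in the same index; the field name `relation` and the helper functions are hypothetical.

```python
import json

# Hypothetical single-index layout for option 2 using the "join" field type
# (Elasticsearch 6.0+). "relation" is an assumed field name.
JOIN_MAPPING = {
    "mappings": {
        "properties": {
            "relation": {
                "type": "join",
                # One parent type ("event") with two child types.
                "relations": {"event": ["schedule", "performer"]},
            },
            "title": {"type": "text"},
            "body": {"type": "text"},
            "start": {"type": "date", "format": "d.M.yyyy"},
            "end": {"type": "date", "format": "d.M.yyyy"},
            "name": {"type": "keyword"},
            "genre": {"type": "keyword"},
        }
    }
}


def event_doc(title: str, body: str) -> dict:
    """Parent document: an event."""
    return {"title": title, "body": body, "relation": {"name": "event"}}


def schedule_doc(event_id: str, start: str, end: str) -> dict:
    """Child document: a schedule pointing at its parent event.

    It must be indexed with routing=event_id so that it lands on the
    same shard as the parent event.
    """
    return {
        "title": "Schedule",
        "start": start,
        "end": end,
        "relation": {"name": "schedule", "parent": event_id},
    }


def events_with_schedule_between(start: str, end: str) -> dict:
    """Query body: events having at least one schedule that starts in [start, end]."""
    return {
        "query": {
            "has_child": {
                "type": "schedule",
                "query": {
                    "range": {
                        "start": {"gte": start, "lte": end, "format": "d.M.yyyy"}
                    }
                },
            }
        }
    }


print(json.dumps(schedule_doc("1", "7.7.2017", "8.7.2017"), indent=2))
```

Because a schedule is its own document here, adding or changing one does not require reindexing the event; the price is the `has_child` join at query time.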
      

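Pros:

  1. Avoids duplicating redundant data

Cons:

  1. More joins (even though parent and child are stored on the same shard)
  2. The model is not as flat, and I'm not sure about the performance

So far we have been using a relational database; the model works fine, but it is not fast enough. Consider a cinema, for example: a single event (a movie) can have thousands of schedules at different venues, and we want very fast filtering responses for queries like the ones listed in the first part.

I'd appreciate any advice on designing the data model properly. I'm also happy to have my assumptions reviewed (some of them may well be wrong).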

1 answer:

Answer (score: 1)

It's hard to denormalize your data. For example, the number of performers in an event is unknown, so if you were to use specific fields for performers, you would need performer1.firstname, performer1.lastname, performer2.firstname, performer2.lastname, etc. If you use a nested field instead, you simply define a nested field Performer in the event index with the correct sub-field mappings, and then you can add as many performers as you want. This lets you look up events by performer or performers by event. The same applies to the rest of the indices.
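The point about nested fields is easiest to see against what happens without them. By default Elasticsearch flattens an array of objects into one multi-valued field per sub-field, so the link between a performer's name and genre is lost, whereas a `nested` field keeps each performer as its own hidden document. The sketch below imitates both behaviours in plain Python over the performers from the question; it is a simplified model for illustration, not how Lucene actually stores documents.

```python
from collections import defaultdict

performers = [
    {"name": "Performer 1", "genre": "Rock"},
    {"name": "Performer 2", "genre": "Pop"},
]


def flatten(objects: list[dict]) -> dict[str, set[str]]:
    """Mimic the default object mapping: one multi-valued field per key."""
    flat: dict[str, set[str]] = defaultdict(set)
    for obj in objects:
        for key, value in obj.items():
            flat[key].add(value)
    return dict(flat)


def matches_object(objects: list[dict], criteria: dict[str, str]) -> bool:
    """Each criterion only needs to match *some* value of that field."""
    flat = flatten(objects)
    return all(value in flat.get(key, set()) for key, value in criteria.items())


def matches_nested(objects: list[dict], criteria: dict[str, str]) -> bool:
    """All criteria must match within the *same* object, like a nested query."""
    return any(all(obj.get(k) == v for k, v in criteria.items()) for obj in objects)


query = {"name": "Performer 1", "genre": "Pop"}
print(matches_object(performers, query))  # True: a false positive
print(matches_nested(performers, query))  # False: no single performer matches
```

This cross-object matching is why "performer by event" style lookups need the `nested` type rather than a plain array of objects.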

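As for parent-child vs. nested: parent-child gives you more independence, because child documents are separate documents (stored in the same index and routed to the same shard as their parent) that can be added or updated without reindexing the parent. Nested fields can additionally specify the "include_in_parent" option to automatically denormalize their fields into the parent document for you.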
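A minimal mapping sketch of that option, assuming Elasticsearch 7+ typeless mappings and the `Performers` field from the question: with `include_in_parent: true`, each performer is still indexed as a nested document, but its fields are also copied into the event as ordinary flat fields. Single-field lookups can then skip the `nested` wrapper, while combined per-performer conditions still need it (the flat copy has the usual cross-object matching caveat of object arrays).

```python
import json

# Assumed mapping: Performers stays nested, but its fields are also copied
# into the parent event document as flat fields.
MAPPING = {
    "mappings": {
        "properties": {
            "title": {"type": "text"},
            "Performers": {
                "type": "nested",
                "include_in_parent": True,
                "properties": {
                    "name": {"type": "keyword"},
                    "genre": {"type": "keyword"},
                },
            },
        }
    }
}

# Exact per-performer matching (name AND genre on the same performer)
# still needs a nested query...
nested_query = {
    "query": {
        "nested": {
            "path": "Performers",
            "query": {
                "bool": {
                    "filter": [
                        {"term": {"Performers.name": "Performer 1"}},
                        {"term": {"Performers.genre": "Rock"}},
                    ]
                }
            },
        }
    }
}

# ...while a single-field lookup can hit the flat copy on the event directly.
flat_query = {"query": {"term": {"Performers.genre": "Rock"}}}

print(json.dumps(MAPPING, indent=2))
```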