Question

I am trying to model a catalog system in DynamoDB. It has "Catalogs" that contain "Collections". Each "Collection" can be tagged with many "Tags".

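In an RDBMS, I would create a table "Catalogs" with a 1:n relationship to "Collections". "Collections" would have an n:n relationship with "Tags", since a collection can have multiple tags and a tag can belong to multiple collections.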

The queries I want to run are:

1) Get all catalogs

2) Get a catalog by ID

3) Get collections by catalog ID

I read on AWS that I can use the adjacency list design (because of my n:n with "Tags"). So here is my table structure:

PK         SK         name    
cat-1      cat-1      Sales Catalog
cat-1      col-1      Sales First Collection
cat-1      col-2      Sales Second Collection
cat-2      cat-2      Finance Catalog 
tag-1      tag-1      Recently Added Tag
col-1      tag-1      (collection, tag relationship)

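The problem here is that I have to use a scan, which I believe is inefficient, to get all "Catalogs", because the PK condition of a query must be "=" and cannot be "begins with".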

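The only thing I can think of is to create another attribute, such as "GSI_PK", and set it to "Catalog_1" when PK is cat-1 and SK is cat-1, and to "Catalog_2" when PK is cat-2 and SK is cat-2. I have never really seen this done, so I am not sure it is the way to go, and it would require some maintenance if I ever wanted to change the IDs.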

Any ideas on how I could accomplish this?

Answer 1

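In this case, you could make the PK the type of the object and the SK a uuid. A record would look like this: { PK: "Catalog", SK: "uuid", ...other catalog fields }. You can then get all catalogs by running a query with PK = Catalog.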
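
As a minimal sketch of the idea above, the snippet below models the table as an in-memory list and mimics a query by exact partition key match. The item values, the `query_by_pk` helper, and the table name passed to `all_catalogs_request` are hypothetical; the request builder only assembles keyword arguments in the shape a boto3 client `query` call accepts and does not contact AWS.

```python
# Hypothetical in-memory stand-in for the table: each dict is one item.
ITEMS = [
    {"PK": "Catalog", "SK": "uuid-a", "name": "Sales Catalog"},
    {"PK": "Catalog", "SK": "uuid-b", "name": "Finance Catalog"},
    {"PK": "Collection", "SK": "uuid-c", "name": "Sales First Collection"},
]


def query_by_pk(items, pk):
    """Mimic a Query: exact match on the partition key, ordered by sort key."""
    matches = [item for item in items if item["PK"] == pk]
    return sorted(matches, key=lambda item: item["SK"])


def all_catalogs_request(table_name):
    """Build keyword arguments for a boto3 client `query` call (not sent here)."""
    return {
        "TableName": table_name,
        # "#pk" is a placeholder so the attribute name cannot clash
        # with a DynamoDB reserved word.
        "KeyConditionExpression": "#pk = :pk",
        "ExpressionAttributeNames": {"#pk": "PK"},
        "ExpressionAttributeValues": {":pk": {"S": "Catalog"}},
    }


catalog_names = [item["name"] for item in query_by_pk(ITEMS, "Catalog")]
```

With a real client, the dictionary could be passed as `client.query(**all_catalogs_request("YourTable"))`, where the table name is whatever you chose. Getting a single catalog by ID is then a GetItem on PK = "Catalog" plus the catalog's uuid as SK.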

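To store associations, you can have a GSI on two fields, sourcePK and relatedPK, where you store records that associate things. To associate objects, you would create a record such as { PK: "Association", SK: "uuid", sourcePK: "catalog-1", relatedPK: "collection-1", ... other data on the association }. To find the objects associated with the "Catalog" with ID 1, you would query the GSI where sourcePK = catalog-1.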
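
The association lookup can be sketched the same way. This assumes the GSI uses sourcePK as its partition key and relatedPK as its sort key, which is one reading of the description above; the `build_gsi` and `related_ids` helpers and the sample records are hypothetical. Looking up in the opposite direction (for example, tag to collections) would need a second index keyed on relatedPK, which is also an assumption here.

```python
from collections import defaultdict

# Hypothetical association records stored in the same table.
ASSOCIATIONS = [
    {"PK": "Association", "SK": "uuid-1", "sourcePK": "catalog-1", "relatedPK": "collection-1"},
    {"PK": "Association", "SK": "uuid-2", "sourcePK": "catalog-1", "relatedPK": "collection-2"},
    {"PK": "Association", "SK": "uuid-3", "sourcePK": "collection-1", "relatedPK": "tag-1"},
]


def build_gsi(items, hash_attr, range_attr):
    """Group items by hash_attr, as a GSI would; items missing a key are skipped."""
    index = defaultdict(list)
    for item in items:
        if hash_attr in item and range_attr in item:
            index[item[hash_attr]].append(item)
    for bucket in index.values():
        bucket.sort(key=lambda item: item[range_attr])
    return index


def related_ids(index, key, attr):
    """Mimic a Query on the index: exact match on the index partition key."""
    return [item[attr] for item in index.get(key, [])]


forward = build_gsi(ASSOCIATIONS, "sourcePK", "relatedPK")
reverse = build_gsi(ASSOCIATIONS, "relatedPK", "sourcePK")

collections_of_catalog = related_ids(forward, "catalog-1", "relatedPK")
collections_with_tag = related_ids(reverse, "tag-1", "sourcePK")
```

Neither lookup touches items outside the matching key, which is the point of replacing a scan with an index query.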

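With this setup, you need to watch out for hot keys, and you should make sure that you never have more than 10 GB of data under the same partition key in a table or index.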

Answer 2

Let's take a look. I will use GraphQL SDL to lay out the design of the data model and queries, but you can apply the same concepts directly to DynamoDB.

Thinking about the data model first, we would have something like this:

type Catalog {
  id: ID!
  name: String

  # Use a DynamoDB query on the **Collection** table 
  # where the **catalogId = $ctx.source.id**. Use a GSI or make catalogId the PK.
  collections: [Collection]
}
type Collection {
  id: ID!
  name: String

  # Use a DynamoDB query on the **CollectionTag** table where
  # the **collectionId = $ctx.source.id**. Use a GSI or make the collectionId the PK.
  tags: [CollectionTag]
}
# The "association map" idea as a GraphQL type. The underlying table has a collectionId and tagId.
# Create objects of this type to associate a collection and tag in the many to many relationship.
type CollectionTag {
  # Do a GetItem on the **Collection** table where **id = $ctx.source.collectionId**
  collection: Collection

  # Do a GetItem on the **Tag** table where **id = $ctx.source.tagId**
  tag: Tag
}
type Tag {
  id: ID!
  name: String

  # Use a DynamoDB query on the **CollectionTag** table where
  # the **tagId = $ctx.source.id**. If collectionId is the PK then make a GSI where this tagId is the PK.
  collections: [CollectionTag]
}

# Root level queries
type Query {
  # GetItem to **Catalog** table where **id = $ctx.args.id**
  getCatalog(id: ID!): Catalog

  # Scan on the **Catalog** table. As long as you don't care about ordering on a particular field,
  # this will likely be okay at the top level. If you only want all catalogs where "arePublished = 1",
  # for example, then we would likely change this.
  allCatalogs: [Catalog]

  # Note: You don't really need a getCollectionsByCatalogId(catalogId: ID!) at the top level because you can
  # use `query { getCatalog(id: "***") { collections { ... } } }` which is effectively the same thing.
  # You could add another field here if having it at the top level was a requirement
  getCollectionsByCatalogId(catalogId: ID!): [Collection]
}
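
For the resolvers on `Collection.tags` and `Tag.collections`, the two queries described in the comments could be built roughly as follows. This is only a sketch: the `key_query` helper and the index name `tagId-index` are hypothetical, and it assumes collectionId is the partition key of the CollectionTag table. The dictionaries follow the parameter shape of a boto3 client `query` call but are not sent anywhere.

```python
def key_query(table_name, key_attr, value, index_name=None):
    """Build keyword arguments for a boto3 client `query` on one key attribute."""
    request = {
        "TableName": table_name,
        "KeyConditionExpression": "#k = :v",
        "ExpressionAttributeNames": {"#k": key_attr},
        "ExpressionAttributeValues": {":v": {"S": value}},
    }
    if index_name is not None:
        # Only set when the lookup goes through a GSI instead of the base table.
        request["IndexName"] = index_name
    return request


# Collection.tags: query the base table by its partition key.
tags_request = key_query("CollectionTag", "collectionId", "col-1")

# Tag.collections: query a GSI whose partition key is tagId
# ("tagId-index" is a made-up name for illustration).
collections_request = key_query(
    "CollectionTag", "tagId", "tag-1", index_name="tagId-index"
)
```

Both directions of the n:n relationship are key lookups, so neither needs a scan.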

Note: anywhere above where I use [Collection], [Catalog], etc., you should use wrapper types such as CollectionConnection, CatalogConnection, etc. to enable pagination.
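
To illustrate what such a wrapper type carries, here is a small sketch of a connection with a page of items and a token for the next page. The `Connection` class, its field names, and the offset-based token are assumptions made for the example. Against DynamoDB itself, the token would typically be derived from the `LastEvaluatedKey` of a query response and passed back as `ExclusiveStartKey`.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Connection:
    """One page of results plus an opaque token pointing at the next page."""
    items: List[str]
    next_token: Optional[str]


def paginate(all_items, limit, next_token=None):
    """Return up to `limit` items starting at the offset encoded in next_token."""
    start = int(next_token) if next_token else 0
    page = all_items[start:start + limit]
    end = start + len(page)
    # No token on the last page, so callers know when to stop.
    token = str(end) if end < len(all_items) else None
    return Connection(items=page, next_token=token)


collections = ["col-1", "col-2", "col-3"]
first_page = paginate(collections, limit=2)
second_page = paginate(collections, limit=2, next_token=first_page.next_token)
```

A client keeps requesting pages until `next_token` comes back empty.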

How to query an n:n adjacency list mapping in DynamoDB without using a scan
