Question

We run a Java application on Google App Engine. It has an entity kind called Contact. Here is a sample schema:

Contact
{
  long id
  String firstName
  String lastName
  ...
}

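The above is the existing model. To support some of our requirements, we store this object in both Datastore and Text Search.

Now we want to integrate contacts with their page-view data.

Each contact can have thousands of page-view records, or even millions for some contacts.

Here is a sample page-visit object [Note: we do not have this object yet; it is only meant to show what page-visit data looks like]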

PageVisit
{

  long id
  String url
  String refUrl
  int  country
  String city
  ....
}

We have a requirement to query on a contact's core properties together with its page-visit data.

For example:

select * from Contact where firstName = 'abc' and url = 'cccccc.com';
select * from Contact where firstName = 'abc' or url = 'cccccc.com';

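To write such queries, both the contact's core properties and the pages it visited would have to be available on the Contact object itself. However, a contact can have a huge number of page views, so this would exceed the maximum entity size limit.

So how should we design the Contact model for this situation, in both Datastore and Text Search?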

Thanks

Answer 1

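Cloud Datastore does not support joins, so you will need to handle this in client code in some way.

Two possible ways to handle this are:

Denormalize the Contact properties you need to search on into PageVisit: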

PageVisit
{

  long id
  String firstName // Denormalized from Contact
  String url
  String refUrl
  int  country
  String city
  ....
}

This requires you to create a composite index:

- kind: PageVisit
  ancestor: no
  properties:
  - name: firstName
  - name: url
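To make the denormalized option concrete, here is a minimal in-memory sketch in plain Java. It is not the Datastore API; it only imitates what an equality query on firstName and url against the PageVisit kind would return. The contactId field and the class and method names are assumptions added for illustration (the schema above does not show how a PageVisit points back to its Contact).

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// In-memory stand-in for the denormalized PageVisit kind (not the Datastore API).
final class DenormalizedPageVisit {
    final long id;
    final long contactId;   // assumption: back-reference to the owning Contact
    final String firstName; // denormalized from Contact at write time
    final String url;

    DenormalizedPageVisit(long id, long contactId, String firstName, String url) {
        this.id = id;
        this.contactId = contactId;
        this.firstName = firstName;
        this.url = url;
    }
}

final class DenormalizedQuerySketch {
    // Imitates: select * from PageVisit where firstName = ? and url = ?
    // and collapses the matching visits into distinct contact ids.
    static List<Long> contactIdsMatchingBoth(
            List<DenormalizedPageVisit> visits, String firstName, String url) {
        Set<Long> ids = new LinkedHashSet<>();
        for (DenormalizedPageVisit v : visits) {
            if (v.firstName.equals(firstName) && v.url.equals(url)) {
                ids.add(v.contactId);
            }
        }
        return new ArrayList<>(ids);
    }
}
```

The trade-off is on the write side: if a contact's firstName changes, every PageVisit carrying the old value has to be rewritten. Also note that a contact with no visits has no PageVisit entities at all, so it can only be found by querying Contact directly.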

Or run multiple queries, with each PageVisit storing the contactId of the contact it belongs to:

select id from Contact where firstName = 'abc'

select * from PageVisit where contactId={id} and url = 'cccccc.com';
select * from PageVisit where contactId={id} or url = 'cccccc.com';

This requires you to create a composite index:

- kind: PageVisit
  ancestor: no
  properties:
  - name: contactId
  - name: url
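The multi-query approach amounts to a join done in client code. The sketch below shows the idea with plain in-memory lists rather than real Datastore calls. All class and method names are hypothetical, and the OR case is one possible reading of the second PageVisit query: the union of contacts matching firstName and contacts owning a visit to the URL.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical in-memory rows standing in for the Contact and PageVisit kinds.
final class ContactRow {
    final long id;
    final String firstName;

    ContactRow(long id, String firstName) {
        this.id = id;
        this.firstName = firstName;
    }
}

final class PageVisitRow {
    final long id;
    final long contactId;
    final String url;

    PageVisitRow(long id, long contactId, String url) {
        this.id = id;
        this.contactId = contactId;
        this.url = url;
    }
}

final class TwoStepQuerySketch {
    // Step 1: select id from Contact where firstName = ?
    static List<Long> contactIdsByFirstName(List<ContactRow> contacts, String firstName) {
        List<Long> ids = new ArrayList<>();
        for (ContactRow c : contacts) {
            if (c.firstName.equals(firstName)) {
                ids.add(c.id);
            }
        }
        return ids;
    }

    // Step 2: select * from PageVisit where contactId = ? and url = ?
    static boolean hasVisit(List<PageVisitRow> visits, long contactId, String url) {
        for (PageVisitRow v : visits) {
            if (v.contactId == contactId && v.url.equals(url)) {
                return true;
            }
        }
        return false;
    }

    // AND: keep only the step-1 contacts that also have a matching visit.
    static List<Long> matchBoth(List<ContactRow> contacts, List<PageVisitRow> visits,
                                String firstName, String url) {
        List<Long> result = new ArrayList<>();
        for (long id : contactIdsByFirstName(contacts, firstName)) {
            if (hasVisit(visits, id, url)) {
                result.add(id);
            }
        }
        return result;
    }

    // OR: union of the step-1 contacts and the owners of any visit to the url.
    static List<Long> matchEither(List<ContactRow> contacts, List<PageVisitRow> visits,
                                  String firstName, String url) {
        Set<Long> result = new LinkedHashSet<>(contactIdsByFirstName(contacts, firstName));
        for (PageVisitRow v : visits) {
            if (v.url.equals(url)) {
                result.add(v.contactId);
            }
        }
        return new ArrayList<>(result);
    }
}
```

Against a real datastore, step 2 costs one query per contact id returned by step 1, so this approach works best when the first query is selective.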

One last note: depending on the size of your site, it may be worth looking at Cloud Bigtable for the PageView data. It is a better fit for high-write, OLAP-style workloads.

Best way to design App Engine Datastore and Text Search modeling
