Question

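I am building a Google App Engine application (Python 2.7) using the NDB API. I am new to Python development, and I suspect this question has been answered before, but my searches did not turn up anything similar to this problem or its solution, so I decided to ask here.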

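I have a Document model class that I need to query to get the most "current" documents. Specifically, I want a list of document objects (entities) with distinct document names, where each one's expiration date (a datetime.date object) is the maximum for its name.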

For example, a query for documents sorted by expiration date in descending order, such as:

documents = Document.query().order(-Document.expiration).fetch()

returns:

[{"name": "DocumentC", "expiration": datetime.date(2015, 3, 1)},
 {"name": "DocumentA", "expiration": datetime.date(2014, 4, 1)},
 {"name": "DocumentB", "expiration": datetime.date(2014, 2, 15)},
 {"name": "DocumentA", "expiration": datetime.date(2014, 1, 1)}]

From these query results, I want to remove the second (older) "DocumentA" and end up with something like:

[{"name": "DocumentC", "expiration": datetime.date(2015, 3, 1)},
 {"name": "DocumentA", "expiration": datetime.date(2014, 4, 1)},
 {"name": "DocumentB", "expiration": datetime.date(2014, 2, 15)}]

My solution is:

def current_docs(docs):
    output = []
    for d in docs:
        if not any(o['name'] == d['name'] for o in output):
            output.append(d)
    return output

cd = current_docs(documents)
# returns:
# [{'expiration': datetime.date(2015, 3, 1), 'name': 'DocumentC'},
# {'expiration': datetime.date(2014, 4, 1), 'name': 'DocumentA'},
# {'expiration': datetime.date(2014, 2, 15), 'name': 'DocumentB'}]

This seems to give me the expected result, but:

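Is there a better way to filter the original query so that I get the result I want from the start?
If not, is there a better, more efficient approach than my solution?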

Answer 1

My approach to the second question:

def current_docs(docs):
  tmp = {}
  output = []
  for d in docs:
    if d['name'] in tmp:
      continue
    tmp[d['name']] = 1
    output.append(d)
  return output

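Keep a dictionary of the names already added, and append only the documents whose name has not been added yet. I know nothing about Google App Engine, though :)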
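The same idea can be sketched with a set instead of a dict, since only membership matters. This is a minimal sketch, assuming (as in the question) that docs is a list of dicts already sorted by expiration, newest first; the name seen is illustrative only.

```python
import datetime


def current_docs(docs):
    """Keep the first document seen for each name.

    Assumes docs is already sorted by expiration, newest first,
    so the first occurrence of a name is its latest version.
    """
    seen = set()  # names already emitted; membership test is O(1) on average
    output = []
    for d in docs:
        if d['name'] in seen:
            continue
        seen.add(d['name'])
        output.append(d)
    return output


# Sample data copied from the question.
documents = [
    {"name": "DocumentC", "expiration": datetime.date(2015, 3, 1)},
    {"name": "DocumentA", "expiration": datetime.date(2014, 4, 1)},
    {"name": "DocumentB", "expiration": datetime.date(2014, 2, 15)},
    {"name": "DocumentA", "expiration": datetime.date(2014, 1, 1)},
]
cd = current_docs(documents)
```

Compared with the any(...) scan in the question, which rechecks the whole output list for every document, this does a single pass with a constant-time lookup per document.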

Answer 2

If your data fits the limitations noted in the documentation, you should be able to do this with a projection query using group_by=["name"] and distinct=True.

Documentation: https://developers.google.com/appengine/docs/python/ndb/queries#projection

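Alternatively, I would suggest saving the data into a precomputed table that contains only the unique document names and their latest data/state. You incur an extra cost at write time, but you get fast reads and do not have to rely on the unfiltered data set fitting in instance memory, which it would need to if you filter at runtime.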
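The precomputed-table idea can be illustrated roughly as follows. This is only a sketch under stated assumptions: a plain dict (latest_by_name) stands in for the separate table, and record_document and latest_documents are hypothetical helper names, not part of NDB. In a real application the dict would be replaced by a second datastore kind updated on every write.

```python
import datetime

# Stand-in for the precomputed "latest documents" table, keyed by name.
# A dict is an assumption made here to keep the sketch runnable.
latest_by_name = {}


def record_document(doc):
    """Write path: keep doc only if it is newer than the stored one."""
    current = latest_by_name.get(doc['name'])
    if current is None or doc['expiration'] > current['expiration']:
        latest_by_name[doc['name']] = doc


def latest_documents():
    """Read path: one entry per name, newest expiration first."""
    return sorted(latest_by_name.values(),
                  key=lambda d: d['expiration'], reverse=True)


# Writes may arrive in any order; the table still ends up with the
# latest expiration for each name.
for doc in [
    {"name": "DocumentA", "expiration": datetime.date(2014, 1, 1)},
    {"name": "DocumentB", "expiration": datetime.date(2014, 2, 15)},
    {"name": "DocumentC", "expiration": datetime.date(2015, 3, 1)},
    {"name": "DocumentA", "expiration": datetime.date(2014, 4, 1)},
]:
    record_document(doc)

latest = latest_documents()
```

The extra comparison happens once per write, so reads never need to load or filter the full set of documents.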

Filter a list of objects by multiple object attributes
