Question

I have a database containing documents of roughly this form:

{"created_at": some_datetime, "deleted_at": another_datetime, "foo": "bar"}

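Assuming we don't need to handle "deleted_at" values that lie in the future, it is trivial to get a count of the undeleted documents in the DB. It is also trivial to create a view that reduces to something like the following (using UTC):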

[
  {"key": ["created", 2012, 7, 30], "value": 39},
  {"key": ["deleted", 2012, 7, 31], "value": 12},
  {"key": ["created", 2012, 8, 2], "value": 6}
]

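...which means that 39 documents were marked as created on 2012-07-30, 12 were marked as deleted on 2012-07-31, and so on. What I want is an efficient mechanism for getting a snapshot of how many documents "existed" on 2012-08-01 (0 + 39 - 12 == 27). Ideally, I'd like to be able to query a view or a DB (e.g. something that has already been precomputed and saved to disk) with the date as the key or index, and get the count as the value or document, e.g.: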

[
  {"key": [2012, 7, 30], "value": 39},
  {"key": [2012, 7, 31], "value": 27},
  {"key": [2012, 8,  1], "value": 27},
  {"key": [2012, 8,  2], "value": 33}
]

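This can be computed easily enough by iterating over all of the rows in the view, keeping a running counter and summing up each day as I go, but that approach slows down as the data set grows, unless I'm smart about caching or storing the results. Is there a smarter way to tackle this?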

Answer 1

Just for comparison (I'm hoping someone has a better solution), here is (more or less) how I'm currently solving it (in untested Ruby pseudocode):

require 'date'

# Turns reduced view rows (group_level=3) into a hash of
# "YYYY-MM-DD" => number of documents existing at the end of that day.
def date_snapshots(rows)
  current_date  = nil
  current_count = 0
  # View rows collate by type first, so order them by date before walking.
  sorted = rows.sort_by { |row| row["key"].drop(1) }
  sorted.each_with_object({}) do |reduced_row, hash|
    # The type strings must match the first key element emitted by the view.
    type, *ymd = reduced_row["key"]
    this_date  = Date.new(*ymd)
    if current_date
      # fill in the days where nothing changed
      (current_date.succ...this_date).each do |date|
        hash[date.strftime("%Y-%m-%d")] = current_count
      end
    end
    # update the counter and record the current day
    current_date   = this_date
    current_count += reduced_row["value"] if type == "created_at"
    current_count -= reduced_row["value"] if type == "deleted_at"
    hash[current_date.strftime("%Y-%m-%d")] = current_count
  end
end

Which can then be used like so:

rows = couch_server.db(foo).design(bar).view(baz).reduce.group_level(3).rows
date_snapshots(rows)["2012-08-01"]
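As a quick sanity check of the running-total idea, here is a condensed, standalone sketch applied to the sample rows from the question. The `snapshots` helper is a hypothetical name, and the `"created"`/`"deleted"` labels are copied from the question's example output rather than from a real view, so adjust them to whatever your view actually emits. It sums the deltas per date first, so it does not depend on the order of the rows.

```ruby
require 'date'

# Sample reduced rows, copied from the question (group_level=3 style output).
ROWS = [
  {"key" => ["created", 2012, 7, 30], "value" => 39},
  {"key" => ["deleted", 2012, 7, 31], "value" => 12},
  {"key" => ["created", 2012, 8, 2],  "value" => 6}
]

# Hypothetical helper: turn per-day deltas into per-day running totals.
def snapshots(rows)
  # Net change per date: creations add, deletions subtract.
  deltas = Hash.new(0)
  rows.each do |row|
    type, *ymd = row["key"]
    sign = (type == "created") ? 1 : -1
    deltas[Date.new(*ymd)] += sign * row["value"]
  end
  return {} if deltas.empty?

  # Walk every day in the covered range, carrying the count forward.
  count = 0
  (deltas.keys.min..deltas.keys.max).each_with_object({}) do |date, result|
    count += deltas[date]
    result[date.strftime("%Y-%m-%d")] = count
  end
end

totals = snapshots(ROWS)
puts totals["2012-08-01"]  # prints 27
```

The result matches the table the question asks for: 39, 27, 27 and 33 for the four days from 2012-07-30 to 2012-08-02.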

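The obvious small improvement would be to add a caching layer, although it isn't all that simple to make the caching layer play nicely with incremental updates (e.g. the changes feed).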

Answer 2

I found an approach that seems much better than my original one, assuming that you only care about a single date:

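require 'date'

def size_at(date = Date.today)
  ymd = [date.year, date.month, date.day]
  # Sum of all "created_at" rows up to and including the given day.
  # {} collates after every other JSON value, so it closes the range.
  added = view.reduce.
    startkey(["created_at"]).
    endkey(  ["created_at", *ymd, {}]).rows.first || {}
  # Same range query for the "deleted_at" rows.
  deleted = view.reduce.
    startkey(["deleted_at"]).
    endkey(  ["deleted_at", *ymd, {}]).rows.first || {}
  # An empty range returns no row at all, hence the || {} and fetch defaults.
  added.fetch("value", 0) - deleted.fetch("value", 0)
end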

Basically, let CouchDB do the reduction for you. I didn't originally realize that you could mix and match reduce with startkey/endkey.
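For readers not using the same client library, the two range queries boil down to plain view requests with `reduce`, `startkey` and `endkey` parameters. The sketch below only builds the request paths and does not talk to a server; the database, design document and view names (`mydb`, `stats`, `by_day`) are made up for illustration.

```ruby
require 'json'
require 'uri'

# Hypothetical names: database "mydb", design document "stats", view "by_day".
VIEW_PATH = "/mydb/_design/stats/_view/by_day"

# Build the query for "sum of all rows of one type up to and including a date".
# No group or group_level is given, so the whole range reduces to a single row.
def range_query(type, year, month, day)
  params = {
    "reduce"   => "true",
    "startkey" => JSON.generate([type]),
    # The trailing {} sorts after any other value in that key position.
    "endkey"   => JSON.generate([type, year, month, day, {}])
  }
  "#{VIEW_PATH}?#{URI.encode_www_form(params)}"
end

puts range_query("created_at", 2012, 8, 1)
puts range_query("deleted_at", 2012, 8, 1)
```

Subtracting the `value` of the second response from the first gives the snapshot for that date.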

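Unfortunately, this approach requires two hits to the DB (although those could be parallelized or pipelined). It also doesn't work well when you want to get a lot of these sizes at once (e.g. viewing the whole history, rather than just looking at one date).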
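One way the two hits could be issued in parallel is with plain threads. This is only a sketch: `fetch_sum` is a hypothetical callable standing in for whatever performs one reduced range query and returns its sum, and here it is replaced by a stub with canned numbers.

```ruby
# Run the "created_at" and "deleted_at" queries concurrently and subtract.
# `fetch_sum` is a hypothetical callable: (type, ymd) -> Integer sum.
def size_at_parallel(fetch_sum, ymd)
  threads = %w[created_at deleted_at].map do |type|
    Thread.new { fetch_sum.call(type, ymd) }
  end
  # Thread#value waits for the thread to finish and returns its result.
  added, deleted = threads.map(&:value)
  added - deleted
end

# Stub that returns canned sums instead of talking to a database.
stub = lambda { |type, _ymd| type == "created_at" ? 39 : 12 }
puts size_at_parallel(stub, [2012, 8, 1])  # prints 27
```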

CouchDB historical view snapshots
