I'm looking to keep running counts of some items such as likes and comments on a post. The write rate can be high, e.g. 1K likes/sec.
Using a SELECT COUNT does not seem feasible, even if the result set is indexed, as there could be a few million rows to count.
I'm thinking of using a sharded counters approach where a specific counter (likes for a given post) consists of N shards/rows. Incrementing the counter would increment the column value of one shard's row, while reading the counter would read all shard rows and sum the count values. Would there be any issues with such an approach in Spanner?
I understand that in Bigtable, multiple updates to the same row will create new versions of cells in the row and as a result, you can cause a row to exceed its size limit. So using rows as sharded counters in Bigtable seems to be a bad idea. Does Spanner have any similar issues?
Answer 0 (score: 2)
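Sharding the counter to increase parallelism seems like a good idea. Cloud Spanner manages older versions of data differently from Bigtable, so you probably will not run into the same limit. Spanner keeps old versions for about 1 hour. However, you may need to take care to design your schema to avoid hotspots.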
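As a rough illustration of the sharded counter discussed here, the following is a minimal sketch that uses an in-memory dict in place of a Spanner table. The class name, the `(counter_id, shard_id)` key layout, and the shard count of 16 are assumptions made for illustration; in a real deployment each increment would be a read-write transaction against a single shard row.

```python
import random
from collections import defaultdict

NUM_SHARDS = 16  # assumed value; tune to the expected write rate


class ShardedCounter:
    """In-memory stand-in for a table keyed by (counter_id, shard_id)."""

    def __init__(self, num_shards=NUM_SHARDS):
        self.num_shards = num_shards
        self.rows = defaultdict(int)  # (counter_id, shard_id) -> count

    def increment(self, counter_id, delta=1):
        # Pick one shard at random so concurrent writers rarely touch the same row.
        shard_id = random.randrange(self.num_shards)
        self.rows[(counter_id, shard_id)] += delta

    def read(self, counter_id):
        # Reading means fetching every shard row for the counter and summing.
        return sum(
            self.rows.get((counter_id, shard_id), 0)
            for shard_id in range(self.num_shards)
        )
```

More shards reduce write contention per row but make each read touch more rows, so the shard count is a trade-off between write and read cost.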
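That said, I would still suggest trying to implement an in-memory caching layer on top of Spanner. This could be used to:
If the cache goes away you may lose some updates, which is a trade-off, but if it is only caching likes/counts, this may be acceptable.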
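One plausible use of such a cache (an assumption here, not something this answer spells out) is to buffer increments in memory and write them to the database as one combined delta per counter. A minimal sketch, where `BufferedCounter` and the `write_delta` callback are hypothetical names:

```python
from collections import Counter


class BufferedCounter:
    """Accumulates increments in memory and flushes them in batches."""

    def __init__(self, write_delta):
        # write_delta(counter_id, delta) is a hypothetical callback that
        # applies the accumulated delta to the backing store.
        self._write_delta = write_delta
        self._pending = Counter()

    def increment(self, counter_id, delta=1):
        # No database write happens here; the update only lives in memory.
        self._pending[counter_id] += delta

    def flush(self):
        # Swap out the buffer first, then write one delta per counter.
        pending, self._pending = self._pending, Counter()
        for counter_id, delta in pending.items():
            self._write_delta(counter_id, delta)
        return len(pending)
```

Anything still in the buffer when the process dies is lost, which is the trade-off mentioned above. Calling `flush()` on a short timer bounds how many updates can be lost.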
Answer 1 (score: 2)
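"I understand that in Bigtable, multiple updates to the same row will create new versions of cells in the row and as a result, you can cause a row to exceed its size limit. So using rows as sharded counters in Bigtable seems to be a bad idea. Does Spanner have any similar issues?"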
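As mentioned in the comments, you can use the ReadModifyWrite Increment API, but be aware that row-level transactional operations in Bigtable (such as ReadModifyWrite) are slower.
However, using multiple rows to represent one counter and then reading those rows together with a prefix scan should be fine.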
The key is to design your row keys so that writes are distributed across the nodes in the cluster and hotspots are avoided.
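A minimal sketch of the multi-row layout with a prefix scan, using a plain dict with sorted keys as a stand-in for a Bigtable table. The `counter_id#shard` key format and the helper names are assumptions for illustration; this shows only the read path, not how a particular key design spreads writes across nodes.

```python
from bisect import bisect_left


def shard_row_key(counter_id, shard_id):
    # Hypothetical key format: all shards of one counter share a prefix.
    return f"{counter_id}#{shard_id:04d}"


def prefix_scan(table, prefix):
    # Emulates a range scan over lexicographically sorted row keys.
    keys = sorted(table)
    start = bisect_left(keys, prefix)
    for key in keys[start:]:
        if not key.startswith(prefix):
            break
        yield key, table[key]


def read_counter(table, counter_id):
    # Sum every shard row whose key starts with "<counter_id>#".
    return sum(value for _, value in prefix_scan(table, counter_id + "#"))
```

Including the `#` separator in the scan prefix keeps a counter such as `post1:likes` from also matching rows that belong to `post1:likes_total` or similar longer identifiers.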