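I am trying to update a column in a base table that is the partition key in a materialized view, and I want to understand the performance implications of doing this in a production environment.
Base table: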
CREATE TABLE IF NOT EXISTS data.test
(
    foreignid uuid,
    id uuid,
    kind text,
    version text,
    createdon timestamp,
    certid text,
    PRIMARY KEY (foreignid, createdon, id)
);
Materialized view:
CREATE MATERIALIZED VIEW IF NOT EXISTS data.test_by_certid
AS
    SELECT *
    FROM data.test
    WHERE id IS NOT NULL AND foreignid IS NOT NULL
      AND createdon IS NOT NULL AND certid IS NOT NULL
    PRIMARY KEY (certid, foreignid, createdon, id);
So certid is the new partition key in the materialized view.
What happens:
1. When we first insert into the test table, the certid is usually empty, so it is replaced with the string "none" and inserted into the test base table.
2. The row gets inserted into the materialized view as well.
3. When the user provides us with a certid, the row is updated in the test base table with the new certid.
4. The action is mirrored and the row is updated in the materialized view, where the partition key certid changes from "none" to a new value.
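The steps above can be sketched in CQL as follows. This is only an illustration against the schema shown earlier: the UUIDs, the timestamp, and the values 'sample-kind', '1', and 'cert-123' are made-up sample data, not values from the real system.

```cql
-- Step 1: first insert; certid is not known yet, so the placeholder 'none' is stored.
-- All literal values below are made-up sample data.
INSERT INTO data.test (foreignid, createdon, id, kind, version, certid)
VALUES (11111111-1111-1111-1111-111111111111,
        '2018-01-01 00:00:00+0000',
        22222222-2222-2222-2222-222222222222,
        'sample-kind', '1', 'none');

-- Step 3: the user supplies a certid; the base row is updated by its full primary key.
UPDATE data.test
SET certid = 'cert-123'
WHERE foreignid = 11111111-1111-1111-1111-111111111111
  AND createdon = '2018-01-01 00:00:00+0000'
  AND id = 22222222-2222-2222-2222-222222222222;

-- Step 4: once the view update has propagated, the row is read from the new view partition.
SELECT * FROM data.test_by_certid WHERE certid = 'cert-123';
```

Note that the UPDATE only touches the base table; the move from the 'none' partition to the 'cert-123' partition in the view is done by Cassandra.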
Questions:
1. What is the performance implication of updating the partition key certid in the materialized view?
2. For my use case, is it better to create a new table with certid as the partition key (inserting only when certid is non-empty) and manually maintain all CRUD operations on the new table, or should I use an MV and let Cassandra do the bookkeeping?
Note that performance is an important criterion, since this will be used in a production environment.
Thanks
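Answer 0 (score: 7)
Updating a table for which one or more views exist is always more expensive than updating a table with no views, because of the overhead of performing a read-before-write and of locking the partition to ensure that concurrent updates stay consistent with that read-before-write. You can read more about the internals of materialized views in Cassandra in ScyllaDb's wiki.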
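If changing the certid is a one-time operation per row, then the performance impact should not be much of a concern. In any case, it is always a better idea to let Cassandra handle updating the MV, because it takes care of anomalies (such as what happens when the node storing the view is partitioned away and the update cannot propagate) and eventually ensures consistency.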
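For comparison, here is a rough sketch of what the manual alternative from the question would involve. The table name data.test_by_certid_manual and all literal values are assumptions made up for illustration. When a view key changes, the old view row has to be removed and a new one written; with a hand-maintained table the application has to do that itself.

```cql
-- Hypothetical hand-maintained lookup table (the name is an assumption).
CREATE TABLE IF NOT EXISTS data.test_by_certid_manual (
    certid text,
    foreignid uuid,
    createdon timestamp,
    id uuid,
    kind text,
    version text,
    PRIMARY KEY (certid, foreignid, createdon, id)
);

-- certid arrives for the first time: update the base row and add the lookup row.
-- A logged batch makes sure both writes are eventually applied, but it is not
-- isolated and it adds overhead compared to two independent writes.
BEGIN BATCH
    UPDATE data.test SET certid = 'cert-123'
    WHERE foreignid = 11111111-1111-1111-1111-111111111111
      AND createdon = '2018-01-01 00:00:00+0000'
      AND id = 22222222-2222-2222-2222-222222222222;
    INSERT INTO data.test_by_certid_manual
        (certid, foreignid, createdon, id, kind, version)
    VALUES ('cert-123',
            11111111-1111-1111-1111-111111111111,
            '2018-01-01 00:00:00+0000',
            22222222-2222-2222-2222-222222222222,
            'sample-kind', '1');
APPLY BATCH;

-- If a certid ever changes again, the stale lookup row must be deleted as well,
-- which means the application first has to read the old certid from the base row.
DELETE FROM data.test_by_certid_manual
WHERE certid = 'cert-123'
  AND foreignid = 11111111-1111-1111-1111-111111111111
  AND createdon = '2018-01-01 00:00:00+0000'
  AND id = 22222222-2222-2222-2222-222222222222;
```

In other words, the manual approach moves the read-before-write and the failure handling into application code rather than removing them.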
If you are worried about performance, consider replacing Cassandra with Scylla.