Avoiding concurrent writes in Cassandra

Date: 2016-03-31 05:35:19

Tags: php cassandra cql

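I am working on a Cassandra (2.2.3) project in which I have to store reviews and be able to fetch the referenced elements ordered by the minimum, maximum, count and average of all their attached reviews. To do that, whenever I insert a new review I have to delete and re-insert the corresponding records so that their clustering keys are updated, and to keep track of those keys I use another table that acts as an index. The problem is that I update all of these tables in a batch, but if another update process runs at the same time I can end up with duplicate entries in the ordering tables or with invalid values in the key-store index table.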

How can I execute the batch without the risk of concurrent writes?

Here is the table structure:

CREATE TABLE IF NOT EXISTS reviews (domain VARCHAR, scenario VARCHAR, refer VARCHAR, type VARCHAR, id VARCHAR, value FLOAT, comment VARCHAR, author VARCHAR, title VARCHAR, date TIMESTAMP, attributes MAP<VARCHAR, VARCHAR>, answer VARCHAR, answer_author VARCHAR, answer_title VARCHAR, answer_date TIMESTAMP, answer_attributes MAP<VARCHAR, VARCHAR>, PRIMARY KEY((domain, scenario, refer, type), id)) WITH CLUSTERING ORDER BY (id DESC);
CREATE TABLE IF NOT EXISTS reviews_ext_ordering_avg (domain VARCHAR, refer VARCHAR, scenario VARCHAR, value FLOAT, type VARCHAR, PRIMARY KEY((domain, scenario, type), value, refer)) WITH CLUSTERING ORDER BY (value DESC);
CREATE TABLE IF NOT EXISTS reviews_ext_ordering_min (domain VARCHAR, refer VARCHAR, scenario VARCHAR, value FLOAT, type VARCHAR, PRIMARY KEY((domain, scenario, type), value, refer)) WITH CLUSTERING ORDER BY (value ASC);
CREATE TABLE IF NOT EXISTS reviews_ext_ordering_max (domain VARCHAR, refer VARCHAR, scenario VARCHAR, value FLOAT, type VARCHAR, PRIMARY KEY((domain, scenario, type), value, refer)) WITH CLUSTERING ORDER BY (value DESC);
CREATE TABLE IF NOT EXISTS reviews_ext_ordering_count (domain VARCHAR, refer VARCHAR, scenario VARCHAR, value INT, type VARCHAR, PRIMARY KEY((domain, scenario, type), value, refer)) WITH CLUSTERING ORDER BY (value ASC);
CREATE TABLE IF NOT EXISTS reviews_ext_index (domain VARCHAR, refer VARCHAR, scenario VARCHAR, count INT, avg FLOAT, min FLOAT, max FLOAT, sum FLOAT, type VARCHAR, PRIMARY KEY((domain, scenario, type), refer)) WITH CLUSTERING ORDER BY (refer ASC);

Here is an example of the transaction in CQL (rather than PHP):

BEGIN BATCH
DELETE FROM acme_reviews_ext_ordering_avg WHERE domain = '[DOMAIN]' AND scenario = '[SCENARIO]' AND type = '[TYPE]' AND value = [VALUE] AND refer = '[REFER]';
DELETE FROM acme_reviews_ext_ordering_min WHERE domain = '[DOMAIN]' AND scenario = '[SCENARIO]' AND type = '[TYPE]' AND value = [VALUE] AND refer = '[REFER]';
DELETE FROM acme_reviews_ext_ordering_max WHERE domain = '[DOMAIN]' AND scenario = '[SCENARIO]' AND type = '[TYPE]' AND value = [VALUE] AND refer = '[REFER]';
DELETE FROM acme_reviews_ext_ordering_count WHERE domain = '[DOMAIN]' AND scenario = '[SCENARIO]' AND type = '[TYPE]' AND value = [VALUE] AND refer = '[REFER]';
INSERT INTO acme_reviews_ext_ordering_avg (domain, scenario, type, value, refer) VALUES ('[DOMAIN]', '[SCENARIO]', '[TYPE]', [VALUE], '[REFER]');
INSERT INTO acme_reviews_ext_ordering_min (domain, scenario, type, value, refer) VALUES ('[DOMAIN]', '[SCENARIO]', '[TYPE]', [VALUE], '[REFER]');
INSERT INTO acme_reviews_ext_ordering_max (domain, scenario, type, value, refer) VALUES ('[DOMAIN]', '[SCENARIO]', '[TYPE]', [VALUE], '[REFER]');
INSERT INTO acme_reviews_ext_ordering_count (domain, scenario, type, value, refer) VALUES ('[DOMAIN]', '[SCENARIO]', '[TYPE]', [VALUE], '[REFER]');
UPDATE acme_reviews_ext_index SET min = [MIN], avg = [AVG], max = [MAX], count = [COUNT], sum = [SUM] WHERE domain = '[DOMAIN]' AND scenario = '[SCENARIO]' AND type = '[TYPE]' AND refer = '[REFER]';
APPLY BATCH;

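Here is a practical example (again in CQL). A and B are two clients inserting a review at the same time. To keep the case minimal I only update the average. A inserts a value of 4, so the previous average of 3 becomes 3.5 (the numbers are only illustrative). B inserts a value of 4.5, so the average becomes 3.7 instead of the previous value of 3. Here are the two batch statements: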

Client A:

BEGIN BATCH
DELETE FROM acme_reviews_ext_ordering_avg WHERE domain = 'foo.bar' AND scenario = 'article' AND type = 'generic' AND value = 3 AND refer = 'post-id-value';
INSERT INTO acme_reviews_ext_ordering_avg (domain, scenario, type, value, refer) VALUES ('foo.bar', 'article', 'generic', 3.5, 'post-id-value');
UPDATE acme_reviews_ext_index SET avg = 3.5 WHERE domain = 'foo.bar' AND scenario = 'article' AND type = 'generic' AND refer = 'post-id-value';
APPLY BATCH;

Client B:

BEGIN BATCH
DELETE FROM acme_reviews_ext_ordering_avg WHERE domain = 'foo.bar' AND scenario = 'article' AND type = 'generic' AND value = 3 AND refer = 'post-id-value';
INSERT INTO acme_reviews_ext_ordering_avg (domain, scenario, type, value, refer) VALUES ('foo.bar', 'article', 'generic', 3.7, 'post-id-value');
UPDATE acme_reviews_ext_index SET avg = 3.7 WHERE domain = 'foo.bar' AND scenario = 'article' AND type = 'generic' AND refer = 'post-id-value';
APPLY BATCH;

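In the typical concurrent-write case, A deletes the old row and B does not, because the row has already been deleted by A's batch. Both clients then insert a new row, which produces a duplicate. The index table holds only one key value, either A's or B's, so the key of one of the duplicates is not indexed.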

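I think the following can also happen: once both A's and B's batches have completed, the ordering table contains only one record, which is correct, but the value in the index table is wrong.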
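Both failure modes described above can be checked by reading the ordering partition and the index row side by side. The queries below are only a sketch of such a check, reusing the table names and sample values from the batches above:

```cql
-- List every row in the ordering partition. A refer that shows up more than
-- once (with different values) is a duplicate left behind by interleaved batches.
SELECT value, refer
FROM acme_reviews_ext_ordering_avg
WHERE domain = 'foo.bar' AND scenario = 'article' AND type = 'generic';

-- Read the single value recorded in the index table for the same refer.
-- If it differs from the value in the ordering table, the index is stale.
SELECT refer, avg
FROM acme_reviews_ext_index
WHERE domain = 'foo.bar' AND scenario = 'article' AND type = 'generic'
  AND refer = 'post-id-value';
```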

1 answer:

Answer 0 (score: 0)

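Since Cassandra does not provide any transaction isolation across tables and partitions, I don't see how to solve this at the C* level. You would need to synchronize your clients so that only one client at a time has exclusive access to the tables that need the delete-and-insert.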
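Synchronizing the clients requires some shared lock. As one possible sketch, and an assumption rather than something this answer proposes, a lightweight transaction on a lock table could play that role. The table `reviews_ext_lock`, the `owner` column, the client name and the 30-second TTL are all hypothetical:

```cql
-- Hypothetical lock table: one row per reference element being updated.
CREATE TABLE IF NOT EXISTS reviews_ext_lock (
    domain VARCHAR, scenario VARCHAR, type VARCHAR, refer VARCHAR,
    owner VARCHAR,
    PRIMARY KEY ((domain, scenario, type, refer))
);

-- Acquire: the insert is applied only if no other client holds the lock.
-- The TTL (an assumed 30 seconds) releases the lock if the client crashes.
INSERT INTO reviews_ext_lock (domain, scenario, type, refer, owner)
VALUES ('foo.bar', 'article', 'generic', 'post-id-value', 'client-A')
IF NOT EXISTS USING TTL 30;

-- Only when the result row reports [applied] = true: read the index row,
-- compute the new values and run the delete-and-insert batch.

-- Release: the delete is applied only if this client still owns the lock.
DELETE FROM reviews_ext_lock
WHERE domain = 'foo.bar' AND scenario = 'article' AND type = 'generic'
  AND refer = 'post-id-value'
IF owner = 'client-A';
```

A client that gets `[applied] = false` on the insert would have to wait and retry. Note that a TTL-based lock is only safe if the batch reliably finishes before the TTL expires, and that lightweight transactions add latency to every update.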

This use case may also lead to a lot of tombstone problems, depending on how many deletes you perform during normal operation.

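Given these issues, you might be better off externalizing the search for posts by value type X to an external index such as Solr or ElasticSearch. Alternatively, if you can upgrade to Cassandra 3.x, you should be able to use the newly introduced materialized views to solve your problem. Check this thread for a similar problem and a solution that describes materialized views.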
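As a rough sketch of the materialized-view route, assuming Cassandra 3.0 or later and the unprefixed `reviews_ext_index` table from the CREATE TABLE statements in the question, the ordering by average could be derived from the index table instead of being maintained by hand. The view name `reviews_by_avg` is hypothetical:

```cql
-- Hypothetical view: same rows as reviews_ext_index, clustered by avg.
-- Every column of the view's primary key must be restricted with IS NOT NULL.
CREATE MATERIALIZED VIEW IF NOT EXISTS reviews_by_avg AS
    SELECT domain, scenario, type, refer, avg
    FROM reviews_ext_index
    WHERE domain IS NOT NULL AND scenario IS NOT NULL AND type IS NOT NULL
      AND refer IS NOT NULL AND avg IS NOT NULL
    PRIMARY KEY ((domain, scenario, type), avg, refer)
    WITH CLUSTERING ORDER BY (avg DESC, refer ASC);

-- Read the best-rated references of one partition straight from the view.
SELECT refer, avg
FROM reviews_by_avg
WHERE domain = 'foo.bar' AND scenario = 'article' AND type = 'generic'
LIMIT 10;
```

A view may add only one non-primary-key column of the base table to its own primary key, so min, max and count would each need a separate view. With views in place the client writes only to the index table and Cassandra moves the view rows itself, which removes the manual delete-and-insert. Computing the new average is still a read followed by a write, so two concurrent clients can still overwrite each other's value in the index table.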