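I would like to know whether Cassandra can enforce a unique constraint on the row key, something similar to SQL Server's ADD CONSTRAINT myConstrain UNIQUE (ROW_PK).
If a row key that already exists is inserted, the existing data should not be overwritten; instead I want to receive some exception or response saying the update could not be performed because of a constraint violation.
Maybe there is a workaround for this problem: there are counters, whose updates seem to be atomic.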
Answer 0 (score: 12)
Lightweight transactions?
http://www.datastax.com/documentation/cassandra/2.0/cassandra/dml/dml_ltwt_transaction_c.html
INSERT INTO customer_account (customerID, customer_email)
VALUES ('LauraS', 'lauras@gmail.com')
IF NOT EXISTS;
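To make the behaviour concrete: with IF NOT EXISTS the insert is applied only when no row with that key exists, and the response carries an [applied] flag (plus the already stored row when it was not applied), which is the "response" the question asks for. The C++ below is a hypothetical single-node model of that contract, not driver code; AccountTable and InsertResult are illustrative names. In a real cluster the check is coordinated with Paxos, which is why lightweight transactions cost extra round trips.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>

// Hypothetical single-node model of INSERT ... IF NOT EXISTS; not driver code.
struct InsertResult {
    bool applied;                          // mirrors the [applied] column
    std::optional<std::string> existing;   // row already stored when not applied
};

class AccountTable {
public:
    // Insert only if customerID is absent; an existing row is never overwritten.
    InsertResult insertIfNotExists(const std::string& customerID,
                                   const std::string& email) {
        auto [it, inserted] = rows_.try_emplace(customerID, email);
        if (inserted) {
            return {true, std::nullopt};
        }
        return {false, it->second};
    }

    // Current email for a customer (throws std::out_of_range if absent).
    const std::string& email(const std::string& customerID) const {
        return rows_.at(customerID);
    }

private:
    std::map<std::string, std::string> rows_;  // customerID -> customer_email
};
```

A second insertIfNotExists("LauraS", ...) returns applied == false together with the original email, and the stored row is left unchanged.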
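Answer 1 (score: 9)
Unfortunately, no, because Cassandra does not perform any checks on writes. To implement something like that, Cassandra would have to do a read before every write to check whether the write is allowed. That would slow writes down considerably. (The whole point is that writes are streamed out sequentially without any disk seeks; reads break this pattern and force seeks to happen.)
I can't think of a way in which counters would help. Counters are not implemented with an atomic test-and-set. Instead, they essentially store a bunch of increments (deltas) that are summed when you read the counter value.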
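A toy model of the point above: if writes are pure appends and conflicts are reconciled at read time (newest timestamp wins), then a second INSERT with the same key is just another version, and nothing on the write path is in a position to reject it. AppendOnlyStore is a hypothetical illustration that ignores memtables, SSTables and compaction.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Hypothetical toy model of blind writes: every write is appended with its
// timestamp and conflicts are resolved only when reading (newest wins).
class AppendOnlyStore {
public:
    // Appending needs no lookup, so a duplicate key is never noticed here.
    void write(const std::string& key, const std::string& value, std::int64_t ts) {
        log_.push_back({key, value, ts});
    }

    // Reads do the reconciliation: the version with the highest timestamp wins
    // (ties are not handled in this sketch; the first one seen is kept).
    std::optional<std::string> read(const std::string& key) const {
        const Write* newest = nullptr;
        for (const auto& w : log_) {
            if (w.key == key && (newest == nullptr || w.ts > newest->ts)) {
                newest = &w;
            }
        }
        if (newest == nullptr) {
            return std::nullopt;
        }
        return newest->value;
    }

    // How many versions of a key are stored (this toy model never compacts).
    std::size_t versions(const std::string& key) const {
        return static_cast<std::size_t>(std::count_if(
            log_.begin(), log_.end(), [&](const Write& w) { return w.key == key; }));
    }

private:
    struct Write {
        std::string key;
        std::string value;
        std::int64_t ts;
    };
    std::vector<Write> log_;
};
```

Rejecting duplicates would mean turning write() into a scan like read(), which is exactly the read-before-write cost the answer describes.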
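The description above (increments stored as deltas and summed on read) can be sketched as follows. DeltaCounter is a hypothetical simplification, not Cassandra's actual counter format; it only shows why an increment cannot double as a "was it zero before?" check.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical model of a counter stored as a list of deltas that are only
// summed when the value is read.
class DeltaCounter {
public:
    // An increment is recorded blindly; it never reads the current total.
    void add(std::int64_t delta) { deltas_.push_back(delta); }

    // The visible value is computed at read time.
    std::int64_t value() const {
        std::int64_t sum = 0;
        for (std::int64_t d : deltas_) {
            sum += d;
        }
        return sum;
    }

private:
    std::vector<std::int64_t> deltas_;
};
```

If two clients both read 0, conclude they are first, and both call add(1), both increments are accepted and the value becomes 2; neither client is told it lost the race.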
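Answer 2 (score: 5)
I'm in a good mood today, so I won't downvote all the other posters for saying it isn't even possible to create a lock with just a Cassandra cluster. I just implemented Lamport's bakery algorithm¹ and it works just fine. There is no need for anything else exotic such as ZooKeeper, Cages, memory tables, etc.
Instead, as long as you can get reads and writes with at least QUORUM consistency, you can implement a poor man's multi-process/multi-computer locking mechanism. That is all you really need to implement this algorithm correctly. (The QUORUM level can vary depending on the type of lock you need: local, rack, full network.)
My implementation will appear in version 0.4.7 of libQtCassandra (in C++). I have already tested it and it locks perfectly. There are still a few things I want to test, and I want to let you define a set of parameters that are currently hard-coded. But the mechanism works well.
When I found this thread, I felt something was wrong. I searched a bit more and found a page on the Apache wiki, mentioned below. That page is not very advanced, but their MoinMoin does not offer a discussion page... Anyway, I thought it was worth mentioning. Hopefully people will start implementing that locking mechanism in various languages (PHP, Ruby, Java, etc.) so it gets used and proven to work.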
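Why QUORUM is enough: with replication factor N, QUORUM is floor(N/2) + 1 replicas, so any quorum read and any quorum write must share at least one replica (R + W > N), and the read therefore sees the latest acknowledged write. The arithmetic as a small C++ sketch (the function names are illustrative):

```cpp
#include <cassert>

// QUORUM for a replication factor n is a strict majority: floor(n / 2) + 1.
constexpr int quorum(int replicationFactor) {
    return replicationFactor / 2 + 1;
}

// A read of r replicas is guaranteed to overlap a write acknowledged by w
// replicas (and so to see it) when r + w > n.
constexpr bool readSeesLatestWrite(int r, int w, int n) {
    return r + w > n;
}

static_assert(quorum(3) == 2);
static_assert(quorum(5) == 3);
static_assert(readSeesLatestWrite(quorum(3), quorum(3), 3));
static_assert(!readSeesLatestWrite(1, 1, 3));
```

With ONE for both reads and writes (1 + 1 <= 3) a process could miss another process's "entering" or ticket cell, which would break the bakery algorithm's ordering.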
Source: http://wiki.apache.org/cassandra/Locking
¹http://en.wikipedia.org/wiki/Lamport%27s_bakery_algorithm
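The following is more or less how I implemented my version. It is only a simplified outline. I may need to update it, because I made some enhancements while testing the resulting code (the real code also uses RAII and adds a timeout capability on top of the TTL). The final version will be found in the libQtCassandra library.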
// lock "object_name"
void lock(QString object_name)
{
    QString locks = context->lockTableName();
    QString hosts_key = context->lockHostsKey();
    QString host_name = context->lockHostName();
    int host = table[locks][hosts_key][host_name];
    pid_t pid = getpid();

    // get the next available ticket
    table[locks]["entering::" + object_name][host + "/" + pid] = true;
    int my_ticket(0);
    QCassandraCells tickets(table[locks]["tickets::" + object_name]);
    foreach(tickets as t)
    {
        // we assume that t.name is the column name
        // and t.value is its value
        if(t.value > my_ticket)
        {
            my_ticket = t.value;
        }
    }
    ++my_ticket; // add 1, since we want the next ticket
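    // the column key is "host/pid" (the same key unlock() drops) and the
    // cell value is the ticket number (what the loop above reads as t.value)
    table[locks]["tickets::" + object_name][host + "/" + pid] = my_ticket;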
    // not entering anymore; by deleting the cell we also release the row
    // once all the processes are done with that object_name
    table[locks]["entering::" + object_name].dropCell(host + "/" + pid);

    // here we wait on all the other processes still entering at this
    // point; if they enter at more or less the same time, we cannot
    // guarantee that their ticket number will be larger, it may instead
    // be equal; however, anyone entering later will always have a larger
    // ticket number, so we won't have to wait for them (they will have to
    // wait on us instead); note that we load the list of "entering" once,
    // then we just check whether each column still exists, which is enough
    QCassandraCells entering(table[locks]["entering::" + object_name]);
    foreach(entering as e)
    {
        while(table[locks]["entering::" + object_name].exists(e))
        {
            sleep();
        }
    }

    // now check whether any other process was there before us; if
    // so, sleep a bit and try again; in our case we only need to check
    // the processes registered for this one lock and not all the
    // processes (which could be 1 million on a large system!);
    // as with the "entering" list, we really only need to read the
    // list of tickets once and then check when they get deleted
    // (unfortunately we can only poll on this one too...);
    // we exit the foreach() loop once our ticket is proved to be the
    // smallest or no more tickets need to be checked; when ticket
    // numbers are equal, we compare host numbers and the smaller one
    // is picked; when host numbers are equal (two processes on the
    // same host fighting for the lock), we compare the process pids,
    // since these are unique on a system; again the smallest wins
    tickets = table[locks]["tickets::" + object_name];
    foreach(tickets as t)
    {
        // does this ticket come after ours (or is it ours)?
        // note: t.host and t.pid come from the column key
        if(t.value > my_ticket
        || (t.value == my_ticket && t.host > host)
        || (t.value == my_ticket && t.host == host && t.pid >= pid))
        {
            // do not wait on larger tickets, just ignore them
            continue;
        }

        // this ticket is smaller than ours, wait for it to go away
        while(table[locks]["tickets::" + object_name].exists(t.name))
        {
            sleep();
        }

        // that ticket was released, we may have priority now;
        // check the next ticket
    }
}
// unlock "object_name"
void unlock(QString object_name)
{
    // release our ticket
    QString locks = context->lockTableName();
    QString hosts_key = context->lockHostsKey();
    QString host_name = context->lockHostName();
    int host = table[locks][hosts_key][host_name];
    pid_t pid = getpid();
    table[locks]["tickets::" + object_name].dropCell(host + "/" + pid);
}
// sample process using the lock/unlock
void SomeProcess(QString object_name)
{
    while(true)
    {
        [...]
        // non-critical section...
        lock(object_name);
        // the critical section code goes here...
        unlock(object_name);
        // non-critical section...
        [...]
    }
}
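For comparison, this is the classic shared-memory form of Lamport's bakery algorithm that the outline above distributes over Cassandra: an "entering" flag per participant, a ticket one larger than the maximum seen, and ties broken by comparing (ticket, id) pairs, much as the outline compares (ticket, host, pid). It is a hypothetical in-process sketch using std::atomic (sequentially consistent by default, which the algorithm needs), not libQtCassandra code; BakeryLock is an illustrative name.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical in-process Lamport bakery lock for threads with ids 0..n-1.
class BakeryLock {
public:
    explicit BakeryLock(std::size_t n) : entering_(n), ticket_(n) {
        for (std::size_t i = 0; i < n; ++i) {
            entering_[i].store(false);
            ticket_[i].store(0);  // 0 means "not interested in the lock"
        }
    }

    void lock(std::size_t me) {
        entering_[me].store(true);           // 1. announce we are choosing
        int maxTicket = 0;                   // 2. take max ticket + 1
        for (const auto& t : ticket_) {
            maxTicket = std::max(maxTicket, t.load());
        }
        ticket_[me].store(maxTicket + 1);
        entering_[me].store(false);
        const auto mine = std::make_pair(maxTicket + 1, me);

        for (std::size_t other = 0; other < ticket_.size(); ++other) {
            if (other == me) {
                continue;
            }
            while (entering_[other].load()) {  // 3. wait until it has chosen
                std::this_thread::yield();
            }
            for (;;) {                         // 4. wait while it is ahead of us
                const int theirs = ticket_[other].load();
                if (theirs == 0 || std::make_pair(theirs, other) > mine) {
                    break;
                }
                std::this_thread::yield();
            }
        }
    }

    void unlock(std::size_t me) { ticket_[me].store(0); }

private:
    std::vector<std::atomic<bool>> entering_;
    std::vector<std::atomic<int>> ticket_;
};
```

Usage mirrors SomeProcess(): each thread calls lock(id) before its critical section and unlock(id) after it. The Cassandra version replaces the two arrays with the "entering::" and "tickets::" rows, which is why every read and write there must be done at QUORUM.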
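Important note (2019/05/05): although implementing Lamport's bakery algorithm with Cassandra is a good exercise, it is an anti-pattern for a Cassandra database, which means it is likely to perform poorly under heavy load. Since then I have created a new lock system that still uses Lamport's algorithm but keeps all the data in memory (it is very small) and still allows multiple computers to participate in the lock, so if one of them goes down, the lock system continues to work as expected. (Many other lock systems lack this capability: when the master goes down, you lose the ability to lock until another computer decides to become the new master...)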
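Answer 3 (score: 3)
Scaling to millions of writes & durability
If we consider your case: before performing such a write, Cassandra would first need to check whether the row key already exists.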
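Cassandra already implements Bloom filters, but this would still be an overhead: every write would become a read plus a write.
On the other hand, your request could reduce the compaction overhead in Cassandra, because at any time a key would be present in only one SSTable. But Cassandra's architecture would have to change to support it.
Just watch this video http://blip.tv/datastax/counters-in-cassandra-5497678 or download this presentation http://www.datastax.com/wp-content/uploads/2011/07/cassandra_sf_counters.pdf to understand how counters came into existence in Cassandra.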
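Bloom filters let a node skip SSTables that definitely do not contain a key, but a "maybe" answer still has to be confirmed by actually reading the SSTable, which is why a uniqueness check would turn each write into a read plus a write. A minimal sketch of the data structure in C++; the bit-array size, the number of hashes and the double-hashing scheme are illustrative assumptions, not Cassandra's implementation.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>

// Hypothetical minimal Bloom filter: k bit positions per key, derived by
// double hashing from two std::hash values. No false negatives; false
// positives are possible.
class BloomFilter {
public:
    void add(const std::string& key) {
        for (std::size_t i = 0; i < kHashes; ++i) {
            bits_.set(index(key, i));
        }
    }

    // false => key definitely absent; true => key *may* be present, so a
    // uniqueness check would still have to read the data to be sure.
    bool mightContain(const std::string& key) const {
        for (std::size_t i = 0; i < kHashes; ++i) {
            if (!bits_.test(index(key, i))) {
                return false;
            }
        }
        return true;
    }

private:
    static constexpr std::size_t kBits = 1024;
    static constexpr std::size_t kHashes = 3;

    static std::size_t index(const std::string& key, std::size_t i) {
        const std::size_t h1 = std::hash<std::string>{}(key);
        const std::size_t h2 = std::hash<std::string>{}(key + '\x01');
        return (h1 + i * h2) % kBits;
    }

    std::bitset<kBits> bits_;
};
```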
Answer 4 (score: 2)
One possibility is to use Cages and ZooKeeper:
http://ria101.wordpress.com/2010/05/12/locking-and-transactions-over-cassandra-using-cages