Cassandra - unique constraint on row key

Time: 2011-11-16 15:39:50

Tags: cassandra

I would like to know whether it is possible in Cassandra to specify a unique constraint on the row key, something similar to SQL Server's ADD CONSTRAINT myConstrain UNIQUE (ROW_PK).

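When inserting with an already existing row key, the existing data would not be overwritten. Instead, I would receive some kind of exception or response saying that the update cannot be performed because of a constraint violation.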

Maybe there is a workaround for this problem: there are counters, whose updates seem to be atomic.

5 Answers:

Answer 0 (score: 12)

Lightweight transactions?

http://www.datastax.com/documentation/cassandra/2.0/cassandra/dml/dml_ltwt_transaction_c.html

INSERT INTO customer_account (customerID, customer_email) VALUES ('LauraS', 'lauras@gmail.com') IF NOT EXISTS;
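
To illustrate what IF NOT EXISTS gives you, the sketch below models the semantics in memory: the first insert for a key is applied, and a later insert for the same key is rejected and reports the value that is already stored. It does not talk to Cassandra. `Table`, `InsertResult`, and `insert_if_not_exists` are hypothetical names, and the `applied` flag is only meant to mirror the `[applied]` column that Cassandra returns for conditional statements.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Hypothetical in-memory stand-in for a table keyed by customerID.
typedef std::map<std::string, std::string> Table;

struct InsertResult
{
    bool applied;           // true when the row was inserted
    std::string existing;   // value already stored when applied is false
};

InsertResult insert_if_not_exists(Table & table, std::string const & id, std::string const & email)
{
    assert(!id.empty());    // a row key is required

    // std::map::insert() never overwrites; it reports whether the key was new
    std::pair<Table::iterator, bool> const r(table.insert(std::make_pair(id, email)));

    InsertResult result;
    result.applied = r.second;
    if(!r.second)
    {
        result.existing = r.first->second;
    }
    return result;
}
```

A caller would check `applied` and treat `false` as the "constraint violation" the question asks for.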

Answer 1 (score: 9)

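Unfortunately, no, because Cassandra does not perform any checks on writes. To implement something like that, Cassandra would have to do a read before every write to check whether the write is allowed. This would greatly slow down writes. (The whole point is that writes are streamed out sequentially without needing any disk seeks; reads interrupt this pattern and force seeks to occur.)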

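I can't think of a way that counters would help, either. Counters are not implemented using an atomic test-and-set. Instead, they essentially store lots of deltas, which are added together when you read the counter value.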
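
A rough in-memory sketch of that idea, under the assumption that a counter is nothing more than a list of recorded deltas (the class name `DeltaCounter` is made up for this example and is not Cassandra's implementation):

```cpp
#include <cassert>
#include <cstdint>
#include <numeric>
#include <vector>

// Hypothetical model of a counter stored as a list of deltas.
class DeltaCounter
{
public:
    // A write only appends a delta; it never looks at the current total,
    // so it cannot refuse an update based on the existing value.
    void add(std::int64_t delta)
    {
        f_deltas.push_back(delta);
    }

    // A read adds all the recorded deltas together.
    std::int64_t read() const
    {
        return std::accumulate(f_deltas.begin(), f_deltas.end(), std::int64_t(0));
    }

private:
    std::vector<std::int64_t> f_deltas;
};
```

With this model, two clients can both read 0, both add 1, and both believe they were first. The counter ends at 2, but neither write was rejected, which is why it cannot stand in for a unique constraint.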

Answer 2 (score: 5)

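I feel particularly good today, so I won't downvote all the other posters for saying that it is not even remotely possible to create a lock with just a Cassandra cluster. I just implemented Lamport's bakery algorithm¹ and it works just fine. No need for any other weird things like zoos, cages, memory tables, etc.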

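Instead, you can implement a poor man's multi-process/multi-computer lock mechanism as long as you can get a read and a write with at least QUORUM consistency. That is all you really need to properly implement this algorithm. (The QUORUM level can change depending on the type of lock you need: local, rack, full network.)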
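
The reason QUORUM is enough comes down to simple arithmetic: a quorum is a majority of the replicas, so any quorum read overlaps any quorum write in at least one replica. A small sketch of that check (the function names are made up for this example):

```cpp
#include <cassert>

// Number of replicas that must answer a QUORUM operation
// for a given replication factor (a majority).
int quorum(int replication_factor)
{
    assert(replication_factor > 0);
    return replication_factor / 2 + 1;
}

// A read is guaranteed to reach at least one replica that took part in
// the latest acknowledged write when both sets together exceed the
// replication factor.
bool read_sees_write(int replication_factor, int read_replicas, int write_replicas)
{
    return read_replicas + write_replicas > replication_factor;
}
```

For a replication factor of 3, `quorum(3)` is 2, and 2 + 2 > 3, so every lock participant reading at QUORUM sees the tickets written at QUORUM. Reading and writing with a single replica each (1 + 1 > 3 is false) gives no such guarantee.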

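My implementation will appear in version 0.4.7 of libQtCassandra (in C++). I have already tested it and it locks perfectly. There are a few more things I want to test, and I want to let you define a set of parameters that are hard-coded right now. But the mechanism works well.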

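When I found this thread, I thought that something was wrong. I searched some more and found a page on Apache, which I mention below. The page is not very advanced, but their MoinMoin does not offer a discussion page... Anyway, I think it was worth mentioning. Hopefully people will start implementing that lock mechanism in all sorts of languages such as PHP, Ruby, Java, etc. so that it gets used and it becomes known that it works.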

Source: http://wiki.apache.org/cassandra/Locking

¹http://en.wikipedia.org/wiki/Lamport%27s_bakery_algorithm

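The following is more or less the way I implemented my version. It is only a simplified synopsis. I may need to update it some more, because I made some enhancements while testing the resulting code (the real code also uses RAII and includes a timeout capability on top of the TTL). The final version will be found in the libQtCassandra library.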

// lock "object_name"
void lock(QString object_name)
{
    QString locks = context->lockTableName();
    QString hosts_key = context->lockHostsKey();
    QString host_name = context->lockHostName();
    int host = table[locks][hosts_key][host_name];
    pid_t pid = getpid();

    // get the next available ticket
    table[locks]["entering::" + object_name][host + "/" + pid] = true;
    int my_ticket(0);
    QCassandraCells tickets(table[locks]["tickets::" + object_name]);
    foreach(tickets as t)
    {
        // we assume that t.name is the column name
        // and t.value is its value
        if(t.value > my_ticket)
        {
            my_ticket = t.value;
        }
    }
    ++my_ticket; // add 1, since we want the next ticket
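    // the column key is host/pid and the value is the ticket number, which
    // is what the loops in this function and unlock() both expect
    table[locks]["tickets::" + object_name][host + "/" + pid] = my_ticket;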
    // not entering anymore, by deleting the cell we also release the row
    // once all the processes are done with that object_name
    table[locks]["entering::" + object_name].dropCell(host + "/" + pid);

    // here we wait on all the other processes still entering at this
    // point; if entering more or less at the same time we cannot
    // guarantee that their ticket number will be larger, it may instead
    // be equal; however, anyone entering later will always have a larger
    // ticket number so we won't have to wait for them; they will have to wait
    // on us instead; note that we load the list of "entering" once;
    // then we just check whether the column still exists; it is enough
    QCassandraCells entering(table[locks]["entering::" + object_name]);
    foreach(entering as e)
    {
        while(table[locks]["entering::" + object_name].exists(e))
        {
            sleep();
        }
    }

    // now check whether any other process was there before us, if
    // so sleep a bit and try again; in our case we only need to check
    // for the processes registered for that one lock and not all the
    // processes (which could be 1 million on a large system!);
    // like with the entering vector we really only need to read the
    // list of tickets once and then check when they get deleted
    // (unfortunately we can only do a poll on this one too...);
    // we exit the foreach() loop once our ticket is proved to be the
    // smallest or no more tickets need to be checked; when ticket
    // numbers are equal, then we use our host numbers, the smaller
    // is picked; when host numbers are equal (two processes on the
    // same host fighting for the lock), then we use the processes
    // pid since these are unique on a system, again the smallest wins.
    tickets = table[locks]["tickets::" + object_name];
    foreach(tickets as t)
    {
        // do we have a smaller ticket?
        // note: the t.host and t.pid come from the column key
        if(t.value > my_ticket
        || (t.value == my_ticket && t.host > host)
        || (t.value == my_ticket && t.host == host && t.pid >= pid))
        {
            // do not wait on larger tickets, just ignore them
            continue;
        }
        // that ticket has priority over ours, wait for it to go away
        while(table[locks]["tickets::" + object_name].exists(t.name))
        {
            sleep();
        }
        // that ticket was released, we may have priority now
        // check the next ticket
    }
}

// unlock "object_name"
void unlock(QString object_name)
{
    // release our ticket
    QString locks = context->lockTableName();
    QString hosts_key = context->lockHostsKey();
    QString host_name = context->lockHostName();
    int host = table[locks][hosts_key][host_name];
    pid_t pid = getpid();
    table[locks]["tickets::" + object_name].dropCell(host + "/" + pid);
}

// sample process using the lock/unlock
void SomeProcess(QString object_name)
{
    while(true)
    {
        [...]
        // non-critical section...
        lock(object_name);
        // The critical section code goes here...
        unlock(object_name);
        // non-critical section...
        [...]
    }
}
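
For readers who want to try the algorithm without a Cassandra cluster, here is a minimal in-memory sketch of Lamport's bakery algorithm using std::atomic. It is not the libQtCassandra implementation: the class name `BakeryLock`, the fixed number of participants, and the slot index standing in for the host/pid pair of the outline above are all assumptions made for illustration.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// In-memory bakery lock for a fixed number of participants.
// Slot `me` plays the role of one host/pid pair.
class BakeryLock
{
public:
    explicit BakeryLock(std::size_t participants)
        : f_entering(participants)
        , f_ticket(participants)
    {
        for(std::size_t i(0); i < participants; ++i)
        {
            f_entering[i].store(false);
            f_ticket[i].store(0);       // 0 means "no ticket"
        }
    }

    void lock(std::size_t me)
    {
        assert(me < f_ticket.size());

        // take the next ticket: one more than the largest ticket in use
        f_entering[me].store(true);
        unsigned long max(0);
        for(std::size_t i(0); i < f_ticket.size(); ++i)
        {
            unsigned long const v(f_ticket[i].load());
            if(v > max)
            {
                max = v;
            }
        }
        unsigned long const mine(max + 1);
        f_ticket[me].store(mine);
        f_entering[me].store(false);

        for(std::size_t other(0); other < f_ticket.size(); ++other)
        {
            if(other == me)
            {
                continue;
            }
            // wait until `other` has finished picking its ticket
            while(f_entering[other].load())
            {
                std::this_thread::yield();
            }
            // wait while `other` holds a ticket with priority over ours;
            // equal tickets are ordered by the smaller slot index
            for(;;)
            {
                unsigned long const t(f_ticket[other].load());
                if(t == 0 || t > mine || (t == mine && other > me))
                {
                    break;
                }
                std::this_thread::yield();
            }
        }
    }

    void unlock(std::size_t me)
    {
        assert(me < f_ticket.size());
        f_ticket[me].store(0);          // release our ticket
    }

    // current ticket of a participant, 0 when it holds none
    unsigned long ticket(std::size_t me) const
    {
        return f_ticket[me].load();
    }

private:
    std::vector<std::atomic<bool> > f_entering;
    std::vector<std::atomic<unsigned long> > f_ticket;
};
```

Each thread would call `lock(i)` and `unlock(i)` with its own slot index around the critical section. The Cassandra version replaces the two vectors with the "entering::" and "tickets::" rows and the atomic loads and stores with QUORUM reads and writes.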

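IMPORTANT NOTE (2019/05/05): Although implementing Lamport's bakery with Cassandra is a great exercise, it is an anti-pattern for the Cassandra database, which means it may not perform well under heavy load. I have since created a new lock system. It still uses Lamport's algorithm, but it keeps all the data in memory (it is very small) and still allows multiple computers to participate in the lock, so if one goes down, the lock system continues to work as expected. (Many other locking systems do not have that capability: when the master goes down, you lose the locking capability until another computer decides on its own to become the new master...)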

Answer 3 (score: 3)

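Obviously you can't. In Cassandra, all your writes are reflected in

  1. the commit log
  2. the Memtable

so that it can scale to millions of writes while staying durable.

Now consider your case. Before doing such a write, Cassandra would need to

  1. check for existence in the Memtable
  2. check for existence in all SSTables [if your key has been flushed from the Memtable]

In case 2, even though Cassandra has implemented Bloom filters, this is going to be an overhead: every write becomes a read plus a write.

On the other hand, your request could reduce the merge overhead in Cassandra, because at any time a key would be present in only one SSTable. But Cassandra's architecture would have to change for that.

Just check this video http://blip.tv/datastax/counters-in-cassandra-5497678 or download this presentation http://www.datastax.com/wp-content/uploads/2011/07/cassandra_sf_counters.pdf to see how counters came into existence in Cassandra.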
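
The steps above can be sketched with a toy in-memory model. Everything here is an assumption made for illustration (`SSTable`, `Node`, the 1024-bit filter, the salted hash), not Cassandra's real classes. The sketch shows that a plain write touches only the commit log and the Memtable, while a uniqueness check has to go through the Memtable and then every SSTable whose Bloom filter cannot rule the key out.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <vector>

// Toy stand-in for a flushed, immutable table with a tiny Bloom filter.
struct SSTable
{
    std::bitset<1024> bloom;
    std::map<std::string, std::string> rows;

    static std::size_t slot(std::string const & key, char salt)
    {
        return std::hash<std::string>()(key + salt) % 1024;
    }

    void add(std::string const & key, std::string const & value)
    {
        bloom.set(slot(key, 'a'));
        bloom.set(slot(key, 'b'));
        rows[key] = value;
    }

    // false means "definitely absent", true means "possibly present"
    bool may_contain(std::string const & key) const
    {
        return bloom.test(slot(key, 'a')) && bloom.test(slot(key, 'b'));
    }
};

struct Node
{
    std::vector<std::string> commit_log;
    std::map<std::string, std::string> memtable;
    std::vector<SSTable> sstables;
    std::size_t sstable_lookups = 0;    // real lookups a check had to pay for

    // normal write path: append to the commit log, update the Memtable, no read
    void write(std::string const & key, std::string const & value)
    {
        assert(!key.empty());
        commit_log.push_back(key + "=" + value);
        memtable[key] = value;
    }

    // move the Memtable content into a new SSTable
    void flush()
    {
        SSTable t;
        for(auto const & r : memtable)
        {
            t.add(r.first, r.second);
        }
        sstables.push_back(t);
        memtable.clear();
    }

    // the extra read a unique constraint would add to every write
    bool exists(std::string const & key)
    {
        if(memtable.count(key) != 0)
        {
            return true;
        }
        for(auto const & t : sstables)
        {
            if(!t.may_contain(key))
            {
                continue;               // the Bloom filter ruled this table out
            }
            ++sstable_lookups;
            if(t.rows.count(key) != 0)
            {
                return true;
            }
        }
        return false;
    }

    bool write_if_absent(std::string const & key, std::string const & value)
    {
        if(exists(key))
        {
            return false;
        }
        write(key, value);
        return true;
    }
};
```

The Bloom filter saves lookups for keys that were never written, but a key that does exist in an SSTable always costs at least one real lookup, which is the overhead described above.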

Answer 4 (score: 2)