Question

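I'm trying to write an implementation of union-find in Rust. This is famously simple to implement in a language like C, while still having a complex run-time analysis.

I'm having trouble getting Rust's mutex semantics to allow iterative hand-over-hand locking.

Here is how I got to where I am now.

First, this is a very simple implementation of part of the structure I want, in C: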

#include <stdlib.h>

struct node {
  struct node * parent;
};

struct node * create(struct node * parent) {
  struct node * ans = malloc(sizeof(struct node));
  ans->parent = parent;
  return ans;
}

struct node * find_root(struct node * x) {
  while (x->parent) {
    x = x->parent;
  }
  return x;
}

int main() {
  struct node * foo = create(NULL);
  struct node * bar = create(foo);
  struct node * baz = create(bar);
  baz->parent = find_root(bar);
}

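Note that the pointers form an inverted tree: multiple pointers may point at the same location, and there are no cycles.

At this point, there is no path compression.

Here is a Rust translation. I chose Rust's reference-counted pointer type to support the inverted tree described above.

Note that this implementation is much more verbose, possibly because of the increased safety Rust offers, but possibly because of my inexperience with Rust.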

use std::rc::Rc;

struct Node {
    parent: Option<Rc<Node>>
}

fn create(parent: Option<Rc<Node>>) -> Node {
    Node {parent: parent.clone()}
}

fn find_root(x: Rc<Node>) -> Rc<Node> {
    let mut ans = x.clone();
    while ans.parent.is_some() {
        ans = ans.parent.clone().unwrap();
    }
    ans
}

fn main() {
    let foo = Rc::new(create(None));
    let bar = Rc::new(create(Some(foo.clone())));
    let mut prebaz = create(Some(bar.clone()));
    prebaz.parent = Some(find_root(bar.clone()));
}

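Path compression re-parents each node along the path to the root every time find_root is called. Adding this feature to the C code takes only two small new functions: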

void change_root(struct node * x, struct node * root) {
  /* Stop at the root itself; re-parenting it would create a self-loop. */
  while (x != root) {
    struct node * tmp = x->parent;
    x->parent = root;
    x = tmp;
  }
}

struct node * root(struct node * x) {
  struct node * ans = find_root(x);
  change_root(x, ans);
  return ans;
}

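The function change_root does all of the re-parenting, while root is just a wrapper that uses the result of find_root to re-parent the nodes on the path to the root.

To do this in Rust, I decided I would have to use a Mutex rather than just a reference-counted pointer, since the Rc interface only allows mutable access by copy-on-write when more than one pointer to the item is live. As a result, all of the code has to change. Before even getting to the path compression part, I got hung up on find_root: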

use std::sync::{Mutex,Arc};

struct Node {
    parent: Option<Arc<Mutex<Node>>>
}

fn create(parent: Option<Arc<Mutex<Node>>>) -> Node {
    Node {parent: parent.clone()}
}

fn find_root(x: Arc<Mutex<Node>>) -> Arc<Mutex<Node>> {
    let mut ans = x.clone();
    let mut inner = ans.lock();
    while inner.parent.is_some() {
        ans = inner.parent.clone().unwrap();
        inner = ans.lock();
    }
    ans.clone()
}

This produces the following error (with Rust 0.12.0):

error: cannot assign to `ans` because it is borrowed
ans = inner.parent.clone().unwrap();

note: borrow of `ans` occurs here
let mut inner = ans.lock();

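What I think I need here is hand-over-hand locking. For the path A -> B -> C -> ..., I need to lock A, lock B, unlock A, lock C, unlock B, and so on. Of course, I could keep all of the locks held: lock A, lock B, lock C, ..., unlock C, unlock B, unlock A, but this seems inefficient.

However, Mutex does not offer unlock, and uses RAII instead. How can I achieve hand-over-hand locking in Rust without being able to call unlock directly?

EDIT: As the comments noted, I could use Rc<RefCell<Node>> rather than Arc<Mutex<Node>>. Doing so leads to the same compiler error.

To be clear about what I'm trying to avoid by using hand-over-hand locking, here is a RefCell version that compiles but uses space linear in the length of the path.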

fn find_root(x: Rc<RefCell<Node>>) -> Rc<RefCell<Node>> {
    let mut inner : RefMut<Node> = x.borrow_mut();
    if inner.parent.is_some() {
        find_root(inner.parent.clone().unwrap())
    } else {
        x.clone()
    }
}

Answer 1

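We can fairly easily do full hand-over-hand locking as we traverse this list using just a bit of unsafe, which is necessary to tell the borrow checker one small piece of insight that we have but that it cannot know.

But first, let's clearly formulate the problem:

We want to traverse a linked list whose nodes are stored as Arc<Mutex<Node>> to get the last node in the list.
We need to lock each node in the list as we go, so that another concurrent traversal has to follow strictly behind us and cannot interfere with our progress.

Before we get into the details, let's try to write the signature for this function: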

fn find_root(node: Arc<Mutex<Node>>) -> Arc<Mutex<Node>>;

Now that we know our goal, we can start on the implementation. Here is a first attempt:

fn find_root(incoming: Arc<Mutex<Node>>) -> Arc<Mutex<Node>> {
    // We have to separate this from incoming since the lock must
    // be borrowed from incoming, not this local node.  
    let mut node = incoming.clone();
    let mut lock = incoming.lock();

    // Could use while let but that leads to borrowing issues.
    while lock.parent.is_some() {
       node = lock.parent.as_ref().unwrap().clone(); // !! uh-oh !!
       lock = node.lock();
    }

    node
}

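If we try to compile this, rustc reports an error on the line marked !! uh-oh !!, telling us that we can't move out of node while lock still exists, since lock is borrowing node. This is not a spurious error! The data in lock might go away as soon as node does. We can fix this only because we know that we can keep the data lock points to valid, and at the same memory location, even if we move node.

The key insight here is that the lifetime of data contained within an Arc is dynamic, and it is hard for the borrow checker to infer exactly how long data inside an Arc remains valid.

This happens every once in a while when writing Rust: you know more about the lifetime and organization of your data than rustc does, and you want to express that knowledge to the compiler, effectively saying "trust me". Enter unsafe, our way of telling the compiler that we know more than it does, and that it should let us rely on guarantees that we know hold but that it cannot verify.

In this case, the guarantee is pretty simple: we are going to replace node while lock still exists, but we will ensure that the data inside lock remains valid even though node goes away. To express this guarantee we can use mem::transmute, a function that lets us reinterpret the type of any value, and use it only to make the lifetime of the lock returned by node slightly longer than it actually is.

To make sure we keep our promise, we use another handoff variable to hold node while we reassign lock. Even though this moves node (changing its address) and the borrow checker would be angry at us, we know it is fine, since lock doesn't point at node. It points at data inside of node, whose address (in this case, since it is behind an Arc) will not change.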

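Before we get to the solution, it is important to note that the trick we are using here is only valid because we are using an Arc. The borrow checker is warning us of a possibly serious error: if the Mutex were held inline rather than in an Arc, this error would correctly prevent a use-after-free, where the MutexGuard held in lock would attempt to unlock a Mutex that has already been dropped, or at least moved to another memory location.

use std::mem;
use std::sync::{Arc, Mutex};

fn find_root(incoming: Arc<Mutex<Node>>) -> Arc<Mutex<Node>> {
    let mut node = incoming.clone();
    let mut handoff_node;
    let mut lock = incoming.lock();

    // Could use while let but that leads to borrowing issues.
    while lock.parent.is_some() {
        // Keep the data in node around by holding on to this `Arc`.
        handoff_node = node;

        node = lock.parent.as_ref().unwrap().clone();

        // We are going to move out of node while this lock is still around,
        // but since we kept the data around it's ok.
        lock = unsafe { mem::transmute(node.lock()) };
    }

    node
}

And, just like that, rustc is happy, and we have hand-over-hand locking, since the previous lock is released only after we have acquired the new one!

There is one unanswered question in this implementation that I have not yet received an answer to: whether dropping the old value and assigning the new value to a variable is guaranteed to be atomic. If not, there is a race condition where the old lock is released before the new lock is acquired in the assignment to lock. It is pretty trivial to work around this by adding another holdover_lock variable, moving the old lock into it before reassigning, and then dropping it after reassigning lock.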
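For readers on current stable Rust, the holdover idea described above might be sketched as follows. This is only a sketch under stated assumptions: lock() now returns a Result (unwrapped here, so a poisoned mutex panics), the create helper and the next_lock name are made up for illustration, and the soundness argument is the same one given above, namely that each guard points into an Arc allocation that is kept alive until after the guard is dropped.

```rust
use std::mem;
use std::sync::{Arc, Mutex, MutexGuard};

struct Node {
    parent: Option<Arc<Mutex<Node>>>,
}

// Hypothetical helper for building test chains; not part of the original answer.
fn create(parent: Option<Arc<Mutex<Node>>>) -> Arc<Mutex<Node>> {
    Arc::new(Mutex::new(Node { parent }))
}

fn find_root(incoming: Arc<Mutex<Node>>) -> Arc<Mutex<Node>> {
    let mut node = incoming;
    // SAFETY (assumption of this sketch): the guard refers to the Mutex inside
    // the Arc's heap allocation, and `node` keeps that allocation alive until
    // after the guard has been dropped.
    let mut lock: MutexGuard<'static, Node> =
        unsafe { mem::transmute(node.lock().unwrap()) };

    loop {
        let parent = match lock.parent.as_ref() {
            Some(p) => Arc::clone(p),
            None => break,
        };
        // Acquire the parent's lock while the child's lock is still held.
        // SAFETY: same argument as above, with `parent` keeping the data alive.
        let next_lock: MutexGuard<'static, Node> =
            unsafe { mem::transmute(parent.lock().unwrap()) };
        // Only now release the child's lock, then let go of the child's Arc.
        drop(lock);
        lock = next_lock;
        node = parent;
    }

    // Release the root's lock before handing the Arc back to the caller.
    drop(lock);
    node
}
```

Making the order explicit (acquire next_lock, then drop the old guard, then replace node) avoids relying on the exact drop timing of a plain assignment.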

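Hopefully this fully addresses your question and shows how unsafe can be used to work around "deficiencies" in the borrow checker when you really do know more. I would still like to stress that cases where you know more than the borrow checker are rare, and that transmuting lifetimes is not "usual" behavior.

As you can see, using Mutex in this way is pretty complex, and you have to deal with many, many possible sources of race conditions. I may not even have caught all of them! Unless you really need this structure to be accessible from many threads, it would probably be best to just use Rc and RefCell, if you need it, as this makes things much easier.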

Answer 2

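On IRC, Jonathan Reem pointed out that inner borrows until the end of its lexical scope, which is too long for what I was asking. Inlining it produces the following, which compiles without error: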

fn find_root(x: Arc<Mutex<Node>>) -> Arc<Mutex<Node>> {
    let mut ans = x.clone();
    while ans.lock().parent.is_some() {
        ans = ans.lock().parent.clone().unwrap();
    }
    ans
}

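EDIT: As Francis Gagné points out, this has a race condition, since the lock is not held for long enough. Here is a modified version that has only one lock() call; perhaps it is not vulnerable to the same problem.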

fn find_root(x: Arc<Mutex<Node>>) -> Arc<Mutex<Node>> {
    let mut ans = x.clone();
    loop {
        ans = {
            let tmp = ans.lock();
            match tmp.parent.clone() {
               None => break,
               Some(z) => z
            }
        }
    }
    ans
}

EDIT 2: This only holds one lock at a time, and so it is still racy. I still don't know how to do hand-over-hand locking.

Answer 3

I believe this fits the criteria for hand-over-hand locking.

use std::sync::Mutex;

fn main() {
    // Create a set of mutexes to lock hand-over-hand
    let mutexes = Vec::from_fn(4, |_| Mutex::new(false));

    // Lock the first one
    let val_0 = mutexes[0].lock();
    if !*val_0 {
        // Lock the second one
        let mut val_1 = mutexes[1].lock();
        // Unlock the first one
        drop(val_0);
        // Do logic
        *val_1 = true;
    }

    for mutex in mutexes.iter() {
        println!("{}" , *mutex.lock());
    }
}
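On current stable Rust, Vec::from_fn no longer exists and lock() returns a Result, so the same idea might be sketched as a loop over a slice of mutexes. This is a minimal sketch rather than a definitive version; the function names make_mutexes and set_all_hand_over_hand are made up for illustration, and a poisoned mutex simply panics via unwrap().

```rust
use std::sync::Mutex;

// Hypothetical helper: build n independent mutexes, all starting as false.
fn make_mutexes(n: usize) -> Vec<Mutex<bool>> {
    (0..n).map(|_| Mutex::new(false)).collect()
}

// Walk the slice hand-over-hand: lock n + 1 is always acquired
// before lock n is released.
fn set_all_hand_over_hand(mutexes: &[Mutex<bool>]) {
    let mut iter = mutexes.iter();
    let mut guard = match iter.next() {
        Some(first) => first.lock().unwrap(),
        None => return,
    };
    *guard = true;

    for next in iter {
        // Acquire the next lock while still holding the current one.
        let mut next_guard = next.lock().unwrap();
        // Dropping the guard is the "unlock" that RAII gives us.
        drop(guard);
        *next_guard = true;
        guard = next_guard;
    }
    // The last guard is released when it goes out of scope here.
}
```

No unsafe is needed in this shape because every guard borrows from the same slice, so all guards share one lifetime and none of them depends on a local variable that gets reassigned.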

Edit #1

Does it work when access to lock n + 1 is guarded by lock n?

If you mean something shaped like the following, then I think the answer is no.

struct Level {
  data: bool,
  child: Option<Mutex<Box<Level>>>,
}

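That restriction is sensible, though. When you wrap an object in a mutex, you are saying "the entire object is safe". You can't say both "the entire pie is safe" and "I'm eating the stuff below the crust" at the same time. Perhaps you could jettison that safety by creating a Mutex<()> and locking that?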

Answer 4

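This is still not an answer to your literal question of how to do hand-over-hand locking, which should only matter in a concurrent setting (or if someone else forced you to use Mutex references to nodes). Instead, it shows how to do this with Rc and RefCell, which you seem to be interested in.

RefCell only allows mutable writes while exactly one mutable reference is held. Importantly, the Rc<RefCell<Node>> objects are not mutable references. The mutable references in question are the results of calling borrow_mut() on the Rc<RefCell<Node>> object, and as long as you do that in a limited scope (e.g. the body of the while loop), you'll be fine.

The important thing happening in path compression is that the next Rc object keeps the rest of the chain alive while you swing node's parent pointer to point at root. However, it is not a reference in the Rust semantic sense of the word.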
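To make the scoping rule concrete, here is a small sketch on current stable Rust. The function name scoped_borrows is made up for illustration; try_borrow_mut is used only so that the conflicting borrow shows up as an Err instead of a panic.

```rust
use std::cell::RefCell;
use std::rc::Rc;

fn scoped_borrows() -> u32 {
    let shared = Rc::new(RefCell::new(0u32));
    // A second Rc handle to the same cell; this is not a mutable reference.
    let alias = Rc::clone(&shared);

    {
        // The mutable reference is the RefMut returned by borrow_mut().
        let mut writer = shared.borrow_mut();
        *writer += 1;
        // While `writer` is alive, a second mutable borrow is refused.
        assert!(alias.try_borrow_mut().is_err());
    } // `writer` is dropped here, which ends the mutable borrow.

    // Borrowing through the other handle now succeeds.
    *alias.borrow_mut() += 1;

    let value = *shared.borrow();
    value
}
```

Calling scoped_borrows() returns 2: one increment through each handle, with the two mutable borrows never overlapping.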

struct Node
{
    parent: Option<Rc<RefCell<Node>>>
}

fn find_root(mut node: Rc<RefCell<Node>>) -> Rc<RefCell<Node>>
{
    while let Some(parent) = node.borrow().parent.clone()
    {
        node = parent;
    }

    return node;
}

fn path_compress(mut node: Rc<RefCell<Node>>, root: Rc<RefCell<Node>>)
{
    while node.borrow().parent.is_some()
    {
        let next = node.borrow().parent.clone().unwrap();
        node.borrow_mut().parent = Some(root.clone());
        node = next;
    }
}

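This runs fine for me with the test harness I used, though there may still be bugs. It certainly compiles and runs without a panic! from trying to borrow_mut() something that is already borrowed. Whether it actually produces the right answer is up to you to check.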

Answer 5

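As Frank Sherry and others pointed out, you shouldn't use Arc/Mutex when single-threaded. But his code was outdated, so here is a newer version (for Rust 1.0.0alpha2). Unlike the recursive code given in the question, this does not take linear space either.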

struct Node {
    parent: Option<Rc<RefCell<Node>>>
}

fn find_root(node: Rc<RefCell<Node>>) -> Rc<RefCell<Node>> {
    let mut ans = node.clone(); // Rc<RefCell<Node>>
    loop {
        ans = {
            let ans_ref = ans.borrow(); // std::cell::Ref<Node>
            match ans_ref.parent.clone() {
                None => break,
                Some(z) => z
            }
        } // ans_ref goes out of scope, and ans becomes mutable
    }
    ans
}

fn path_compress(mut node: Rc<RefCell<Node>>, root: Rc<RefCell<Node>>) {
    while node.borrow().parent.is_some() {
        let next = {
            let node_ref = node.borrow();
            node_ref.parent.clone().unwrap()
        };
        node.borrow_mut().parent = Some(root.clone());
        // RefMut<Node> from borrow_mut() is out of scope here...
        node = next; // therefore we can mutate node
    }
}

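Note for beginners: pointers are automatically dereferenced by the dot operator, so ans.borrow() actually means (*ans).borrow(). I intentionally used different styles for the two functions.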
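For completeness, here is how the two functions might be combined into the root wrapper from the question on current stable Rust. This is a sketch under assumptions, not a definitive implementation: the Link alias, the create helper, and the by-reference signatures are choices made for illustration and do not appear in the answers above.

```rust
use std::cell::RefCell;
use std::rc::Rc;

// Hypothetical alias to keep the signatures readable.
type Link = Rc<RefCell<Node>>;

struct Node {
    parent: Option<Link>,
}

fn create(parent: Option<Link>) -> Link {
    Rc::new(RefCell::new(Node { parent }))
}

// Iterative walk to the root; each borrow() ends with its own statement.
fn find_root(node: &Link) -> Link {
    let mut ans = Rc::clone(node);
    loop {
        let parent = ans.borrow().parent.clone();
        match parent {
            Some(p) => ans = p,
            None => return ans,
        }
    }
}

// Re-parent every non-root node on the path from `node` to `root`.
fn path_compress(node: &Link, root: &Link) {
    let mut cur = Rc::clone(node);
    loop {
        // `next` keeps the rest of the chain alive while we re-parent `cur`.
        let next = cur.borrow().parent.clone();
        match next {
            Some(next) => {
                cur.borrow_mut().parent = Some(Rc::clone(root));
                cur = next;
            }
            // Reached the root itself; leave its parent as None.
            None => break,
        }
    }
}

// Wrapper mirroring the C `root` function from the question.
fn root(node: &Link) -> Link {
    let r = find_root(node);
    path_compress(node, &r);
    r
}
```

After calling root on the deepest node of a chain, every node on that path except the root should point directly at the root, and the root's own parent stays None.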

Answer 6

While not an answer to your literal question (hand-over-hand locking), union-find with weighted union and path compression can be very simple in Rust:

fn unionfind<I: Iterator<(uint, uint)>>(mut iterator: I, nodes: uint) -> Vec<uint>
{
    let mut root = Vec::from_fn(nodes, |x| x);
    let mut rank = Vec::from_elem(nodes, 0u8);

    for (mut x, mut y) in iterator
    {
        // find roots for x and y; do path compression on look-ups
        while (x != root[x]) { root[x] = root[root[x]]; x = root[x]; }
        while (y != root[y]) { root[y] = root[root[y]]; y = root[y]; }

        if x != y
        {
            // weighted union swings roots
            match rank[x].cmp(&rank[y])
            {
                Less    => root[x] = y,
                Greater => root[y] = x,
                Equal   =>
                {
                    root[y] = x; 
                    rank[x] += 1 
                },
            }
        }
    }

    // Return the parent table; the signature promises a Vec<uint>.
    root
}

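Maybe the meta-point is that the union-find algorithm may not be the best place to handle node ownership. Using references to existing memory (in this case, just uint identifiers for the nodes) without affecting the lifecycle of the nodes makes for a much simpler implementation, if you can get away with it, of course.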
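The code above targets a pre-1.0 compiler (uint, Vec::from_fn and Vec::from_elem have since been removed). A rough equivalent for current stable Rust might look like the sketch below; the name union_find and the IntoIterator-based signature are choices made for illustration, and the function assumes every index in the input is smaller than nodes.

```rust
use std::cmp::Ordering;

// Index-based union-find over `nodes` elements; `edges` lists pairs to merge.
// Returns the parent table (entries are parents, not necessarily final roots).
fn union_find(edges: impl IntoIterator<Item = (usize, usize)>, nodes: usize) -> Vec<usize> {
    let mut root: Vec<usize> = (0..nodes).collect();
    let mut rank = vec![0u8; nodes];

    for (mut x, mut y) in edges {
        // Find the roots of x and y, halving the paths as we go.
        while x != root[x] {
            root[x] = root[root[x]];
            x = root[x];
        }
        while y != root[y] {
            root[y] = root[root[y]];
            y = root[y];
        }

        if x != y {
            // Weighted union: hang the lower-rank root under the higher one.
            match rank[x].cmp(&rank[y]) {
                Ordering::Less => root[x] = y,
                Ordering::Greater => root[y] = x,
                Ordering::Equal => {
                    root[y] = x;
                    rank[x] += 1;
                }
            }
        }
    }

    root
}
```

Because the returned table holds parents rather than fully compressed roots, a caller that wants the representative of an element still has to follow the entries until it reaches an index that maps to itself.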

Hand-over-hand locking with Rust
