我正在尝试创建一些String
个集合,然后合并其中一些集合,使它们具有相同的标记(类型为usize
)。初始化地图后,我开始添加字符串:
self.clusters.make_set("a");
self.clusters.make_set("b");
当我调用self.clusters.find("a")
和self.clusters.find("b")
时,会返回不同的值,这很好,因为我还没有合并这些集合。然后我调用以下方法合并两组
let _ = self.clusters.union("a", "b");
如果我现在拨打self.clusters.find("a")
和self.clusters.find("b")
,我会获得相同的价值。但是,当我调用finalize()
方法并尝试遍历地图时,会返回原始标记,就像我从未合并过这些集合一样。
self.clusters.finalize();
for (address, tag) in &self.clusters.map {
self.clusterizer_writer.write_all(format!("{};{}\n", address,
self.clusters.parent[*tag]).as_bytes()).unwrap();
}
// to output all keys with the same tag as a list.
let a: Vec<(usize, Vec<String>)> = {
let mut x = HashMap::new();
for (k, v) in self.clusters.map.clone() {
x.entry(v).or_insert_with(Vec::new).push(k)
}
x.into_iter().collect()
};
我无法弄清楚为什么会这样,但我对Rust相对较新;也许这是一个指针问题?
而不是“a”和“b”,我实际上使用类似utils::arr_to_hex(&input.outpoint.txid)
String
的内容。
这是我正在使用的Union-Find算法的Rust实现:
/// Tarjan's Union-Find data structure.
#[derive(RustcDecodable, RustcEncodable)]
pub struct DisjointSet<T: Clone + Hash + Eq> {
set_size: usize,
parent: Vec<usize>,
rank: Vec<usize>,
map: HashMap<T, usize>, // Each T entry is mapped onto a usize tag.
}
impl<T> DisjointSet<T>
where
T: Clone + Hash + Eq,
{
pub fn new() -> Self {
const CAPACITY: usize = 1000000;
DisjointSet {
set_size: 0,
parent: Vec::with_capacity(CAPACITY),
rank: Vec::with_capacity(CAPACITY),
map: HashMap::with_capacity(CAPACITY),
}
}
pub fn make_set(&mut self, x: T) {
if self.map.contains_key(&x) {
return;
}
let len = &mut self.set_size;
self.map.insert(x, *len);
self.parent.push(*len);
self.rank.push(0);
*len += 1;
}
/// Returns Some(num), num is the tag of subset in which x is.
/// If x is not in the data structure, it returns None.
pub fn find(&mut self, x: T) -> Option<usize> {
let pos: usize;
match self.map.get(&x) {
Some(p) => {
pos = *p;
}
None => return None,
}
let ret = DisjointSet::<T>::find_internal(&mut self.parent, pos);
Some(ret)
}
/// Implements path compression.
fn find_internal(p: &mut Vec<usize>, n: usize) -> usize {
if p[n] != n {
let parent = p[n];
p[n] = DisjointSet::<T>::find_internal(p, parent);
p[n]
} else {
n
}
}
/// Union the subsets to which x and y belong.
/// If it returns Ok<u32>, it is the tag for unified subset.
/// If it returns Err(), at least one of x and y is not in the disjoint-set.
pub fn union(&mut self, x: T, y: T) -> Result<usize, ()> {
let x_root;
let y_root;
let x_rank;
let y_rank;
match self.find(x) {
Some(x_r) => {
x_root = x_r;
x_rank = self.rank[x_root];
}
None => {
return Err(());
}
}
match self.find(y) {
Some(y_r) => {
y_root = y_r;
y_rank = self.rank[y_root];
}
None => {
return Err(());
}
}
// Implements union-by-rank optimization.
if x_root == y_root {
return Ok(x_root);
}
if x_rank > y_rank {
self.parent[y_root] = x_root;
return Ok(x_root);
} else {
self.parent[x_root] = y_root;
if x_rank == y_rank {
self.rank[y_root] += 1;
}
return Ok(y_root);
}
}
/// Forces all laziness, updating every tag.
pub fn finalize(&mut self) {
for i in 0..self.set_size {
DisjointSet::<T>::find_internal(&mut self.parent, i);
}
}
}
答案 0 :(得分:1)
我认为你并没有正确地从DisjointSet
结构中提取信息。
我得到了sniped并实现了联合查找。首先,使用基本的usize实现:
pub struct UnionFinderImpl {
parent: Vec<usize>,
}
然后使用更通用类型的包装器:
pub struct UnionFinder<T: Hash> {
rev: Vec<Rc<T>>,
fwd: HashMap<Rc<T>, usize>,
uf: UnionFinderImpl,
}
两个结构都实现了groups()
方法,该方法返回Vec<Vec<>>
个组。由于我使用了Clone
,因此不需要Rc
。