Question

I have a piece of code that checks whether two sentences are "too similar", as defined by a heuristic that the code itself states most clearly.

fn too_similar(thing1: &String, thing2: &String) -> bool {
    let split1 = thing1.split_whitespace();
    let split2 = thing2.split_whitespace();

    let mut matches = 0;
    for s1 in split1 {
        for s2 in split2 {
            if s1.eq(s2) {
                matches = matches + 1;
                break;
            }
        }
    }

    let longer_length =
        if thing1.len() > thing2.len() {
            thing1.len()
        } else {
            thing2.len()
        };

    matches > longer_length / 2
}

However, I get the following compile error:

error[E0382]: use of moved value: `split2`
 --> src/main.rs:7:19
  |
7 |         for s2 in split2 {
  |                   ^^^^^^ value moved here in previous iteration of loop
  |
  = note: move occurs because `split2` has type `std::str::SplitWhitespace<'_>`, which does not implement the `Copy` trait

I'm not sure why split2 is being moved in the first place. What is the Rust way to write this function?

Answer 1

split2 is moved because iterating over it with for consumes the iterator, and since its type does not implement Copy, Rust will not copy it implicitly.
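Rust will not copy the iterator implicitly, but it can be cloned explicitly: SplitWhitespace implements Clone. A minimal sketch of that alternative (not part of the original answer), assuming &str parameters; count_matches_cloned is a hypothetical name. The iterator is built once and each pass of the outer loop consumes a fresh clone instead of the original:

```rust
/// Counts the words of `a` that also occur somewhere in `b`.
/// Hypothetical helper: the iterator over `b` is created once and cloned
/// for every pass of the outer loop, so `split2` itself is never consumed.
fn count_matches_cloned(a: &str, b: &str) -> usize {
    let split2 = b.split_whitespace(); // built once, never iterated directly
    let mut matches = 0;
    for s1 in a.split_whitespace() {
        // `split2.clone()` starts at the same position as `split2`;
        // the `for` loop moves and consumes the clone, not the original.
        for s2 in split2.clone() {
            if s1 == s2 {
                matches += 1;
                break;
            }
        }
    }
    matches
}

fn main() {
    // "the" and "sat" match, "cat" does not: prints 2
    println!("{}", count_matches_cloned("the cat sat", "the dog sat"));
}
```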

You can fix this by creating a new iterator inside the first for:

let split1 = thing1.split_whitespace();

let mut matches = 0;
for s1 in split1 {
    for s2 in thing2.split_whitespace() {
        if s1.eq(s2) {
            matches = matches + 1;
            break;
        }
    }
}
...
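For reference, a complete sketch of the function with this fix applied. Two assumptions go beyond the answer: the parameters are &str (a &String argument still works through deref coercion), and longer_length counts words, because thing1.len() in the question counts bytes, which makes the threshold much stricter than the heuristic probably intends:

```rust
/// Returns true when more than half of the words of the longer sentence
/// also appear in the other sentence.
/// Assumption: the threshold is based on word counts, not byte lengths.
fn too_similar(thing1: &str, thing2: &str) -> bool {
    let mut matches = 0;
    for s1 in thing1.split_whitespace() {
        // A new iterator is created on every pass, so nothing is moved twice.
        for s2 in thing2.split_whitespace() {
            if s1 == s2 {
                matches += 1;
                break;
            }
        }
    }

    // Word count of the longer sentence.
    let longer_length = std::cmp::max(
        thing1.split_whitespace().count(),
        thing2.split_whitespace().count(),
    );

    matches > longer_length / 2
}

fn main() {
    println!("{}", too_similar("the cat sat on the mat", "the cat sat on a mat")); // true
    println!("{}", too_similar("hello world", "goodbye moon")); // false
}
```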

You can also rewrite the loop that counts matches using some of the higher-order functions provided by the Iterator trait:

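let matches = thing1.split_whitespace()
    // Unlike the loop's break, this counts every equal word in thing2,
    // so a word repeated in thing2 is counted more than once.
    .flat_map(|c1| thing2.split_whitespace().filter(move |&c2| c1 == c2))
    .count();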
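The flat_map version is not quite equivalent to the nested loop: the loop stops at the first match for each word of thing1, while flat_map counts every equal word of thing2. A sketch comparing it with a variant that uses Iterator::any to keep the loop's break semantics (both function names are hypothetical):

```rust
/// Counts every equal (word of thing1, word of thing2) pair, like flat_map.
fn count_pairs(thing1: &str, thing2: &str) -> usize {
    thing1
        .split_whitespace()
        .flat_map(|c1| thing2.split_whitespace().filter(move |&c2| c1 == c2))
        .count()
}

/// Counts each word of thing1 at most once, like the loop with `break`:
/// `any` stops at the first equal word of thing2.
fn count_with_break(thing1: &str, thing2: &str) -> usize {
    thing1
        .split_whitespace()
        .filter(|c1| thing2.split_whitespace().any(|c2| c2 == *c1))
        .count()
}

fn main() {
    // "a" occurs twice in thing2, so the two counts differ.
    println!("{}", count_pairs("a b", "a a")); // 2
    println!("{}", count_with_break("a b", "a a")); // 1
}
```

When thing2 contains no repeated words the two functions return the same value.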

longer_length can also be written as:

let longer_length = std::cmp::max(thing1.len(), thing2.len());

Answer 2

There may be better ways to do the word comparison.

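If the phrases are long, iterating over every word of thing2 for each word of thing1 is inefficient. If you don't need to worry about words that appear more than once, a HashSet can help, and the iteration boils down to something like this: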

use std::collections::HashSet;

let words1: HashSet<&str> = thing1.split_whitespace().collect();
let words2: HashSet<&str> = thing2.split_whitespace().collect();
let matches = words1.intersection(&words2).count();
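A self-contained sketch of the HashSet approach; distinct_common_words is a hypothetical name. Because duplicates collapse on both sides, the result counts distinct shared words and can be lower than the nested loop's count:

```rust
use std::collections::HashSet;

/// Number of distinct words that appear in both sentences.
fn distinct_common_words(thing1: &str, thing2: &str) -> usize {
    let words1: HashSet<&str> = thing1.split_whitespace().collect();
    let words2: HashSet<&str> = thing2.split_whitespace().collect();
    // intersection() yields each word present in both sets exactly once
    words1.intersection(&words2).count()
}

fn main() {
    // Shared distinct words: "the" and "hat"
    println!("{}", distinct_common_words("the cat and the hat", "the hat")); // 2
}
```

For the same input the nested loop would count 3, since "the" occurs twice in the first sentence.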

If you care about repeated words, you probably need a HashMap, for example:

use std::collections::HashMap;

let mut words_hash1: HashMap<&str, usize> = HashMap::new();
for word in thing1.split_whitespace() {
    *words_hash1.entry(word).or_insert(0) += 1;
}
let matches2: usize = thing2.split_whitespace()
    .map(|s| words_hash1.get(s).cloned().unwrap_or(0))
    .sum();
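A self-contained sketch of the HashMap approach; weighted_matches is a hypothetical name. The map is built in one pass over thing1, and each word of thing2 then costs a single hash lookup instead of a scan of thing1. Note that it adds, for each word of thing2, how often that word occurs in thing1, so repeats on both sides multiply:

```rust
use std::collections::HashMap;

/// For every word of thing2, adds how many times it occurs in thing1.
fn weighted_matches(thing1: &str, thing2: &str) -> usize {
    // Count the occurrences of each word of thing1 once, up front.
    let mut words_hash1: HashMap<&str, usize> = HashMap::new();
    for word in thing1.split_whitespace() {
        *words_hash1.entry(word).or_insert(0) += 1;
    }
    // Words missing from thing1 contribute 0.
    thing2
        .split_whitespace()
        .map(|s| words_hash1.get(s).copied().unwrap_or(0))
        .sum()
}

fn main() {
    println!("{}", weighted_matches("a a b", "a c")); // 2
    println!("{}", weighted_matches("a a", "a a")); // 4
}
```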

How do I reuse a SplitWhitespace iterator?
