Question

I am trying to optimize my function using Rayon's par_iter().

The single-threaded version looks something like this:

fn verify_and_store(store: &mut Store, txs: Vec<Tx>) {
    let result = txs.iter().map(|tx| {
         tx.verify_and_store(store)
    }).collect();
    ...
}

Each Store instance must only be used by one thread, but multiple instances of Store can be used concurrently, so I can make this multithreaded by cloning store:

fn verify_and_store(store: &mut Store, txs: Vec<Tx>) {
    let result = txs.par_iter().map(|tx| {
         let mut local_store = store.clone();
         tx.verify_and_store(&mut local_store)
    }).collect();
    ...
}

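However, this clones the store on every iteration, which is far too slow. I would like to use one store instance per thread.

Is this possible with Rayon? Or should I resort to manual threading and a work queue?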

Answer 1

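It is possible to use a thread-local variable to ensure that local_store is not created more than once in a given thread.

For example, this compiles (full source):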

fn verify_and_store(store: &mut Store, txs: Vec<Tx>) {
    use std::cell::RefCell;
    thread_local!(static STORE: RefCell<Option<Store>> = RefCell::new(None));

    let mut result = Vec::new();

    txs.par_iter().map(|tx| {
        STORE.with(|cell| {
            let mut local_store = cell.borrow_mut();
            if local_store.is_none() {
                *local_store = Some(store.clone());
            }
            tx.verify_and_store(local_store.as_mut().unwrap())
        })
    }).collect_into_vec(&mut result); // named collect_into in early Rayon releases
}

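However, there are two problems with this code. First, if the clones of store need to do something when par_iter() is done, such as flushing their buffers, it will not happen: their Drop will only be called when Rayon's worker threads exit, and even that is not guaranteed.

The second, and more serious, problem is that the clones of store are created exactly once per worker thread. If Rayon caches its thread pool (and I believe it does), this means that a later, unrelated call to verify_and_store will continue working with the last known clones of store, which may have nothing to do with the current store.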
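The stale-clone problem does not depend on Rayon; it follows from how thread-locals behave on any long-lived thread. The following is a minimal std-only sketch in which a single hand-rolled worker thread stands in for a pooled Rayon worker and a String stands in for Store. The names CACHED, cached_or_init and run_two_calls_on_one_worker are hypothetical and exist only for this illustration.

```rust
use std::cell::RefCell;
use std::sync::mpsc;
use std::thread;

// Per-thread cache, filled lazily on first use (same pattern as STORE above).
thread_local!(static CACHED: RefCell<Option<String>> = RefCell::new(None));

/// Returns this thread's cached value, initializing it from `current`
/// only if the thread has never cached anything before.
fn cached_or_init(current: &str) -> String {
    CACHED.with(|cell| {
        let mut slot = cell.borrow_mut();
        if slot.is_none() {
            *slot = Some(current.to_string());
        }
        slot.as_ref().unwrap().clone()
    })
}

/// Sends two unrelated "calls" to one long-lived worker thread and
/// returns what the worker actually used for each of them.
fn run_two_calls_on_one_worker() -> (String, String) {
    let (job_tx, job_rx) = mpsc::channel::<String>();
    let (res_tx, res_rx) = mpsc::channel::<String>();

    // The worker outlives both calls, like a thread in a cached pool.
    let worker = thread::spawn(move || {
        for store_name in job_rx {
            res_tx.send(cached_or_init(&store_name)).unwrap();
        }
    });

    job_tx.send("store A".to_string()).unwrap();
    let first = res_rx.recv().unwrap();
    job_tx.send("store B".to_string()).unwrap();
    let second = res_rx.recv().unwrap();

    drop(job_tx); // closing the channel lets the worker loop end
    worker.join().unwrap();
    (first, second)
}
```

Both calls end up using the value cached during the first one, which is the behavior described above: the second call never sees its own store unless the cache is explicitly reset.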

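This can be rectified by complicating the code somewhat:

Store the cloned variables in a Mutex<Option<...>> instead of an Option, so that they can be accessed by the thread that invoked par_iter(). This incurs a mutex lock on every access, but the lock will be uncontested and therefore cheap.
Use an Arc around the mutex in order to collect references to the created store clones in a vector. This vector is used to clean up the stores by resetting them to None after the iteration has finished.
Wrap the whole call in an unrelated mutex, so that two parallel calls to verify_and_store don't end up seeing each other's store clones. (This might be avoidable if a new thread pool were created and installed before the iteration.) Hopefully this serialization won't affect the performance of verify_and_store, since each call will utilize the whole thread pool.

The result is not pretty, but it compiles, uses only safe code, and appears to work: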

fn verify_and_store(store: &mut Store, txs: Vec<Tx>) {
    use std::sync::{Arc, Mutex};
    type SharedStore = Arc<Mutex<Option<Store>>>;

    lazy_static! {
        static ref STORE_CLONES: Mutex<Vec<SharedStore>> = Mutex::new(Vec::new());
        static ref NO_REENTRY: Mutex<()> = Mutex::new(());
    }
    thread_local!(static STORE: SharedStore = Arc::new(Mutex::new(None)));

    let mut result = Vec::new();
    let _no_reentry = NO_REENTRY.lock();

    txs.par_iter().map({
        |tx| {
            STORE.with(|arc_mtx| {
                let mut local_store = arc_mtx.lock().unwrap();
                if local_store.is_none() {
                    *local_store = Some(store.clone());
                    STORE_CLONES.lock().unwrap().push(arc_mtx.clone());
                }
                tx.verify_and_store(local_store.as_mut().unwrap())
            })
        }
    }).collect_into_vec(&mut result); // named collect_into in early Rayon releases

    let mut store_clones = STORE_CLONES.lock().unwrap();
    for store in store_clones.drain(..) {
        store.lock().unwrap().take();
    }
}

Answer 2

Old question, but I feel the answer needs revisiting. In general, there are two methods:

Use map_with. It clones the store every time a thread steals a work item from another thread. This may clone more stores than there are threads, but the number should stay fairly low. If the clones are too expensive, you can use with_min_len to increase the size of the pieces into which Rayon splits the workload.

fn verify_and_store(store: &mut Store, txs: Vec<Tx>) {
    // map_with takes the initial value itself and clones it whenever Rayon splits the work.
    let result = txs.par_iter().map_with(store.clone(), |store, tx| {
         tx.verify_and_store(store)
    }).collect();
    ...
}

Or use the scoped ThreadLocal from the thread_local crate. This ensures that you only use as many objects as there are threads, and that they are destroyed once the ThreadLocal object goes out of scope.
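For comparison, the manual-threading route the question asks about gives the same "one clone per thread, dropped when the work is done" lifetime without any extra crate. The following is a rough std-only sketch, not a Rayon replacement: it splits the input into fixed chunks with no work stealing, and Store, Tx, the even-id "verification" rule and verify_chunked are all hypothetical stand-ins for the question's types.

```rust
use std::thread;

// Hypothetical stand-ins for the question's types.
#[derive(Clone, Default)]
struct Store {
    verified: Vec<u64>,
}

struct Tx {
    id: u64,
}

impl Tx {
    // Placeholder rule: even ids count as valid and are recorded in the store.
    fn verify_and_store(&self, store: &mut Store) -> bool {
        let ok = self.id % 2 == 0;
        if ok {
            store.verified.push(self.id);
        }
        ok
    }
}

/// Verifies `txs` on at most `workers` scoped threads, cloning `store`
/// once per thread. Returns the results in input order together with
/// the number of clones that were made.
fn verify_chunked(store: &Store, txs: &[Tx], workers: usize) -> (Vec<bool>, usize) {
    let workers = workers.max(1);
    // Ceiling division, and at least 1 so that `chunks` never panics.
    let chunk_len = ((txs.len() + workers - 1) / workers).max(1);

    thread::scope(|scope| {
        let handles: Vec<_> = txs
            .chunks(chunk_len)
            .map(|chunk| {
                scope.spawn(move || {
                    // One clone per thread, reused for every tx in the chunk.
                    let mut local_store = store.clone();
                    chunk
                        .iter()
                        .map(|tx| tx.verify_and_store(&mut local_store))
                        .collect::<Vec<bool>>()
                    // `local_store` is dropped here, when the thread's work ends.
                })
            })
            .collect();

        let clones = handles.len();
        let results: Vec<bool> = handles
            .into_iter()
            .flat_map(|handle| handle.join().unwrap())
            .collect();
        (results, clones)
    })
}
```

Calling verify_chunked(&store, &txs, 4) makes at most four clones regardless of how many transactions there are. The trade-off against Rayon is load balancing: a slow chunk cannot be split further once its thread has started.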

Per-thread initialization in Rayon
