No mutex needed; race conditions aren't always bad, are they?

Date: 2018-04-21 22:01:54

Tags: c++ multithreading mutex

I have a crazy idea: in certain situations where most of us would normally want to synchronize with a mutex, the mutex synchronization can be omitted.

OK, suppose you have this situation:

Buffer *buffer = new Buffer(); // Initialized by main thread;

...

// The call to buffer's `accumulateSomeData` method is thread-safe
// and is heavily executed by many workers from different threads simultaneously.
buffer->accumulateSomeData(data); // While the code inside is equivalent to vector->push_back()

...

// All lines of code below are executed by a totally separate timer
// thread that executes once per second until the program is finished.

auto bufferPrev = buffer; // A temporary pointer to previous instance

// Switch buffers, put old one offline
buffer = new Buffer();

// As of this line of code all the threads will switch to new instance 
// of buffer. Which yields that calls to `accumulateSomeData`
// are executed over new buffer instance. Which also means that old 
// instance is kinda taken offline and can be safely operated from a
// timer thread.

bufferPrev->flushToDisk(); // Ok, so we can safely flush
delete bufferPrev;

Obviously, while `buffer = new Buffer();` executes, there may still be in-flight operations appending data to the previous instance. But since disk operations are slow, we get a natural barrier.

So, how do I estimate the risk of running such code without mutex synchronization?

Edit:

It has become so hard to ask a question here without being downvoted for no reason by a couple of angry guys.

Here is the complete code, correct in all respects:

#include <cassert>

#include "leveldb/db.h"
#include "leveldb/filter_policy.h"

#include <iostream>
#include <boost/asio.hpp>
#include <boost/chrono.hpp>
#include <boost/thread.hpp>
#include <boost/filesystem.hpp>
#include <boost/lockfree/stack.hpp>
#include <boost/lockfree/queue.hpp>
#include <boost/uuid/uuid.hpp>            // uuid class
#include <boost/uuid/uuid_io.hpp>         // streaming operators etc.
#include <boost/uuid/uuid_generators.hpp> // generators

#include <CommonCrypto/CommonDigest.h>

using namespace std;
using namespace boost::filesystem;

using boost::mutex;
using boost::thread;

enum FileSystemItemType : char {
    Unknown         = 1,
    File            = 0,
    Directory       = 4,

    FileLink        = 2,
    DirectoryLink   = 6
};

// Structure packing optimizations are used in the code below
// http://www.catb.org/esr/structure-packing/
class FileSystemScanner {
private:
    leveldb::DB *database;

    boost::asio::thread_pool pool;

    leveldb::WriteBatch *batch;

    std::atomic<int> queue_size;
    std::atomic<int> workers_online;
    std::atomic<int> entries_processed;
    std::atomic<int> directories_processed;
    std::atomic<uintmax_t> filesystem_usage;

    boost::lockfree::stack<boost::filesystem::path*, boost::lockfree::fixed_sized<false>> directories_pending;

    void work() {
        workers_online++;

        boost::filesystem::path *item;

        if (directories_pending.pop(item) && item != NULL)
        {            
            queue_size--;

            try {
                boost::filesystem::directory_iterator completed;
                boost::filesystem::directory_iterator iterator(*item);

                while (iterator != completed)
                {
                    bool isFailed = false, isSymLink, isDirectory;

                    boost::filesystem::path path = iterator->path();

                    try {
                        isSymLink = boost::filesystem::is_symlink(path);
                        isDirectory = boost::filesystem::is_directory(path);

                    } catch (const boost::filesystem::filesystem_error& e) {
                        isFailed = true;
                        isSymLink = false;
                        isDirectory = false;
                    }

                    if (!isFailed)
                    {
                        if (!isSymLink) {
                            if (isDirectory) {
                                directories_pending.push(new boost::filesystem::path(path));

                                directories_processed++;

                                boost::asio::post(this->pool, [this]() { this->work(); });

                                queue_size++;
                            } else {
                                filesystem_usage += boost::filesystem::file_size(iterator->path());
                            }
                        }
                    }

                    int result = ++entries_processed;

                    if (result % 10000 == 0) {
                        cout << entries_processed.load() << ", " << directories_processed.load() << ", " << queue_size.load() << ", " << workers_online.load() << endl;
                    }

                    ++iterator;
                }

                delete item;
            } catch (boost::filesystem::filesystem_error &e) {
                // Swallowed: unreadable directories are simply skipped.
            }
        }

        workers_online--;
    }

public:
    FileSystemScanner(int threads, leveldb::DB* database):
        pool(threads), queue_size(), workers_online(), entries_processed(), directories_processed(), directories_pending(0), database(database)
    {
    }

    void scan(string path) {
        queue_size++;

        directories_pending.push(new boost::filesystem::path(path));

        boost::asio::post(this->pool, [this]() { this->work(); });
    }

    void join() {
        pool.join();
    }
};

int main(int argc, char* argv[])
{
    leveldb::Options opts;

    opts.create_if_missing = true;
    opts.compression = leveldb::CompressionType::kSnappyCompression;
    opts.filter_policy = leveldb::NewBloomFilterPolicy(10);

    leveldb::DB* db;

    leveldb::DB::Open(opts, "/temporary/projx", &db);

    FileSystemScanner scanner(std::thread::hardware_concurrency(), db);

    scanner.scan("/");
    scanner.join();

    return 0;
}

My question is: can I omit synchronization for `batch`, which I am not using yet? Since it is thread-safe, is it enough to simply switch buffers before actually committing any results to disk?

1 answer:

Answer 0 (score: 5)

You have a serious misconception. You think that when you have a race condition, there is some specific list of things that can happen. That is not true. A race condition can cause any kind of failure, including a crash. So no, absolutely not. You definitely cannot do this.

That said, even setting that misconception aside, this is still a disaster.

Consider:

buffer = new Buffer();

Suppose this is implemented by first allocating the memory, then setting `buffer` to point to that memory, and then calling the constructor. Other threads could then operate on an unconstructed Buffer. Boom.

Now, you could fix that particular problem. But it is only one of the ways this can blow up that I am able to imagine. It could go wrong in ways none of us are clever enough to foresee. So, for the love of all that is holy, don't even think about doing this again.