Question

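I have been trying to build an entity component system in C++. I have it working, but the code I wrote to handle waiting between phases is very inefficient.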

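I currently have two phases. The first is Ticking, where every thread goes through all systems and ticks those that are tickable. The second is the Resolution phase, where each thread takes a number of systems and adds and removes components from them.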

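Anyway, after each phase I have every thread atomically increment a variable and busy-wait until all threads have incremented it. Once both phases are complete, I follow the same pattern in reverse (each thread decrements the variable and waits for it to reach zero) and signal the main thread that the threads are done with the current tick.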
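For reference, the wait-for-everyone pattern described above can be sketched in isolation as a small reusable spin barrier. This is only a minimal sketch, not code taken from the engine in this question: `SpinBarrier`, `arriveAndWait`, and `runTwoPhases` are hypothetical names. Unlike the code in this question, it uses a generation counter so the same barrier can be reused without a separate count-down step. It still busy-waits.

```cpp
#include <atomic>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical helper: a reusable busy-waiting barrier.
class SpinBarrier
{
public:
    explicit SpinBarrier(std::size_t threadCount) : m_ThreadCount(threadCount) {}

    void arriveAndWait()
    {
        // Remember which round this thread is in before announcing arrival.
        const std::size_t generation = m_Generation.load(std::memory_order_acquire);

        if (m_Arrived.fetch_add(1, std::memory_order_acq_rel) + 1 == m_ThreadCount)
        {
            // Last thread to arrive: reset the counter, then open the next round.
            m_Arrived.store(0, std::memory_order_relaxed);
            m_Generation.store(generation + 1, std::memory_order_release);
        }
        else
        {
            // Everyone else spins until the last thread bumps the generation.
            while (m_Generation.load(std::memory_order_acquire) == generation)
            {
                std::this_thread::yield();
            }
        }
    }

private:
    const std::size_t m_ThreadCount;
    std::atomic<std::size_t> m_Arrived{0};
    std::atomic<std::size_t> m_Generation{0};
};

// Hypothetical driver: each round has a "tick" phase and a "resolve" phase,
// separated by the barrier. Returns true if no thread ever entered the
// resolve phase before all threads had finished ticking.
bool runTwoPhases(std::size_t threadCount, std::size_t rounds)
{
    SpinBarrier barrier(threadCount);
    std::atomic<std::size_t> ticks{0};
    std::atomic<bool> ok{true};
    std::vector<std::thread> workers;

    for (std::size_t id = 0; id < threadCount; ++id)
    {
        workers.emplace_back([&] {
            for (std::size_t round = 1; round <= rounds; ++round)
            {
                ticks.fetch_add(1, std::memory_order_relaxed); // "tick" phase
                barrier.arriveAndWait();
                if (ticks.load(std::memory_order_relaxed) != threadCount * round)
                {
                    ok.store(false); // "resolve" phase saw an incomplete tick phase
                }
                barrier.arriveAndWait();
            }
        });
    }
    for (auto& worker : workers)
    {
        worker.join();
    }
    return ok.load();
}
```

Calling `runTwoPhases(8, 1000)` would mimic 8 worker threads running 1000 ticks, with one barrier wait after each phase and no reset phase.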

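Is there a better way to do this? Currently this causes a slowdown of about 17 ms with 8 threads, and I would like to make it faster if possible.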

Here is my code. Please excuse the poor code quality; I have been poking at it and haven't cleaned it up (especially the memory_orders).

    void Engine::threadRun(uint64_t a_ThreadID)
    {
        m_Mutex.lock();
        std::cerr << "Thread #" << a_ThreadID << " starting." << std::endl;
        m_Mutex.unlock();

        // Note: atomic loads may only use relaxed, consume, acquire, or seq_cst
        // ordering; release and acq_rel are not valid for load().
        while (m_State.load(std::memory_order_acquire) > STOPPED)
        {
            if (m_State.load(std::memory_order_acquire) == RUNNING)
            {
                // Tick phase: every thread visits every tickable system
                for (auto itr = m_Systems->begin(); itr != m_Systems->end(); ++itr)
                {
                    auto a_system = itr->second;
                    if (a_system->isTickable())
                    {
                        a_system->tick(m_ThreadCount, a_ThreadID, m_DT.load(std::memory_order_acquire));
                    }
                }

                // Wait until all threads are done to move onto the resolution phase
                m_Mutex.lock();
                m_TickStageCompleted.fetch_add(1, std::memory_order_release);
                m_Mutex.unlock();
                while (m_TickStageCompleted.load(std::memory_order_acquire) != m_ThreadCount);

                // Resolution phase: each thread resolves its own slice of the systems
                size_t count = m_Systems->size() / m_ThreadCount + 1; // How many per thread
                auto start_idx = count * a_ThreadID;
                if (start_idx < m_Systems->size())
                {
                    // There's got to be a better way to start at a specific index in a map
                    // Maybe switch to two vectors with one storing the name and the other the system
                    auto index = m_Systems->begin();
                    std::advance(index, start_idx);
                    auto end = index;

                    while (end != m_Systems->end() && count)
                    {
                        ++end;
                        --count;
                    }

                    for (; index != end && index != m_Systems->end(); ++index)
                    {
                        index->second->resolve();
                    }
                }
                /**
                 * This section is super inefficient. I need to find a better way of doing this
                 **/
                // Wait until resolutions are complete to continue
                m_Mutex.lock();
                m_ResolutionStageCompleted.fetch_add(1, std::memory_order_release);
                m_Mutex.unlock();
                while (m_ResolutionStageCompleted.load(std::memory_order_acquire) != m_ThreadCount);

                // Signal the main thread that this run completed successfully
                m_State.store(PAUSED, std::memory_order_release);

                // Reset everything: count both stage counters back down to zero
                m_Mutex.lock();
                m_TickStageCompleted.fetch_sub(1);
                m_Mutex.unlock();
                while (m_TickStageCompleted.load(std::memory_order_acquire) != 0);

                m_Mutex.lock();
                m_ResolutionStageCompleted.fetch_sub(1);
                m_Mutex.unlock();
                while (m_ResolutionStageCompleted.load(std::memory_order_acquire) != 0);
            }
        }
        m_Mutex.lock();
        std::cerr << "Thread #" << a_ThreadID << " ending." << std::endl;
        m_Mutex.unlock();
    }

How can I efficiently wait for all threads to finish a task before starting the next one?
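One commonly used alternative to spinning is a barrier that puts waiting threads to sleep on a condition variable until the last thread arrives. The following is a minimal sketch under that assumption, using only C++11 facilities. `BlockingBarrier`, `arriveAndWait`, and `countStaleReads` are hypothetical names, not part of the engine above. Whether this is actually faster than busy-waiting depends on the workload and on how many hardware threads are available, so it would need to be measured.

```cpp
#include <atomic>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical helper: a reusable barrier that blocks instead of spinning.
class BlockingBarrier
{
public:
    explicit BlockingBarrier(std::size_t threadCount) : m_ThreadCount(threadCount) {}

    void arriveAndWait()
    {
        std::unique_lock<std::mutex> lock(m_Mutex);
        const std::size_t generation = m_Generation;

        if (++m_Arrived == m_ThreadCount)
        {
            // Last thread to arrive: reset for the next round and wake everyone.
            m_Arrived = 0;
            ++m_Generation;
            m_Condition.notify_all();
        }
        else
        {
            // Sleep until the generation changes; the predicate guards
            // against spurious wakeups.
            m_Condition.wait(lock, [&] { return m_Generation != generation; });
        }
    }

private:
    std::mutex m_Mutex;
    std::condition_variable m_Condition;
    const std::size_t m_ThreadCount;
    std::size_t m_Arrived = 0;
    std::size_t m_Generation = 0;
};

// Hypothetical driver: every thread writes its own slot in phase one and
// reads all slots in phase two. Returns how many reads saw a value from
// the wrong round (expected to be zero if the barrier works).
std::size_t countStaleReads(std::size_t threadCount, int rounds)
{
    BlockingBarrier barrier(threadCount);
    std::vector<int> slots(threadCount, 0); // plain, non-atomic data
    std::atomic<std::size_t> staleReads{0};
    std::vector<std::thread> workers;

    for (std::size_t id = 0; id < threadCount; ++id)
    {
        workers.emplace_back([&, id] {
            for (int round = 1; round <= rounds; ++round)
            {
                slots[id] = round;       // phase one: write own slot
                barrier.arriveAndWait(); // all writes finish before any read
                for (int value : slots)  // phase two: read every slot
                {
                    if (value != round)
                    {
                        staleReads.fetch_add(1);
                    }
                }
                barrier.arriveAndWait(); // all reads finish before next write
            }
        });
    }
    for (auto& worker : workers)
    {
        worker.join();
    }
    return staleReads.load();
}
```

Because the generation counter distinguishes consecutive rounds, the same barrier object can be reused after each phase without a count-down step. If C++20 is available, `std::barrier` from `<barrier>` provides the same arrive-and-wait behavior through its `arrive_and_wait()` member.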

0 answers: