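How does the performance of std::mutex compare with CRITICAL_SECTION? Is it on par?
I need a lightweight synchronization object (it does not need to be an interprocess object). Is there any STL class, other than std::mutex, that comes close to CRITICAL_SECTION?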
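For purely in-process locking, the standard-library counterpart is std::mutex used through a scoped guard. A minimal sketch, assuming a hypothetical shared counter (`g_counter`) and hypothetical helpers `addMany`/`runWorkers`: std::lock_guard takes the place of the EnterCriticalSection/LeaveCriticalSection pair and also releases the lock if the protected code throws.

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical shared state, protected by a plain std::mutex.
std::mutex g_counterMutex;
long g_counter = 0;

// Each worker increments the counter `iterations` times.
// std::lock_guard locks in its constructor and unlocks in its destructor,
// playing the role of an EnterCriticalSection/LeaveCriticalSection pair.
void addMany(int iterations) {
    for (int i = 0; i < iterations; ++i) {
        std::lock_guard<std::mutex> guard(g_counterMutex);
        ++g_counter;
    }
}

// Starts `threadCount` workers, waits for all of them and returns the count.
long runWorkers(int threadCount, int iterations) {
    g_counter = 0;
    std::vector<std::thread> threads;
    for (int t = 0; t < threadCount; ++t)
        threads.emplace_back(addMany, iterations);
    for (std::thread& th : threads)
        th.join(); // join() makes the workers' writes visible to this thread
    return g_counter;
}
```

Calling `runWorkers(4, 10000)` should always return 40000; without the guard the increments would race and the total would usually come out lower.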
Answer 0 (score: 29)
Please see my update at the end of this answer: the situation has changed dramatically since Visual Studio 2015. The original answer follows.
I did a very simple test, and according to my measurements std::mutex is about 50-70 times slower than CRITICAL_SECTION.
std::mutex: 18140574us
CRITICAL_SECTION: 296874us
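Edit: After some more testing, it turns out that the result depends on the number of threads (contention) and the number of CPU cores. In general std::mutex is slower, but by how much depends on the usage. Here are the updated results (tested on a MacBook Pro with a Core i5-4258U, Windows 10, Boot Camp):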
Iterations: 1000000
Thread count: 1
std::mutex: 78132us
CRITICAL_SECTION: 31252us
Thread count: 2
std::mutex: 687538us
CRITICAL_SECTION: 140648us
Thread count: 4
std::mutex: 1031277us
CRITICAL_SECTION: 703180us
Thread count: 8
std::mutex: 86779418us
CRITICAL_SECTION: 1634123us
Thread count: 16
std::mutex: 172916124us
CRITICAL_SECTION: 3390895us
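The slowdown factor is simply the ratio of the two timings for each thread count. Working it out from the numbers above (a small arithmetic sketch, not part of the original benchmark; `slowdownFactors` is a hypothetical helper) gives roughly 2.5x, 4.9x, 1.5x, 53x and 51x for 1, 2, 4, 8 and 16 threads, so the gap is modest at low contention and explodes once there are more threads than cores.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Timings in microseconds, copied from the updated results above,
// for 1, 2, 4, 8 and 16 threads.
const std::vector<double> kMutexUs   = {78132.0, 687538.0, 1031277.0, 86779418.0, 172916124.0};
const std::vector<double> kCritSecUs = {31252.0, 140648.0, 703180.0, 1634123.0, 3390895.0};

// Returns how many times slower std::mutex was than CRITICAL_SECTION for
// each thread count (a value above 1 means std::mutex was slower).
std::vector<double> slowdownFactors() {
    std::vector<double> ratios;
    for (std::size_t i = 0; i < kMutexUs.size(); ++i)
        ratios.push_back(kMutexUs[i] / kCritSecUs[i]);
    return ratios;
}
```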
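Below is the code that produced this output, compiled with Visual Studio 2012 using the default project settings in the Win32 Release configuration. Please note that this test may not be completely correct, but I would think twice before switching code from CRITICAL_SECTION to std::mutex.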
#include "stdafx.h"
#include <Windows.h>
#include <mutex>
#include <thread>
#include <vector>
#include <chrono>
#include <iostream>
#include <cmath> // sqrt
const int g_cRepeatCount = 1000000;
const int g_cThreadCount = 16;
double g_shmem = 8;
std::mutex g_mutex;
CRITICAL_SECTION g_critSec;
void sharedFunc( int i )
{
if ( i % 2 == 0 )
g_shmem = sqrt(g_shmem);
else
g_shmem *= g_shmem;
}
void threadFuncCritSec() {
for ( int i = 0; i < g_cRepeatCount; ++i ) {
EnterCriticalSection( &g_critSec );
sharedFunc(i);
LeaveCriticalSection( &g_critSec );
}
}
void threadFuncMutex() {
for ( int i = 0; i < g_cRepeatCount; ++i ) {
g_mutex.lock();
sharedFunc(i);
g_mutex.unlock();
}
}
void testRound(int threadCount)
{
std::vector<std::thread> threads;
auto startMutex = std::chrono::high_resolution_clock::now();
for (int i = 0; i<threadCount; ++i)
threads.push_back(std::thread( threadFuncMutex ));
for ( std::thread& thd : threads )
thd.join();
auto endMutex = std::chrono::high_resolution_clock::now();
std::cout << "std::mutex: ";
std::cout << std::chrono::duration_cast<std::chrono::microseconds>(endMutex - startMutex).count();
std::cout << "us \n";
threads.clear();
auto startCritSec = std::chrono::high_resolution_clock::now();
for (int i = 0; i<threadCount; ++i)
threads.push_back(std::thread( threadFuncCritSec ));
for ( std::thread& thd : threads )
thd.join();
auto endCritSec = std::chrono::high_resolution_clock::now();
std::cout << "CRITICAL_SECTION: ";
std::cout << std::chrono::duration_cast<std::chrono::microseconds>(endCritSec - startCritSec).count();
std::cout << "us \n";
}
int _tmain(int argc, _TCHAR* argv[]) {
InitializeCriticalSection( &g_critSec );
std::cout << "Iterations: " << g_cRepeatCount << "\n";
for (int i = 1; i <= g_cThreadCount; i = i*2) {
std::cout << "Thread count: " << i << "\n";
testRound(i);
Sleep(1000);
}
DeleteCriticalSection( &g_critSec );
// Added 10/27/2017 to try to prevent the compiler from completely
// optimizing out the code around g_shmem if it isn't used anywhere.
std::cout << "Shared variable value: " << g_shmem << std::endl;
getchar();
return 0;
}
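The same measurement can be written once as a template over any type that meets the standard BasicLockable requirements (lock()/unlock()), so further lock types can be added without duplicating the thread-spawning code. A sketch under these assumptions: `benchLock` and `LockBenchResult` are hypothetical names, std::chrono::steady_clock is used instead of high_resolution_clock because it is guaranteed to be monotonic, and std::lock_guard replaces the manual lock()/unlock() calls. A CRITICAL_SECTION would need a small wrapper class exposing lock()/unlock() to plug into it.

```cpp
#include <cassert>
#include <chrono>
#include <mutex>
#include <thread>
#include <vector>

// Result of one benchmark run.
struct LockBenchResult {
    long long microseconds; // wall-clock time for all threads together
    long long counter;      // should equal threadCount * iterations
};

// Runs `threadCount` threads, each doing `iterations` lock/unlock pairs
// around a trivial critical section. Works with any BasicLockable type.
template <typename Lockable>
LockBenchResult benchLock(int threadCount, int iterations) {
    Lockable lock;
    long long counter = 0;
    auto worker = [&] {
        for (int i = 0; i < iterations; ++i) {
            std::lock_guard<Lockable> guard(lock);
            ++counter; // shared work, only done while holding the lock
        }
    };
    auto start = std::chrono::steady_clock::now();
    std::vector<std::thread> threads;
    for (int t = 0; t < threadCount; ++t)
        threads.emplace_back(worker);
    for (std::thread& th : threads)
        th.join();
    auto end = std::chrono::steady_clock::now();
    auto us = std::chrono::duration_cast<std::chrono::microseconds>(end - start).count();
    return {static_cast<long long>(us), counter};
}
```

For example, `benchLock<std::mutex>(8, 1000000).microseconds` corresponds to the "Thread count: 8" std::mutex line above; the `counter` field is a cheap sanity check that every iteration really ran under the lock.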
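Update 10/27/2017 (1):
Some answers suggest that this is not a realistic test or that it does not represent a "real world" scenario. That's true: this test tries to measure the overhead of std::mutex; it does not try to prove that the difference is negligible for 99% of applications.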
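Update 10/27/2017 (2):
Starting with Visual Studio 2015 (VC140), things seem to have changed in favor of std::mutex. I used the VS2017 IDE with exactly the same code as above, x64 Release configuration with optimizations disabled, and only switched the "Platform Toolset" for each test. The results are very surprising, and I'm really curious what changed in VC140.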
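Answer 1 (score: 25)
The test by waldez here is unrealistic: it basically simulates 100% contention, which is generally exactly what you don't want in multithreaded code. Below is a modified test that does some work outside the lock in addition to the shared computation. The results I get with this code are different: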
Tasks: 160000
Thread count: 1
std::mutex: 12096ms
CRITICAL_SECTION: 12060ms
Thread count: 2
std::mutex: 5206ms
CRITICAL_SECTION: 5110ms
Thread count: 4
std::mutex: 2643ms
CRITICAL_SECTION: 2625ms
Thread count: 8
std::mutex: 1632ms
CRITICAL_SECTION: 1702ms
Thread count: 12
std::mutex: 1227ms
CRITICAL_SECTION: 1244ms
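As you can see, for me (using VS2013) the numbers for std::mutex and CRITICAL_SECTION are very close. Note that this code performs a fixed number of tasks (160,000), which is why performance generally improves with more threads. I have 12 cores here, which is why I stopped at 12.
I'm not saying this test is right or wrong compared with the other one, but it does highlight that timing issues are generally domain-specific.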
#include "stdafx.h"
#include <Windows.h>
#include <mutex>
#include <thread>
#include <vector>
#include <chrono>
#include <iostream>
#include <cmath> // sqrt
const int tastCount = 160000;
int numThreads;
const int MAX_THREADS = 16;
double g_shmem = 8;
std::mutex g_mutex;
CRITICAL_SECTION g_critSec;
void sharedFunc(int i, double &data)
{
for (int j = 0; j < 100; j++)
{
if (j % 2 == 0)
data = sqrt(data);
else
data *= data;
}
}
void threadFuncCritSec() {
double lMem = 8;
int iterations = tastCount / numThreads;
for (int i = 0; i < iterations; ++i) {
for (int j = 0; j < 100; j++)
sharedFunc(j, lMem);
EnterCriticalSection(&g_critSec);
sharedFunc(i, g_shmem);
LeaveCriticalSection(&g_critSec);
}
printf("results: %f\n", lMem);
}
void threadFuncMutex() {
double lMem = 8;
int iterations = tastCount / numThreads;
for (int i = 0; i < iterations; ++i) {
for (int j = 0; j < 100; j++)
sharedFunc(j, lMem);
g_mutex.lock();
sharedFunc(i, g_shmem);
g_mutex.unlock();
}
printf("results: %f\n", lMem);
}
void testRound()
{
std::vector<std::thread> threads;
auto startMutex = std::chrono::high_resolution_clock::now();
for (int i = 0; i < numThreads; ++i)
threads.push_back(std::thread(threadFuncMutex));
for (std::thread& thd : threads)
thd.join();
auto endMutex = std::chrono::high_resolution_clock::now();
std::cout << "std::mutex: ";
std::cout << std::chrono::duration_cast<std::chrono::milliseconds>(endMutex - startMutex).count();
std::cout << "ms \n";
threads.clear();
auto startCritSec = std::chrono::high_resolution_clock::now();
for (int i = 0; i < numThreads; ++i)
threads.push_back(std::thread(threadFuncCritSec));
for (std::thread& thd : threads)
thd.join();
auto endCritSec = std::chrono::high_resolution_clock::now();
std::cout << "CRITICAL_SECTION: ";
std::cout << std::chrono::duration_cast<std::chrono::milliseconds>(endCritSec - startCritSec).count();
std::cout << "ms \n";
}
int _tmain(int argc, _TCHAR* argv[]) {
InitializeCriticalSection(&g_critSec);
std::cout << "Tasks: " << tastCount << "\n";
for (numThreads = 1; numThreads <= MAX_THREADS; numThreads = numThreads * 2) {
if (numThreads == 16)
numThreads = 12;
Sleep(100);
std::cout << "Thread count: " << numThreads << "\n";
testRound();
}
DeleteCriticalSection(&g_critSec);
return 0;
}
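One detail of this test: `tastCount / numThreads` uses integer division, so with 12 threads each thread runs 160000 / 12 = 13333 iterations and 4 tasks (160000 - 12 × 13333) are silently dropped. That barely affects the timings, but if an exact task count matters, the remainder can be spread over the first threads. A minimal sketch; `splitTasks` is a hypothetical helper, not part of the original code, and it assumes `threads > 0`.

```cpp
#include <cassert>
#include <vector>

// Splits `total` tasks over `threads` workers so that the per-worker counts
// differ by at most one and always add up to `total` (unlike total / threads).
std::vector<int> splitTasks(int total, int threads) {
    assert(threads > 0);
    std::vector<int> counts(threads, total / threads);
    int remainder = total % threads;
    for (int t = 0; t < remainder; ++t)
        ++counts[t]; // the first `remainder` workers get one extra task
    return counts;
}
```

Each thread would then loop `counts[threadIndex]` times instead of `tastCount / numThreads` times.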
Answer 2 (score: 2)
I'm using Visual Studio 2013.
The results for single-threaded use are similar to waldez's results.
1 million lock/unlock calls:
CRITICAL_SECTION: 19 ms
std::mutex: 48 ms
std::recursive_mutex: 48 ms
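A single-threaded run like this measures only the uncontended fast path of lock()/unlock(). A portable sketch of such a measurement for the two standard types above; `timeUncontended` is a hypothetical helper, and absolute numbers depend heavily on compiler, standard library and CPU, so none are claimed here.

```cpp
#include <cassert>
#include <chrono>
#include <mutex>

// Times `calls` lock/unlock pairs on a single thread (no contention at all)
// and returns the elapsed time in milliseconds.
template <typename Lockable>
double timeUncontended(long calls) {
    Lockable m;
    volatile long sink = 0; // volatile so the loop body is not optimized away
    auto start = std::chrono::steady_clock::now();
    for (long i = 0; i < calls; ++i) {
        m.lock();
        sink = sink + 1;
        m.unlock();
    }
    auto end = std::chrono::steady_clock::now();
    assert(sink == calls); // every iteration actually ran
    return std::chrono::duration<double, std::milli>(end - start).count();
}
```

`timeUncontended<std::mutex>(1000000)` and `timeUncontended<std::recursive_mutex>(1000000)` mirror the "1 million lock/unlock calls" setup above.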
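Microsoft changed the implementation for C++11 compatibility. C++11 has four kinds of mutex in the std namespace: std::mutex, std::recursive_mutex, std::timed_mutex and std::recursive_timed_mutex.
In Microsoft's implementation, std::mutex and all the other mutexes are wrappers around a critical section, specifically Concurrency::critical_section, as the internal struct shows: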
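All four of these types can be used through the same RAII wrappers; the two timed variants can additionally give up after a timeout instead of blocking forever. A minimal sketch; `tryAllFour` is a hypothetical function used only for illustration.

```cpp
#include <cassert>
#include <chrono>
#include <mutex>

// Locks each of the four C++11 mutex types through std::unique_lock and
// returns how many of them were acquired.
int tryAllFour() {
    std::mutex m;
    std::recursive_mutex rm;
    std::timed_mutex tm;
    std::recursive_timed_mutex rtm;
    int acquired = 0;
    {
        std::unique_lock<std::mutex> a(m);
        std::unique_lock<std::recursive_mutex> b(rm);
        // The timed types call try_lock_for() and may fail after the timeout.
        std::unique_lock<std::timed_mutex> c(tm, std::chrono::milliseconds(10));
        std::unique_lock<std::recursive_timed_mutex> d(rtm, std::chrono::milliseconds(10));
        acquired = a.owns_lock() + b.owns_lock() + c.owns_lock() + d.owns_lock();
    } // all four are released here, in reverse order of construction
    return acquired;
}
```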
struct _Mtx_internal_imp_t
{ /* Win32 mutex */
int type; // here MS keeps particular mutex type
Concurrency::critical_section cs;
long thread_id;
int count;
};
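In my opinion, std::recursive_mutex should match a critical section exactly, since a CRITICAL_SECTION is itself recursive. So Microsoft should optimize its implementation to use less CPU and memory.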
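The reason std::recursive_mutex is the closer match: a thread that already owns a CRITICAL_SECTION can enter it again, as long as it leaves it the same number of times. std::recursive_mutex follows the same rule, whereas locking a std::mutex that the calling thread already owns is undefined behavior (typically a deadlock). A minimal sketch with a hypothetical recursive function `descend`:

```cpp
#include <cassert>
#include <mutex>

std::recursive_mutex g_recMutex;
int g_depthReached = 0;

// Re-acquires the same lock at every level of recursion. This is fine for
// std::recursive_mutex (and for a CRITICAL_SECTION), but would be undefined
// behavior with std::mutex.
void descend(int depth) {
    std::lock_guard<std::recursive_mutex> guard(g_recMutex);
    g_depthReached = depth;
    if (depth < 5)
        descend(depth + 1);
} // each level releases one level of ownership when its guard is destroyed
```

After `descend(1)` returns, the mutex is fully released again, so another `try_lock()` on it succeeds.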
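Answer 3 (score: 1)
I came here looking for a pthread vs. critical section benchmark, but since my results differ from waldez's answer on this topic, I thought it would be interesting to share them.
The code is the one used by @waldez, modified to add pthreads to the comparison, and compiled with GCC without optimizations. My CPU is an AMD A8-3530MX.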
Windows 7 Home Edition:
>a.exe
Iterations: 1000000
Thread count: 1
std::mutex: 46800us
CRITICAL_SECTION: 31200us
pthreads: 31200us
Thread count: 2
std::mutex: 171600us
CRITICAL_SECTION: 218400us
pthreads: 124800us
Thread count: 4
std::mutex: 327600us
CRITICAL_SECTION: 374400us
pthreads: 249600us
Thread count: 8
std::mutex: 967201us
CRITICAL_SECTION: 748801us
pthreads: 717601us
Thread count: 16
std::mutex: 2745604us
CRITICAL_SECTION: 1497602us
pthreads: 1903203us
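As you can see, the differences vary a lot and are within statistical error: sometimes std::mutex is faster, sometimes it is not. What matters is that I did not observe the huge difference reported in the original answer.
I suspect the reason is that, when that answer was posted, the MSVC compiler's support for the new standard was still poor; note that the original answer used the 2012 version.
Also, out of curiosity, here is the same binary running under Wine on Arch Linux: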
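When single runs scatter this much, one common way to make the comparison less noisy is to repeat each measurement several times and compare medians rather than single values. A sketch, assuming a hypothetical `medianOf` helper and a caller-supplied function that performs one timed run and returns its duration:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Calls `runOnce` `repeats` times (each call returns one timing, e.g. in us)
// and returns the median, which is less sensitive to outliers than one run.
double medianOf(int repeats, const std::function<double()>& runOnce) {
    assert(repeats > 0);
    std::vector<double> samples;
    for (int i = 0; i < repeats; ++i)
        samples.push_back(runOnce());
    std::sort(samples.begin(), samples.end());
    std::size_t n = samples.size();
    return n % 2 == 1 ? samples[n / 2]
                      : (samples[n / 2 - 1] + samples[n / 2]) / 2.0;
}
```

In the benchmark above, `runOnce` would wrap one call of the timing code for a given lock type and thread count.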
$ wine a.exe
fixme:winediag:start_process Wine Staging 2.19 is a testing version containing experimental patches.
fixme:winediag:start_process Please mention your exact version when filing bug reports on winehq.org.
Iterations: 1000000
Thread count: 1
std::mutex: 53810us
CRITICAL_SECTION: 95165us
pthreads: 62316us
Thread count: 2
std::mutex: 604418us
CRITICAL_SECTION: 1192601us
pthreads: 688960us
Thread count: 4
std::mutex: 779817us
CRITICAL_SECTION: 2476287us
pthreads: 818022us
Thread count: 8
std::mutex: 1806607us
CRITICAL_SECTION: 7246986us
pthreads: 809566us
Thread count: 16
std::mutex: 2987472us
CRITICAL_SECTION: 14740350us
pthreads: 1453991us
waldez's code with my modifications:
#include <math.h>
#include <windows.h>
#include <mutex>
#include <thread>
#include <vector>
#include <chrono>
#include <iostream>
#include <pthread.h>
const int g_cRepeatCount = 1000000;
const int g_cThreadCount = 16;
double g_shmem = 8;
std::mutex g_mutex;
CRITICAL_SECTION g_critSec;
pthread_mutex_t pt_mutex;
void sharedFunc( int i )
{
if ( i % 2 == 0 )
g_shmem = sqrt(g_shmem);
else
g_shmem *= g_shmem;
}
void threadFuncCritSec() {
for ( int i = 0; i < g_cRepeatCount; ++i ) {
EnterCriticalSection( &g_critSec );
sharedFunc(i);
LeaveCriticalSection( &g_critSec );
}
}
void threadFuncMutex() {
for ( int i = 0; i < g_cRepeatCount; ++i ) {
g_mutex.lock();
sharedFunc(i);
g_mutex.unlock();
}
}
void threadFuncPTMutex() {
for ( int i = 0; i < g_cRepeatCount; ++i ) {
pthread_mutex_lock(&pt_mutex);
sharedFunc(i);
pthread_mutex_unlock(&pt_mutex);
}
}
void testRound(int threadCount)
{
std::vector<std::thread> threads;
auto startMutex = std::chrono::high_resolution_clock::now();
for (int i = 0; i<threadCount; ++i)
threads.push_back(std::thread( threadFuncMutex ));
for ( std::thread& thd : threads )
thd.join();
auto endMutex = std::chrono::high_resolution_clock::now();
std::cout << "std::mutex: ";
std::cout << std::chrono::duration_cast<std::chrono::microseconds>(endMutex - startMutex).count();
std::cout << "us \n";
g_shmem = 0;
threads.clear();
auto startCritSec = std::chrono::high_resolution_clock::now();
for (int i = 0; i<threadCount; ++i)
threads.push_back(std::thread( threadFuncCritSec ));
for ( std::thread& thd : threads )
thd.join();
auto endCritSec = std::chrono::high_resolution_clock::now();
std::cout << "CRITICAL_SECTION: ";
std::cout << std::chrono::duration_cast<std::chrono::microseconds>(endCritSec - startCritSec).count();
std::cout << "us \n";
g_shmem = 0;
threads.clear();
auto startPThread = std::chrono::high_resolution_clock::now();
for (int i = 0; i<threadCount; ++i)
threads.push_back(std::thread( threadFuncPTMutex ));
for ( std::thread& thd : threads )
thd.join();
auto endPThread = std::chrono::high_resolution_clock::now();
std::cout << "pthreads: ";
std::cout << std::chrono::duration_cast<std::chrono::microseconds>(endPThread - startPThread).count();
std::cout << "us \n";
g_shmem = 0;
}
int main() {
InitializeCriticalSection( &g_critSec );
pthread_mutex_init(&pt_mutex, 0);
std::cout << "Iterations: " << g_cRepeatCount << "\n";
for (int i = 1; i <= g_cThreadCount; i = i*2) {
std::cout << "Thread count: " << i << "\n";
testRound(i);
Sleep(1000);
}
getchar();
DeleteCriticalSection( &g_critSec );
pthread_mutex_destroy(&pt_mutex);
return 0;
}
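Answer 4 (score: 0)
waldez's test program, modified to also run with pthreads and boost::mutex.
On Windows 10 Pro (Intel i7-7820X, 16 hardware threads), with VS2015 Update 3 I get better results from std::mutex than from CRITICAL_SECTION (and even better results from boost::mutex):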
Iterations: 1000000
Thread count: 1
std::mutex: 23403us
boost::mutex: 12574us
CRITICAL_SECTION: 19454us
Thread count: 2
std::mutex: 55031us
boost::mutex: 45263us
CRITICAL_SECTION: 187597us
Thread count: 4
std::mutex: 113964us
boost::mutex: 83699us
CRITICAL_SECTION: 605765us
Thread count: 8
std::mutex: 266091us
boost::mutex: 155265us
CRITICAL_SECTION: 1908491us
Thread count: 16
std::mutex: 633032us
boost::mutex: 300076us
CRITICAL_SECTION: 4015176us
On platforms other than Windows, the program reports pthreads results in place of CRITICAL_SECTION.
#ifdef _WIN32
#include <Windows.h>
#else
#include <pthread.h>
#endif
#include <mutex>
#include <boost/thread/mutex.hpp>
#include <thread>
#include <vector>
#include <chrono>
#include <iostream>
#include <cmath>   // sqrt
#include <cstdlib> // rand
const int g_cRepeatCount = 1000000;
const int g_cThreadCount = 16;
double g_shmem = 8;
// Note: declared as std::recursive_mutex, although the output labels it "std::mutex".
std::recursive_mutex g_mutex;
boost::mutex g_boostMutex;
void sharedFunc(int i)
{
if (i % 2 == 0)
g_shmem = sqrt(g_shmem);
else
g_shmem *= g_shmem;
}
#ifdef _WIN32
CRITICAL_SECTION g_critSec;
void threadFuncCritSec()
{
for (int i = 0; i < g_cRepeatCount; ++i)
{
EnterCriticalSection(&g_critSec);
sharedFunc(i);
LeaveCriticalSection(&g_critSec);
}
}
#else
pthread_mutex_t pt_mutex;
void threadFuncPtMutex()
{
for (int i = 0; i < g_cRepeatCount; ++i) {
pthread_mutex_lock(&pt_mutex);
sharedFunc(i);
pthread_mutex_unlock(&pt_mutex);
}
}
#endif
void threadFuncMutex()
{
for (int i = 0; i < g_cRepeatCount; ++i)
{
g_mutex.lock();
sharedFunc(i);
g_mutex.unlock();
}
}
void threadFuncBoostMutex()
{
for (int i = 0; i < g_cRepeatCount; ++i)
{
g_boostMutex.lock();
sharedFunc(i);
g_boostMutex.unlock();
}
}
void testRound(int threadCount)
{
std::vector<std::thread> threads;
std::cout << "\nThread count: " << threadCount << "\n";
auto startMutex = std::chrono::high_resolution_clock::now();
for (int i = 0; i < threadCount; ++i)
threads.push_back(std::thread(threadFuncMutex));
for (std::thread& thd : threads)
thd.join();
threads.clear();
auto endMutex = std::chrono::high_resolution_clock::now();
std::cout << "std::mutex: ";
std::cout << std::chrono::duration_cast<std::chrono::microseconds>(endMutex - startMutex).count();
std::cout << "us \n";
auto startBoostMutex = std::chrono::high_resolution_clock::now();
for (int i = 0; i < threadCount; ++i)
threads.push_back(std::thread(threadFuncBoostMutex));
for (std::thread& thd : threads)
thd.join();
threads.clear();
auto endBoostMutex = std::chrono::high_resolution_clock::now();
std::cout << "boost::mutex: ";
std::cout << std::chrono::duration_cast<std::chrono::microseconds>(endBoostMutex - startBoostMutex).count();
std::cout << "us \n";
#ifdef _WIN32
auto startCritSec = std::chrono::high_resolution_clock::now();
for (int i = 0; i < threadCount; ++i)
threads.push_back(std::thread(threadFuncCritSec));
for (std::thread& thd : threads)
thd.join();
threads.clear();
auto endCritSec = std::chrono::high_resolution_clock::now();
std::cout << "CRITICAL_SECTION: ";
std::cout << std::chrono::duration_cast<std::chrono::microseconds>(endCritSec - startCritSec).count();
std::cout << "us \n";
#else
auto startPThread = std::chrono::high_resolution_clock::now();
for (int i = 0; i < threadCount; ++i)
threads.push_back(std::thread(threadFuncPtMutex));
for (std::thread& thd : threads)
thd.join();
threads.clear();
auto endPThread = std::chrono::high_resolution_clock::now();
std::cout << "pthreads: ";
std::cout << std::chrono::duration_cast<std::chrono::microseconds>(endPThread - startPThread).count();
std::cout << "us \n";
#endif
}
int main()
{
#ifdef _WIN32
InitializeCriticalSection(&g_critSec);
#else
pthread_mutex_init(&pt_mutex, 0);
#endif
std::cout << "Iterations: " << g_cRepeatCount << "\n";
for (int i = 1; i <= g_cThreadCount; i = i * 2)
{
testRound(i);
std::this_thread::sleep_for(std::chrono::seconds(1));
}
#ifdef _WIN32
DeleteCriticalSection(&g_critSec);
#else
pthread_mutex_destroy(&pt_mutex);
#endif
if (rand() % 10000 == 1)
{
// Added 10/27/2017 to try to prevent the compiler from completely
// optimizing out the code around g_shmem if it isn't used anywhere.
std::cout << "Shared variable value: " << g_shmem << std::endl;
}
return 0;
}
Answer 5 (score: 0)
My results for test 1:
Iterations: 1000000
Thread count: 1
std::mutex: 27085us
CRITICAL_SECTION: 12035us
Thread count: 2
std::mutex: 40412us
CRITICAL_SECTION: 119952us
Thread count: 4
std::mutex: 123214us
CRITICAL_SECTION: 314774us
Thread count: 8
std::mutex: 387737us
CRITICAL_SECTION: 1664506us
Thread count: 16
std::mutex: 836901us
CRITICAL_SECTION: 3837877us
Shared variable value: 8
Test 2:
Tasks: 160000
Thread count: 1
results: 8.000000
std::mutex: 4642ms
results: 8.000000
CRITICAL_SECTION: 4588ms
Thread count: 2
results: 8.000000
results: 8.000000
std::mutex: 2309ms
results: 8.000000
results: 8.000000
CRITICAL_SECTION: 2307ms
Thread count: 4
results: 8.000000
results: 8.000000
results: 8.000000
results: 8.000000
std::mutex: 1169ms
results: 8.000000
results: 8.000000
results: 8.000000
results: 8.000000
CRITICAL_SECTION: 1162ms
Thread count: 8
results: 8.000000
results: 8.000000
results: 8.000000
results: 8.000000
results: 8.000000
results: 8.000000
results: 8.000000
results: 8.000000
std::mutex: 640ms
results: 8.000000
results: 8.000000
results: 8.000000
results: 8.000000
results: 8.000000
results: 8.000000
results: 8.000000
results: 8.000000
CRITICAL_SECTION: 628ms
Thread count: 12
results: 8.000000
results: 8.000000
results: 8.000000
results: 8.000000
results: 8.000000
results: 8.000000
results: 8.000000
results: 8.000000
results: 8.000000
results: 8.000000
results: 8.000000
results: 8.000000
std::mutex: 745ms
results: 8.000000
results: 8.000000
results: 8.000000
results: 8.000000
results: 8.000000
results: 8.000000
results: 8.000000
results: 8.000000
results: 8.000000
results: 8.000000
results: 8.000000
results: 8.000000
CRITICAL_SECTION: 672ms