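I have a program that, when run multi-threaded, gets a KERN_PROTECTION_FAILURE with EXC_BAD_ACCESS in a very odd place, and I don't know how to troubleshoot it any further. This is on MacOS 10.6 using GCC.
The odd place is the entry to a function: not the first line of the function body, but the actual jump into GetMachineFactors():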
Program received signal EXC_BAD_ACCESS, Could not access memory.
Reason: KERN_PROTECTION_FAILURE at address: 0xb00009ec
[Switching to process 28242]
0x00012592 in GetMachineFactors () at ../sysinfo/OSX.cpp:168
168 MachineFactors* GetMachineFactors()
(gdb) bt
#0 0x00012592 in GetMachineFactors () at ../sysinfo/OSX.cpp:168
#1 0x000156d0 in CollectMachineFactorsThreadProc (parameter=0x200280) at Threads.cpp:341
#2 0x952f681d in _pthread_start ()
#3 0x952f66a2 in thread_start ()
(gdb)
If I run this non-threaded, it works fine with no problems:
#include "MachineFactors.h"
int main( int argc, char** argv )
{
    MachineFactors* factors = GetMachineFactors();
    std::string str = CreateJSONObject(factors);
    cout << str;
    delete factors;
    return 0;
}
If I run it in a pthread, I get the EXC_BAD_ACCESS shown above.
THREAD_FUNCTION CollectMachineFactorsThreadProc( LPVOID parameter )
{
    Main* client = (Main*) parameter;
    if ( parameter == NULL )
    {
        ERRORLOG( "No data passed to machine identification thread. Aborting." );
        return 0;
    }
    MachineFactors* mfactors = GetMachineFactors(); // This is where it dies.
    // If I don't call GetMachineFactors and do something like mfactors =
    // new MachineFactors(); everything is good and the threads communicate and exit
    // normally.
    if (mfactors == NULL)
    {
        ERRORLOG("Failed to collect machine identification: GetMachineFactors returned NULL." << endl)
        return 0;
    }
    client->machineFactors = CreateJSONObject(mfactors);
    delete mfactors;
    EVENT_RAISE(client->machineFactorsEvent);
    return 0;
}
Here is an excerpt of the GetMachineFactors() code:
MachineFactors* GetMachineFactors() // Dies on this line in multi-threaded.
{
    // printf( "Getting machine factors.\n"); // Tried with and without this, never prints.
    factors = new MachineFactors();
    factors->OSName = "MacOS";
    factors->Manufacturer = "Apple";
    ///…
    // gather various machine metrics here.
    //…
    return factors;
}
For reference, I use a socketpair to wait for the thread to finish:
// From the header file I use for cross-platform defines (this runs on OSX, Windows, and Linux).
struct _waitt
{
    int fds[2];
};
#define THREAD_FUNCTION void*
#define THREAD_REFERENCE pthread_t
#define MUTEX_REFERENCE pthread_mutex_t*
#define MUTEX_LOCK(m) pthread_mutex_lock(m)
#define MUTEX_UNLOCK pthread_mutex_unlock
#define EVENT_REFERENCE struct _waitt
#define EVENT_WAIT(m) do { char lc; if (read(m.fds[0], &lc, 1)) {} } while (0)
#define EVENT_RAISE(m) do { char lc = 'j'; if (write(m.fds[1], &lc, 1)) {} } while (0)
#define EVENT_NULL(m) do { m.fds[0] = -1; m.fds[1] = -1; } while (0)
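For readers unfamiliar with the pattern, the socketpair-based wait boils down to roughly the following. This is only a sketch under assumptions: the `Client` struct, `WorkerProc`, and `RunWorkerAndWait` are hypothetical stand-ins rather than names from the real program, and the real code never shows where `socketpair()` itself is called, so doing it right before `pthread_create()` is a guess.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <string>
#include <pthread.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical stand-in for the Main object in the question.
struct Client {
    int fds[2];          // fds[0]: waiter reads, fds[1]: worker writes
    std::string result;  // filled in by the worker before it raises the event
};

// Worker thread: store a result, then raise the event by writing one byte.
static void* WorkerProc(void* parameter) {
    Client* client = static_cast<Client*>(parameter);
    client->result = "done";
    char c = 'j';
    if (write(client->fds[1], &c, 1) != 1) { /* nothing useful to do here */ }
    return NULL;
}

// Starts the worker, blocks until it raises the event, and returns its
// result. Returns an empty string if any step fails.
std::string RunWorkerAndWait() {
    Client client;
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, client.fds) != 0) return "";

    pthread_t thread;
    if (pthread_create(&thread, NULL, WorkerProc, &client) != 0) {
        close(client.fds[0]);
        close(client.fds[1]);
        return "";
    }

    // Equivalent of EVENT_WAIT: block until one byte arrives, retrying on EINTR.
    char c = 0;
    ssize_t n;
    do {
        n = read(client.fds[0], &c, 1);
    } while (n < 0 && errno == EINTR);

    pthread_join(thread, NULL);
    close(client.fds[0]);
    close(client.fds[1]);
    return n == 1 ? client.result : "";
}
```

Unlike the macros above, this sketch checks the return values of `socketpair()` and `read()`, so a failed wait is not mistaken for a raised event.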
Here is the code I use to start the thread.
void Main::CollectMachineFactors()
{
#ifdef WIN32
    machineFactorsThread = CreateThread(NULL, 0, CollectMachineFactorsThreadProc, this, 0, 0);
    if ( machineFactorsThread == NULL )
    {
        ERRORLOG( "Could not create thread for machine id: " << ERROR_NO << endl )
    }
#else
    int retval = pthread_create(&machineFactorsThread, NULL, CollectMachineFactorsThreadProc, this);
    if (retval)
    {
        ERRORLOG( "Return code from machine id pthread_create() is " << retval << endl )
    }
#endif
}
This is the simple failing case that runs it multi-threaded. With this code it always fails with the stack trace above:
CollectMachineFactors();
EVENT_WAIT(machineFactorsEvent);
cout << machineFactors;
return 0;
At first I suspected a library problem. Here is my makefile:
# Main executable file
PROGRAM = sysinfo
# Object files
OBJECTS = Version.h Main.o Protocol.o Socket.o SSLConnection.o Stats.o TimeElapsed.o Formatter.o OSX.o Threads.o
# Include directories
INCLUDE = -Itaocrypt/include -IyaSSL/taocrypt/mySTL -IyaSSL/include -isysroot /Developer/SDKs/MacOSX10.5.sdk -mmacosx-version-min=10.5
# Library settings
STATICLIBS = libtaocrypt.a libyassl.a -Wl,-rpath,. -ldl -lpthread -lz -lexpat
# Compile settings
RELCXX = g++ -g -ggdb -DDEBUG -Wall $(INCLUDE)
.SUFFIXES: .o .cpp
.cpp.o :
	$(RELCXX) -c -Wall $(INCLUDE) -o $@ $<
all: $(PROGRAM)
$(PROGRAM): $(OBJECTS)
	$(RELCXX) -o $(PROGRAM) $(OBJECTS) $(STATICLIBS)
clean:
	rm -f *.o $(PROGRAM)
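For the life of me I can't see anything particularly odd or dangerous, and I don't know where to look. The same thread procedure works fine on every Linux machine I've tried. Any suggestions? Which tools should I try?
I can add more information if it would be useful.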
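Answer 0 (score: 0)
I can see a problem with your Windows code, but not with the OS X code that crashes.
You don't seem to have posted the actual code of GetMachineFactors, since the variable factors is never declared. As for debugging, you should not take the absence of printf output as proof that the statement was never executed. Use the debugger's facilities instead, such as setting breakpoints, using special debugger trace output, and so on (I'm not sure what gdb can handle, it is a very primitive debugger, but perhaps Apple has better tools?).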
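For Windows, you should use the runtime library's thread creation function rather than the Windows API CreateThread. That is because CreateThread does not notify the runtime library, so a new expression or other calls into the runtime library may fail on such a thread.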
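As a rough illustration of that suggestion, the sketch below assumes the runtime library function meant here is `_beginthreadex` from the Microsoft C runtime (`<process.h>`). The `Job` struct, `JobProc`, and `RunJob` are hypothetical names, and the sketch tests `_WIN32` where the question tests `WIN32`; otherwise the non-Windows branch mirrors the `pthread_create` path the question already uses.

```cpp
#include <cassert>
#include <cstddef>
#ifdef _WIN32
#include <windows.h>
#include <process.h>
#include <stdint.h>
#else
#include <pthread.h>
#endif

// Hypothetical payload; the question passes its Main object instead.
struct Job {
    int input;
    int output;
};

// Thread entry point. _beginthreadex expects unsigned __stdcall f(void*),
// while pthread_create expects void* f(void*).
#ifdef _WIN32
static unsigned __stdcall JobProc(void* parameter)
#else
static void* JobProc(void* parameter)
#endif
{
    Job* job = static_cast<Job*>(parameter);
    job->output = job->input * 2;
    return 0;
}

// Runs JobProc on a new thread, waits for it to finish, and reports
// whether the thread was created and joined successfully.
bool RunJob(Job* job) {
#ifdef _WIN32
    // _beginthreadex lets the C runtime set up its per-thread state.
    uintptr_t handle = _beginthreadex(NULL, 0, JobProc, job, 0, NULL);
    if (handle == 0) return false;
    WaitForSingleObject(reinterpret_cast<HANDLE>(handle), INFINITE);
    CloseHandle(reinterpret_cast<HANDLE>(handle));
    return true;
#else
    pthread_t thread;
    if (pthread_create(&thread, NULL, JobProc, job) != 0) return false;
    return pthread_join(thread, NULL) == 0;
#endif
}
```

The handle returned by `_beginthreadex` has to be closed by the caller, which is why the Windows branch calls `CloseHandle` after waiting.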
Sorry I can't be of more help.
I think it may have something to do with the GetMachineFactors code that you haven't shown?
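Answer 1 (score: 0)
As it turns out, for reasons I can't explain, calling fork() and using socketpair() as the IPC mechanism was the way around the problem.
I wish I knew why it failed in the first place (head scratch), but this approach seems to be a good workaround.
It almost looks like a "build problem" caused by failing to run 'make clean' after changing a header file, but that was not the case here.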
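For anyone wanting to try the same workaround, a minimal sketch of the fork() plus socketpair() approach might look like the following. It is not the actual code: `CollectInChild` is a hypothetical placeholder for calling GetMachineFactors() and CreateJSONObject(), the JSON string is made up, and `CollectViaFork` is an illustrative name.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical placeholder for GetMachineFactors() + CreateJSONObject().
static std::string CollectInChild() {
    return "{\"OSName\":\"MacOS\"}";
}

// Collects the data in a child process and returns it to the parent over
// a socketpair. Returns an empty string if anything goes wrong.
std::string CollectViaFork() {
    int fds[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) != 0) return "";

    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return "";
    }

    if (pid == 0) {
        // Child: do the work on the child's main thread and send the result.
        close(fds[0]);
        std::string json = CollectInChild();
        size_t off = 0;
        while (off < json.size()) {
            ssize_t n = write(fds[1], json.data() + off, json.size() - off);
            if (n <= 0) _exit(1);
            off += static_cast<size_t>(n);
        }
        close(fds[1]);
        _exit(0);  // skip atexit handlers and stdio flushing inherited from the parent
    }

    // Parent: read until the child closes its end, then reap the child.
    close(fds[1]);
    std::string result;
    char buf[256];
    ssize_t n;
    while ((n = read(fds[0], buf, sizeof buf)) > 0) {
        result.append(buf, static_cast<size_t>(n));
    }
    close(fds[0]);

    int status = 0;
    if (waitpid(pid, &status, 0) != pid) return "";
    return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? result : "";
}
```

Closing the unused end of the socketpair in each process matters here: the parent's read loop only sees end-of-file once every copy of the write end has been closed.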