Need help tracking down an EXC_BAD_ACCESS on function entry on MacOS

Time: 2010-10-07 22:26:11

Tags: c++ multithreading macos debugging exc-bad-access

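I have a program that, when run multithreaded, gets a KERN_PROTECTION_FAILURE with EXC_BAD_ACCESS in a very strange place, and I don't know how to troubleshoot it any further. This is on MacOS 10.6 using GCC.

The strange part is that it happens on entry to a function: not on the first line of the function body, but on the actual jump into the function GetMachineFactors():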

Program received signal EXC_BAD_ACCESS, Could not access memory.
Reason: KERN_PROTECTION_FAILURE at address: 0xb00009ec
[Switching to process 28242]
0x00012592 in GetMachineFactors () at ../sysinfo/OSX.cpp:168
168 MachineFactors* GetMachineFactors()
(gdb) bt
#0  0x00012592 in GetMachineFactors () at ../sysinfo/OSX.cpp:168
#1  0x000156d0 in CollectMachineFactorsThreadProc (parameter=0x200280) at Threads.cpp:341
#2  0x952f681d in _pthread_start ()
#3  0x952f66a2 in thread_start ()
(gdb) 

If I run this without threads, it works fine with no problems:

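#include <iostream>
#include <string>
#include "MachineFactors.h"

int main( int argc, char** argv )
{
    // Same call as in the thread procedure, but made on the main thread.
    MachineFactors* factors = GetMachineFactors();
    std::string str = CreateJSONObject(factors);
    std::cout << str;
    delete factors;
    return 0;
}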

If I run it in a pthread, I get the EXC_BAD_ACCESS shown above.

THREAD_FUNCTION CollectMachineFactorsThreadProc( LPVOID parameter )
{
    Main* client = (Main*) parameter;
    if ( parameter == NULL )
    {
        ERRORLOG( "No data passed to machine identification thread.  Aborting." );
        return 0;
    }
    MachineFactors* mfactors = GetMachineFactors(); // This is where it dies.
    // If I don't call GetMachineFactors and do something like mfactors =
    // new MachineFactors(); everything is good and the threads communicate and exit
    // normally.
    if (mfactors == NULL)
    {
        ERRORLOG("Failed to collect machine identification: GetMachineFactors returned NULL." << endl)
        return 0;
    }
    client->machineFactors = CreateJSONObject(mfactors);
    delete mfactors;
    EVENT_RAISE(client->machineFactorsEvent);
    return 0;
}

Here is an excerpt of the GetMachineFactors() code:

MachineFactors* GetMachineFactors() // Dies on this line in multi-threaded.
{
    // printf( "Getting machine factors.\n"); // Tried with and without this, never prints.
    factors = new MachineFactors();
    factors->OSName = "MacOS";
    factors->Manufacturer = "Apple";
    ///…
    // gather various machine metrics here.
    //…
    return factors;
}

For reference, I use a socketpair to wait for the thread to complete:

// From the header file I use for cross-platform defines (this runs on OSX, Windows, and Linux).
struct _waitt
{
  int fds[2];
};
#define THREAD_FUNCTION void*
#define THREAD_REFERENCE pthread_t
#define MUTEX_REFERENCE pthread_mutex_t*
#define MUTEX_LOCK(m) pthread_mutex_lock(m)
#define MUTEX_UNLOCK pthread_mutex_unlock
#define EVENT_REFERENCE struct _waitt
#define EVENT_WAIT(m) do { char lc; if (read(m.fds[0], &lc, 1)) {} } while (0)
#define EVENT_RAISE(m) do { char lc = 'j'; if (write(m.fds[1], &lc, 1)) {} } while (0)
#define EVENT_NULL(m) do { m.fds[0] = -1; m.fds[1] = -1; } while (0)
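
The macros above do not show how the descriptor pair is created. As a minimal sketch of the same wait/raise pattern, assuming the pair comes from socketpair(AF_UNIX, SOCK_STREAM, 0, ...): the names Event, EventInit, EventWait, EventRaise, EventClose and RunEventDemo are hypothetical and are not part of the original header.

```cpp
#include <cassert>
#include <cstddef>
#include <pthread.h>
#include <sys/socket.h>
#include <unistd.h>

// Hypothetical wrapper mirroring struct _waitt from the question.
struct Event {
    int fds[2];
};

// Assumption: the original code creates the pair roughly like this.
bool EventInit(Event& e) {
    return socketpair(AF_UNIX, SOCK_STREAM, 0, e.fds) == 0;
}

// Blocks until one byte arrives; returns false on error or EOF.
bool EventWait(Event& e) {
    char c;
    return read(e.fds[0], &c, 1) == 1;
}

// Wakes a waiter by writing one byte to the other end of the pair.
bool EventRaise(Event& e) {
    char c = 'j';
    return write(e.fds[1], &c, 1) == 1;
}

void EventClose(Event& e) {
    close(e.fds[0]);
    close(e.fds[1]);
}

// Worker thread: does nothing except signal the event.
static void* Worker(void* parameter) {
    assert(parameter != NULL);
    EventRaise(*static_cast<Event*>(parameter));
    return NULL;
}

// Starts the worker, waits for its signal, and reports whether it arrived.
bool RunEventDemo() {
    Event e;
    if (!EventInit(e)) return false;
    pthread_t t;
    if (pthread_create(&t, NULL, Worker, &e) != 0) {
        EventClose(e);
        return false;
    }
    bool signalled = EventWait(e);
    pthread_join(t, NULL);
    EventClose(e);
    return signalled;
}
```

Unlike the macros, this version reports read/write failures instead of ignoring them, which makes it easier to tell a lost wakeup from a crashed thread. It does not retry on EINTR.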

Here is the code where I start the thread:

void Main::CollectMachineFactors()
{
#ifdef WIN32
    machineFactorsThread = CreateThread(NULL, 0, CollectMachineFactorsThreadProc, this, 0, 0);
    if ( machineFactorsThread == NULL )
    {
        ERRORLOG( "Could not create thread for machine id: " << ERROR_NO << endl )
    }
#else
    int retval = pthread_create(&machineFactorsThread, NULL, CollectMachineFactorsThreadProc, this);
    if (retval)
    {
        ERRORLOG( "Return code from machine id pthread_create() is " << retval << endl )
    }
#endif
}

Here is the simple failure case for running this multithreaded. It always fails, with the stack trace shown above:

CollectMachineFactors();
EVENT_WAIT(machineFactorsEvent);
cout << machineFactors;
return 0;

At first I suspected a library problem. Here is my makefile:

# Main executable file
PROGRAM = sysinfo
# Object files
OBJECTS = Version.h Main.o Protocol.o Socket.o SSLConnection.o Stats.o TimeElapsed.o Formatter.o OSX.o Threads.o
# Include directories
INCLUDE = -Itaocrypt/include -IyaSSL/taocrypt/mySTL -IyaSSL/include -isysroot /Developer/SDKs/MacOSX10.5.sdk -mmacosx-version-min=10.5
# Library settings
STATICLIBS = libtaocrypt.a libyassl.a -Wl,-rpath,. -ldl -lpthread -lz -lexpat
# Compile settings
RELCXX = g++ -g -ggdb -DDEBUG -Wall $(INCLUDE)

.SUFFIXES:      .o .cpp

.cpp.o :
        $(RELCXX) -c -Wall $(INCLUDE) -o $@ $<

all:    $(PROGRAM)

$(PROGRAM):     $(OBJECTS)
        $(RELCXX) -o $(PROGRAM) $(OBJECTS) $(STATICLIBS)

clean: 
    rm -f *.o $(PROGRAM)

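I can't for the life of me see anything particularly strange or dangerous here, and I don't know where to look. The same thread procedure works fine on every Linux machine I have tried. Any suggestions? Which tools should I try?

I can add more information if it would be useful.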

2 Answers:

Answer 0 (score: 0)

I can see a problem in your Windows code, but nothing in the OSX code that would explain the crash.

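You don't appear to have posted the actual code of GetMachineFactors, since the variable factors is never declared. As for debugging, you should not take the absence of printf output as proof that the statement was never executed. Use the debugger's facilities instead, such as setting breakpoints or using special debugger trace output (I'm not sure what gdb supports; it is a very primitive debugger, but perhaps Apple has better tools?).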

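On Windows, you should use the runtime library's thread-creation function rather than the Windows API CreateThread, because CreateThread does not inform the runtime library. As a result, a new expression or other calls into the runtime library may fail, for example.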
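
A minimal sketch of what that could look like, assuming the Microsoft C runtime's _beginthreadex from <process.h> on Windows and pthread_create everywhere else. The names ThreadHandle, THREAD_PROC, DemoThreadProc, StartThread, JoinThread and RunThreadDemo are hypothetical and do not come from the question's code.

```cpp
#include <cassert>
#include <cstddef>
#include <stdint.h>
#ifdef _WIN32
#include <windows.h>
#include <process.h>   // _beginthreadex
typedef HANDLE ThreadHandle;
#define THREAD_PROC unsigned __stdcall
#else
#include <pthread.h>
typedef pthread_t ThreadHandle;
#define THREAD_PROC void*
#endif

// Hypothetical worker: stores 42 through the pointer it is given.
static THREAD_PROC DemoThreadProc(void* parameter) {
    *static_cast<int*>(parameter) = 42;
    return 0;
}

// Starts DemoThreadProc; returns false if the thread could not be created.
bool StartThread(ThreadHandle* out, void* parameter) {
    assert(out != NULL);
#ifdef _WIN32
    // _beginthreadex sets up the C runtime's per-thread state; it returns 0 on failure.
    uintptr_t h = _beginthreadex(NULL, 0, DemoThreadProc, parameter, 0, NULL);
    if (h == 0) return false;
    *out = reinterpret_cast<HANDLE>(h);
    return true;
#else
    return pthread_create(out, NULL, DemoThreadProc, parameter) == 0;
#endif
}

// Waits for the thread to finish and releases its handle.
void JoinThread(ThreadHandle t) {
#ifdef _WIN32
    WaitForSingleObject(t, INFINITE);
    CloseHandle(t);
#else
    pthread_join(t, NULL);
#endif
}

// Runs the worker once and returns the value it stored, or -1 on failure.
int RunThreadDemo() {
    int result = 0;
    ThreadHandle t;
    if (!StartThread(&t, &result)) return -1;
    JoinThread(t);
    return result;
}
```

Note that the thread procedure's signature differs from the one CreateThread expects: _beginthreadex wants a function returning unsigned with the __stdcall calling convention, so a THREAD_FUNCTION macro like the one in the question would need a Windows-specific definition.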

Sorry I can't be of more help.

I suspect it may be related to the GetMachineFactors code you haven't shown?

Answer 1 (score: 0)

As it turned out, and for reasons I cannot explain, using a fork() call combined with socketpair() as the IPC mechanism solved the problem.
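
The answer does not include the code for this workaround. A rough sketch of the general fork() plus socketpair() pattern might look like the following, where RunInChild is a hypothetical helper and FakeCollect is a hypothetical stand-in for CreateJSONObject(GetMachineFactors()):

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical stand-in for the real collection code.
static std::string FakeCollect() {
    return "{\"OSName\":\"MacOS\",\"Manufacturer\":\"Apple\"}";
}

// Runs producer() in a forked child and returns its output to the parent
// over a socketpair. Returns an empty string on any failure.
std::string RunInChild(std::string (*producer)()) {
    assert(producer != NULL);
    int fds[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) != 0) return "";

    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return "";
    }

    if (pid == 0) {
        // Child: the work runs on the main thread of a new process.
        close(fds[0]);
        std::string out = producer();
        size_t sent = 0;
        while (sent < out.size()) {
            ssize_t n = write(fds[1], out.data() + sent, out.size() - sent);
            if (n <= 0) _exit(1);
            sent += static_cast<size_t>(n);
        }
        close(fds[1]);
        _exit(0);
    }

    // Parent: read until the child closes its end, then reap the child.
    close(fds[1]);
    std::string result;
    char buf[256];
    ssize_t n;
    while ((n = read(fds[0], buf, sizeof buf)) > 0) {
        result.append(buf, static_cast<size_t>(n));
    }
    close(fds[0]);

    int status = 0;
    if (waitpid(pid, &status, 0) != pid) return "";
    if (!WIFEXITED(status) || WEXITSTATUS(status) != 0) return "";
    return result;
}
```

A call such as RunInChild(FakeCollect) blocks until the child exits. One difference from the threaded version is that the collection code runs on the child's main thread rather than on a secondary thread.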

I wish I knew why it failed in the first place (*scratches head*), but this approach seems to be a good workaround.

It looks almost like a "build problem" caused by failing to run 'make clean' after changing a header file, but that is not the case here.
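
One possible explanation for why the threaded version failed, which nothing in this thread confirms, is the thread's stack size. On Mac OS X a secondary thread gets a much smaller default stack (512 KB) than the main thread (8 MB), and a fault on function entry is a typical symptom of a stack frame that does not fit. Since the body of GetMachineFactors is elided, this is only a guess. The sketch below shows how a thread could be given an explicit stack size to test the idea; BigFrameProc and RunWithStack are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <pthread.h>

// Hypothetical worker with 1 MiB of locals, standing in for a function
// whose stack frame is larger than a small default thread stack.
static void* BigFrameProc(void* parameter) {
    char scratch[1024 * 1024];
    std::memset(scratch, 1, sizeof scratch);
    *static_cast<int*>(parameter) = scratch[sizeof scratch - 1];
    return NULL;
}

// Runs BigFrameProc on a thread with the requested stack size.
// Returns the value the worker stored, or -1 if setup failed.
int RunWithStack(size_t stackBytes) {
    assert(stackBytes > 0);
    pthread_attr_t attr;
    if (pthread_attr_init(&attr) != 0) return -1;
    if (pthread_attr_setstacksize(&attr, stackBytes) != 0) {
        pthread_attr_destroy(&attr);
        return -1;
    }
    int result = 0;
    pthread_t t;
    int rc = pthread_create(&t, &attr, BigFrameProc, &result);
    pthread_attr_destroy(&attr);
    if (rc != 0) return -1;
    pthread_join(t, NULL);
    return result;
}
```

If the original crash disappears when CollectMachineFactorsThreadProc is started with a larger stack, for example 8 MiB, that would support this guess. If it does not, the cause lies elsewhere.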