Question

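Are there any AIX runtime library calls I can use from C++ on AIX to monitor the state of the threads associated with a running process? I am trying to track down a crash-on-shutdown problem that I believe is caused by the program exiting before all of its threads have been joined.

I appreciate that in a multithreaded environment it is hard to record thread state accurately, since a thread's state may change between the moment it is read and the moment it is displayed. Still, anything, however crude, would be useful as a first step in tracking this down.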

Answer 1

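You say "crash-on-shutdown"... do you mean the system crashes and produces a crash dump? If so, you have a wealth of data. If you need more, I would set up a system trace; after the system crashes and reboots, you can use trcdead to retrieve the trace buffer from the dump. That, together with the state of the system captured in the dump, gives you a starting point.

Misbehaving threads alone should not be able to crash the system.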

Answer 2

First, there is a system trace facility. I have not used it from an application (ever?), but it is thread safe.

http://pic.dhe.ibm.com/infocenter/aix/v6r1/topic/com.ibm.aix.genprogc/doc/genprogc/trace_facility.htm#yg3100thri

and

http://pic.dhe.ibm.com/infocenter/aix/v6r1/topic/com.ibm.aix.genprogc/doc/genprogc/tracing.htm?resultof=%22%61%70%70%6c%69%63%61%74%69%6f%6e%22%20%22%61%70%70%6c%69%63%22%20%22%74%72%61%63%65%22%20

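If this is a really complex application, I would hook in real trace hooks and develop a trace format file. It is worth the time. What follows is a cruder way to go.

The way I would probably track this down is to hook in a home-grown trace or log facility. Put calls to the log routine in the code. Then go back and examine the core file and dig out the log buffer, which tells you the sequence of log points that were hit.

This can be an iterative process: you add a few points, find that you need more in a particular part of the code, and add log points there. Try again. Repeat.

The log routine is actually very simple and relies on one of the atomic operations. I am using fetch_and_add.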

#include <string.h>
#include <sys/atomic_op.h>   /* AIX: declares fetch_and_add() */

#define LOG_WORDS 4096       /* some power of 2 in size is my preference */

/* 4 spare words so that a record starting near the end still fits */
long array[LOG_WORDS + 4];
int indx;                    /* int -- not a long */

/* trace 5 words each time. */
void log_point(const char *tag, long b, long c, long d, long e)
{
  long a = 0;
  size_t n = strlen(tag);

  /*
   * the 5 equals the number of args.  LOG_WORDS - 1 is one less than
   * the size of the log.  You can use mod if you want.  Also, note that
   * there are flavors of fetch_and_add for different sized
   * variables.  Pick the one that matches the size of indx.
   */
  int i = fetch_and_add(&indx, 5) & (LOG_WORDS - 1);

  /*
   * at this point, array[i] ... array[i+4] have been effectively
   * reserved.  The time taken between the fetch_and_add and updating
   * the array does not need to be atomic or locked.  The only
   * possible exception is you don't want the log to wrap within this
   * time but that would be very unlikely.
   */

  /* pack the first sizeof(long) characters of the tag into one word */
  memcpy(&a, tag, n < sizeof a ? n : sizeof a);

  array[i] = a;
  array[i+1] = b;
  array[i+2] = c;
  array[i+3] = d;
  array[i+4] = e;
}

/* stand-ins for your program's own data */
static long apple, pie, whisky, good, nice, pet, meow;

/* your original code, sprinkled with calls to log_point */
int whatever(long arg1, int arg2)
{
  long foo, bar, dog, cat;

  log_point("WHT1", arg1, arg2, 0, 0);

  foo = apple + pie;
  bar = whisky + good;
  dog = nice + pet;
  cat = meow;

  log_point("WHT2", foo, bar, cat, dog);

  /* ... */
  return 0;
}

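The trick with the first argument is that when you get the core file and dump out the array, you can dump it both as hex and as text. From the text output you can quickly see which log points were called. In a 64-bit application you can use 8 characters instead of limiting yourself to 4.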
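To make the tag trick concrete, here is a minimal sketch in portable C of how a short tag ends up in one word of the log and how that word reads back as text. The helper names pack_tag and word_to_text are hypothetical and not part of the answer's code; a debugger's hex-and-text dump of the core file does the reading side for you.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical helper: copy up to sizeof(long) characters of a tag into
 * one word, leaving any remaining bytes zero. */
static long pack_tag(const char *tag)
{
    long word = 0;
    size_t n = strlen(tag);

    memcpy(&word, tag, n < sizeof word ? n : sizeof word);
    return word;
}

/* Hypothetical helper: render the bytes of a word the way the text column
 * of a hex dump would, with '.' for anything non-printable. */
static void word_to_text(long word, char out[sizeof(long) + 1])
{
    const unsigned char *p = (const unsigned char *)&word;
    size_t k;

    for (k = 0; k < sizeof word; k++)
        out[k] = isprint(p[k]) ? (char)p[k] : '.';
    out[sizeof word] = '\0';
}
```

Because the tag bytes are copied in memory order, the text reads left to right in a byte-wise dump regardless of endianness; only the numeric value of the word differs between machines.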

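Note that the value of the index variable is the key piece of information in the core file: it tells you the last log point that was hit. From there, step backwards to see the earlier log points.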
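The backward walk can be sketched as follows. This is an illustration under assumptions, not the answer's code: it uses C11 atomic_fetch_add as a portable stand-in for AIX fetch_and_add, and the names log_buf, log_idx, record_start and dump_recent are made up for the example. The same arithmetic applies when you walk the buffer by hand in a debugger.

```c
#include <stdatomic.h>
#include <stdio.h>
#include <string.h>

#define LOG_WORDS 4096   /* power of two, as in the answer */
#define REC_WORDS 5      /* words written per log call */

/* Spare words at the end so a record that starts near the end still fits. */
static long log_buf[LOG_WORDS + REC_WORDS - 1];
static atomic_uint log_idx;   /* plays the role of the index variable */

static void log_point(const char *tag, long b, long c, long d, long e)
{
    unsigned i = atomic_fetch_add(&log_idx, REC_WORDS) & (LOG_WORDS - 1);
    long word = 0;
    size_t n = strlen(tag);

    memcpy(&word, tag, n < sizeof word ? n : sizeof word);
    log_buf[i] = word;
    log_buf[i + 1] = b;
    log_buf[i + 2] = c;
    log_buf[i + 3] = d;
    log_buf[i + 4] = e;
}

/* Offset of the record written `back` calls ago (0 = most recent),
 * given the current value of the index. */
static unsigned record_start(unsigned idx, unsigned back)
{
    return (idx - REC_WORDS * (back + 1)) & (LOG_WORDS - 1);
}

/* Print up to `count` records, newest first: tag as text, the rest in hex.
 * Stops early if fewer records have been written so far. */
static void dump_recent(FILE *out, unsigned count)
{
    unsigned idx = atomic_load(&log_idx);
    unsigned back;

    for (back = 0; back < count && REC_WORDS * (back + 1) <= idx; back++) {
        unsigned i = record_start(idx, back);

        fprintf(out, "%.*s %lx %lx %lx %lx\n",
                (int)sizeof(long), (const char *)&log_buf[i],
                (unsigned long)log_buf[i + 1], (unsigned long)log_buf[i + 2],
                (unsigned long)log_buf[i + 3], (unsigned long)log_buf[i + 4]);
    }
}
```

Once the log has wrapped, older records are partly overwritten by newer ones, so only the most recent few hundred entries are trustworthy; for a crash investigation those are usually the ones that matter.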

AIX library calls to get thread information/state
