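First, let me describe my scenario:
I developed a supervisor program on Linux that forks and then, in the child process, uses execv() to launch my multi-threaded application. The supervisor acts as a watchdog for the multi-threaded application. If the application does not send a SIGUSR1 signal to the supervisor within a certain amount of time, the supervisor kills the child process using the pid_t returned by fork() and starts the cycle over again.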
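The application side of this heartbeat is not shown in the post. As a rough sketch of what it could look like, assuming the application reads the supervisor PID from argv[0] (the supervisor passes it there as a string) and signals it periodically, the two helpers below are hypothetical names, not code from MyApp:

```cpp
#include <cerrno>
#include <cstdlib>
#include <signal.h>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical helper: parse the PID string that the supervisor passes as
// argv[0]. Returns -1 if the text is not a positive decimal integer.
pid_t parseSupervisorPid(const char *text)
{
    if (text == nullptr || *text == '\0')
        return -1;

    errno = 0;
    char *end = nullptr;
    long value = std::strtol(text, &end, 10);
    if (errno != 0 || *end != '\0' || value <= 0)
        return -1;

    return static_cast<pid_t>(value);
}

// Hypothetical helper: send one heartbeat to the supervisor.
// Returns true if kill() accepted the signal for delivery.
bool sendHeartbeat(pid_t supervisorPid)
{
    return supervisorPid > 0 && kill(supervisorPid, SIGUSR1) == 0;
}
```

In the application, one thread would call parseSupervisorPid(argv[0]) once at startup and then call sendHeartbeat() at an interval comfortably shorter than the supervisor's timeout. Using getppid() instead of argv[0] would be an alternative, at the cost of signalling whichever process is the current parent.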
Here is the code for the supervisor program:
#include <sys/types.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <signal.h>
#include <iostream>
#include <cerrno>
#include <ctime>    // time(), difftime()

// Time of the most recent heartbeat (SIGUSR1) from the application
time_t heartbeatTime;

void signalHandler(int sigNum)
{
    //std::cout << "Signal (" << sigNum << ") received.\n";
    time(&heartbeatTime);
}
int main(int argc, char *argv[])
{
    pid_t cpid, ppid;
    int result = 0;
    bool programLaunched = false;
    time_t now;
    double timeDiff;
    int error;
    char ParentID[25];
    char *myArgv[2];

    // Get the Parent Process ID
    ppid = ::getpid();
    // Initialize the Child Process ID
    cpid = 0;
    // Copy the PID into the char array
    sprintf(ParentID, "%i", ppid);
    // Set up the array to pass to the Program
    myArgv[0] = ParentID;
    myArgv[1] = 0;
    // Print out of the P PID
    std::cout << "Parent ID: " << myArgv[0] << "\n";
    // Register for the SIGUSR1 signal
    signal(SIGUSR1, signalHandler);
    // Register the SIGCHLD so the children processes exit fully
    signal(SIGCHLD, SIG_IGN);
    // Initialize the Heart Beat time
    time(&heartbeatTime);

    // Loop forever and ever, amen.
    while (1)
    {
        // Check to see if the program has been launched
        if (programLaunched == false)
        {
            std::cout << "Forking the process\n";
            // Fork the process to launch the application
            cpid = fork();
            std::cout << "Child PID: " << cpid << "\n";
        }

        // Check if the fork was successful
        if (cpid < 0)
        {
            std::cout << "Error in forking.\n";
            // Error in forking
            programLaunched = false;
        }
        else if (cpid == 0)
        {
            // Check if we need to launch the application
            if (programLaunched == false)
            {
                // Send a message to the output
                std::cout << "Launching Application...\n";
                // Launch the Application
                result = execv("./MyApp", myArgv);
                std::cout << "execv result = " << result << "\n";
                // Check if the program launched has failed
                if (result != -1)
                {
                    // Indicate the program has been launched
                    programLaunched = true;
                    // Exit the child process
                    return 0;
                }
                else
                {
                    std::cout << "Child process terminated; bad execv\n";
                    // Flag that the program has not been launched
                    programLaunched = false;
                    // Exit the child process
                    return -1;
                }
            }
        }
        // In the Parent Process
        else
        {
            // Get the current time
            time(&now);
            // Get the time difference between the program heartbeat time and current time
            timeDiff = difftime(now, heartbeatTime);
            // Check if we need to restart our application
            if ((timeDiff > 60) && (programLaunched == true))
            {
                std::cout << "Killing the application\n";
                // Kill the child process
                kill(cpid, SIGINT);
                // Indicate that the process was ended
                programLaunched = false;
                // Reset the Heart Beat time
                time(&heartbeatTime);
                return -1;
            }

            // Check to see if the child application is running
            if (kill(cpid, 0) == -1)
            {
                // Get the Error
                error = errno;
                // Check if the process is running
                if (error == ESRCH)
                {
                    std::cout << "Process is not running; start it.\n";
                    // Process is not running.
                    programLaunched = false;
                    return -1;
                }
            }
            else
            {
                // Child process is running
                programLaunched = true;
            }
        }

        // Give the process some time off.
        sleep(5);
    }
    return 0;
}
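This approach worked fairly well until I ran into a problem with the library I am using. It does not like all of the killing, and it basically ended up tying up my Ethernet port in an endless loop that never released it. Not good.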
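One detail of the kill step in the listing above: kill(cpid, SIGINT) only sends the signal and returns immediately; it does not wait for the application to exit, and a process may catch or ignore SIGINT. Whether that is related to the library problem is not established here. As a minimal sketch, assuming the supervisor wanted to confirm that the child is really gone before doing anything else, a polling helper could look like this (childHasExited and waitForChildExit are hypothetical names, and the 100 ms poll interval is an arbitrary choice):

```cpp
#include <cerrno>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>

// Returns true once the child is gone: either waitpid() reaped it, or it
// reports ECHILD because the kernel already discarded it (which is what
// happens on Linux when SIGCHLD is set to SIG_IGN, as in the supervisor).
// Only meaningful for a pid returned by this process's own fork().
bool childHasExited(pid_t pid)
{
    int status = 0;
    pid_t r = waitpid(pid, &status, WNOHANG);
    return r == pid || (r == -1 && errno == ECHILD);
}

// Hypothetical helper: poll for roughly timeoutSeconds and report whether
// the child exited within that time. It never sends a signal itself.
bool waitForChildExit(pid_t pid, int timeoutSeconds)
{
    const timespec interval = {0, 100 * 1000 * 1000};  // 100 ms
    for (int i = 0; i < timeoutSeconds * 10; ++i)
    {
        if (childHasExited(pid))
            return true;
        nanosleep(&interval, nullptr);
    }
    return childHasExited(pid);
}
```

The supervisor could call waitForChildExit(cpid, 10) right after kill(cpid, SIGINT) and log or escalate only if it returns false. The helper deliberately leaves the choice of a stronger signal to the caller, since the library involved reportedly reacts badly to being killed.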
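I then tried a different approach. I modified the supervisor program so that it exits if it has to kill the multi-threaded application, and I created a script that launches the supervisor from crontab. I used a shell script that I found on Stackoverflow.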
#!/bin/bash
#make-run.sh
#make sure a process is always running.

export DISPLAY=:0 #needed if you are running a simple gui app.

process=YourProcessName
makerun="/usr/bin/program"

if ps ax | grep -v grep | grep $process > /dev/null
then
    exit
else
    $makerun &
fi

exit
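I added it to crontab to run every minute. This helped a lot: it restarted the supervisor program, which in turn restarted the multi-threaded application. However, I noticed that multiple instances of the multi-threaded application were being launched. I am not sure why this is happening.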
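To narrow this down, it may help to count how many instances are actually running at a given moment. The script's check greps the whole `ps ax` output, so it matches the name anywhere on a command line. The sketch below instead compares against the exact command name in /proc/<pid>/comm, which Linux provides (the kernel truncates that name to 15 characters). countProcessesNamed is a hypothetical helper, not part of the supervisor:

```cpp
#include <cctype>
#include <dirent.h>
#include <fstream>
#include <string>

// Hypothetical helper: count running processes whose command name (the
// contents of /proc/<pid>/comm) equals `name` exactly.
// Returns -1 if /proc cannot be opened.
int countProcessesNamed(const std::string &name)
{
    DIR *proc = opendir("/proc");
    if (proc == nullptr)
        return -1;

    int count = 0;
    while (dirent *entry = readdir(proc))
    {
        // Only directory names made up entirely of digits are PIDs.
        const char *dirName = entry->d_name;
        bool numeric = (*dirName != '\0');
        for (const char *p = dirName; *p != '\0'; ++p)
        {
            if (!std::isdigit(static_cast<unsigned char>(*p)))
            {
                numeric = false;
                break;
            }
        }
        if (!numeric)
            continue;

        // A process may exit between readdir() and this open; the read
        // then simply fails and the entry is skipped.
        std::ifstream comm(std::string("/proc/") + dirName + "/comm");
        std::string current;
        if (std::getline(comm, current) && current == name)
            ++count;
    }
    closedir(proc);
    return count;
}
```

Logging countProcessesNamed("MyApp") from the supervisor just before it forks, and again just before it exits, would show whether an old instance is still alive at the moment a new one is started.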
I know this is a real hack, but I have backed myself into a corner with this implementation. I just want to get it working.
Suggestions?