Universal child-process error detection on Unix

Time: 2016-07-27 20:42:24

Tags: c++ linux unix process

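My program (C++) needs to be able to create and destroy many separate child processes on Unix systems from user-supplied commands. The child processes communicate with my program through pipes and are designed to time out if they do not respond within a certain amount of time. Because we will be running the program on Linux servers that may cost thousands of dollars, if a process errors out or exits we want to stop waiting for it and make sure it is dead, rather than waiting for it to time out (which can be on the order of minutes, and therefore precious server time) and only then killing it.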
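
The "stop waiting and make sure it is dead" step could look roughly like the sketch below. killPlayer is not shown in this question, so this is an assumption about what such a step might do, and `killAndReap` is a hypothetical name. It relies on the child having been put into its own process group whose ID equals its PID.

```cpp
#include <cerrno>
#include <csignal>
#include <sys/types.h>
#include <sys/wait.h>

// Hypothetical helper: kill a child (and its process group) and reap it.
// Returns true once the child has been reaped by this call.
bool killAndReap(pid_t pid) {
    // A negative PID addresses the whole process group, so grandchildren
    // started by "sh -c" are signalled too. If the group does not exist
    // (for example the child has not called setpgid yet), fall back to
    // signalling the child alone.
    if (kill(-pid, SIGKILL) != 0) {
        kill(pid, SIGKILL);
    }

    // Blocking waitpid collects the exit status so no zombie is left behind.
    int status = 0;
    pid_t result;
    do {
        result = waitpid(pid, &status, 0);
    } while (result < 0 && errno == EINTR);

    return result == pid;
}
```

Reaping after the kill matters because a killed but unreaped child remains in the process table as a zombie.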

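When given directly compiled executables to run ("cd path/to/directory; ./program"), the code below works perfectly: it stops programs that fail and keeps the others running until they time out. But when given a Python or Java program to run (in the form "cd path/to/file; python3 program.py" or "cd path/to/directory; java program"), it concludes that the child process has exited or failed, regardless of its actual state. How can I fix this?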

---Creating the child process---

void Networking::startAndConnectBot(std::string command) {
    pid_t pid;
    int writePipe[2];
    int readPipe[2];

    if(pipe(writePipe) || pipe(readPipe)) {
        throw 1;
    }

    //Fork a child process
    pid = fork();
    if(pid == 0) { //This is the child
        //Put the child in its own process group
        setpgid(getpid(), getpid());

        //The child reads its stdin from writePipe
        dup2(writePipe[0], STDIN_FILENO);

        //The child's stdout and stderr both go to readPipe
        dup2(readPipe[1], STDOUT_FILENO);
        dup2(readPipe[1], STDERR_FILENO);

        execl("/bin/sh", "sh", "-c", command.c_str(), (char*) NULL);

        //Nothing past the execl should be run

        exit(1);
    } else if(pid < 0) {
        if(!quiet_output) std::cout << "Fork failed\n";
        throw 1;
    }

    UniConnection connection;
    connection.read = readPipe[0];
    connection.write = writePipe[1];

    connections.push_back(connection);
    processes.push_back(pid);
}

---Getting a response from the child process---

std::string Networking::getString(unsigned char playerTag, unsigned int timeoutMillis) {

    std::string newString;

    UniConnection connection = connections[playerTag - 1];

    fd_set set;
    FD_ZERO(&set); /* clear the set */
    FD_SET(connection.read, &set); /* add our file descriptor to the set */

    struct timeval timeout; //We want it to be non-blocking, so we'll check every ten milliseconds to get a result.
    timeout.tv_sec = 0;
    timeout.tv_usec = 10000;

    char buffer;

    //The time we started at.
    std::chrono::high_resolution_clock::time_point initialTime = std::chrono::high_resolution_clock::now();

    //Keep reading char by char until a newline
    bool shouldContinue = true;
    while(shouldContinue) {
        //Check if process is dead.
        int status;
        if(waitpid(processes[playerTag - 1], &status, WNOHANG) == processes[playerTag - 1] ||
        std::chrono::duration_cast<std::chrono::milliseconds>(std::chrono::high_resolution_clock::now() - initialTime).count() >= timeoutMillis) {
            killPlayer(playerTag);
            if(!quiet_output) {
                // Buffer error message output
                // If a bunch of bots fail at once, we don't want to be writing to cout at the same time
                // That looks really weird
                std::string errorMessage = "";
                errorMessage += std::string("Unix bot timed out or errored.\n");

                playerLogs[playerTag-1].push_back(newString);
                errorMessage += "#---------ALL OF THE OUTPUT OF THE BOT THAT TIMED OUT----------#\n";
                for(auto stringIter = playerLogs[playerTag-1].begin(); stringIter != playerLogs[playerTag-1].end(); stringIter++) {
                    while(stringIter->size() < 60) stringIter->push_back(' ');
                    errorMessage += "# " + *stringIter + " #\n";
                }
                errorMessage += "#--------------------------------------------------------------#\n";

                std::lock_guard<std::mutex> guard(coutMutex);
                std::cout << errorMessage;
            }
            throw 1;
        }

        //Check if there are bytes in the pipe
        for(int selectionResult = select(connection.read+1, &set, NULL, NULL, &timeout); selectionResult > 0; selectionResult--) {
            read(connection.read, &buffer, 1);

            if(buffer == '\n') {
                shouldContinue = false;
                break;
            }
            else newString += buffer;
        }

        //Reset timeout - we should consider it to be undefined.
        timeout.tv_sec = 0;
        timeout.tv_usec = 10000;
    }
}
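
One thing worth checking in a loop like this, offered as a possible cause and not a confirmed diagnosis: `select` overwrites the `fd_set` it is given, and POSIX specifies that all bits are cleared when the call times out. A set that is filled only once before the loop would therefore be empty after the first 10 ms slice with no data, and an interpreter such as python3 or java may well take longer than that to produce its first line. The sketch below rebuilds the set on every iteration. `ReadResult` and `readLineWithTimeout` are hypothetical names, and this is a standalone illustration, not a drop-in replacement for getString.

```cpp
#include <cerrno>
#include <chrono>
#include <string>
#include <sys/select.h>
#include <unistd.h>

// Hypothetical result type for the sketch.
enum class ReadResult { Line, TimedOut, Closed };

// Hypothetical helper: read one '\n'-terminated line from fd, giving up
// after timeoutMillis. The newline is not stored in `line`.
ReadResult readLineWithTimeout(int fd, unsigned int timeoutMillis, std::string& line) {
    using Clock = std::chrono::steady_clock;
    const auto deadline = Clock::now() + std::chrono::milliseconds(timeoutMillis);
    line.clear();

    for (;;) {
        if (Clock::now() >= deadline) return ReadResult::TimedOut;

        // Rebuild both the set and the timeout every time: select may
        // modify either of them.
        fd_set readSet;
        FD_ZERO(&readSet);
        FD_SET(fd, &readSet);
        struct timeval slice;
        slice.tv_sec = 0;
        slice.tv_usec = 10000;  // poll in 10 ms slices

        int ready = select(fd + 1, &readSet, nullptr, nullptr, &slice);
        if (ready < 0) {
            if (errno == EINTR) continue;
            return ReadResult::Closed;
        }
        if (ready == 0) continue;  // nothing yet, check the deadline again

        char c;
        ssize_t n = read(fd, &c, 1);
        if (n == 0) return ReadResult::Closed;  // writer closed the pipe
        if (n < 0) {
            if (errno == EINTR) continue;
            return ReadResult::Closed;
        }
        if (c == '\n') return ReadResult::Line;
        line += c;
    }
}
```

Using `steady_clock` for the deadline is a design choice of the sketch: it cannot jump backwards if the system time is adjusted while waiting.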

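That is, whenever the program runs the Java or Python commands above, 1 is thrown from getString, whether or not they have actually exited or errored.

Any help is greatly appreciated.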
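
Since the timeout and the `waitpid` check both end in the same `throw 1`, it may help while debugging to log which of the two actually fired and why. A minimal sketch, assuming a hypothetical helper named `describeChild`, that decodes the `waitpid` result with the standard status macros:

```cpp
#include <cerrno>
#include <cstring>
#include <string>
#include <sys/types.h>
#include <sys/wait.h>

// Hypothetical helper: non-blocking check of a child's state, returned as
// a human-readable string. Note that a successful check reaps the child.
std::string describeChild(pid_t pid) {
    int status = 0;
    pid_t result = waitpid(pid, &status, WNOHANG);

    if (result == 0) {
        return "running";  // child exists and has not changed state
    }
    if (result < 0) {
        // For example ECHILD: not our child, or already reaped.
        return std::string("waitpid failed: ") + std::strerror(errno);
    }
    if (WIFEXITED(status)) {
        return "exited with status " + std::to_string(WEXITSTATUS(status));
    }
    if (WIFSIGNALED(status)) {
        return "killed by signal " + std::to_string(WTERMSIG(status));
    }
    return "changed state";
}
```

If this reports "running" at the moment the exception is thrown, the child has not exited and the elapsed-time branch of the condition is the one that triggered.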

0 answers:

No answers yet