Question

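Context:

I have a Linux [1] system that manages a series of third-party daemons. Interaction with them is limited to shell [2] init scripts, i.e. only {start | restart | stop | status} are available.

Problem:

A process can take over the PID of a previously running process, and the status of a daemon is checked only by testing whether a running process exists with its PID.

Example:

Process A runs with PID 123 and subsequently dies. Process B is then started with PID 123, and the status command responds with an inauthentic (erroneous) "OK". In other words, we only check that a process with the recorded PID exists in order to validate that the daemon is running; we assume that if a process with this PID exists, it is the process in question.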
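To make the failure mode concrete, here is a minimal Python sketch of the existence-only check described above. The function name `pid_exists` is hypothetical and not taken from any real init script; it relies on signal 0, which tests for existence without delivering a signal.

```python
import os

def pid_exists(pid: int) -> bool:
    """Return True if some process currently holds this PID (naive check)."""
    try:
        os.kill(pid, 0)  # signal 0: existence/permission check only, nothing is delivered
    except ProcessLookupError:
        return False     # no process with this PID
    except PermissionError:
        return True      # a process exists but belongs to another user
    return True

# The check cannot tell process A from process B: any live process that
# happens to hold the recycled PID makes it return True.
print(pid_exists(os.getpid()))
```

Nothing in this check looks at what the process is, which is exactly why a recycled PID produces a false "OK".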

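Proposed solutions:

1. Interrogate the process by PID to ensure that the command/daemon running under that PID is the expected one. The problem with this solution is that both the command and the PID need to match; multiple pieces of information therefore have to be maintained and kept in sync, which adds complexity for error and edge conditions.
2. Correlate the creation time of the PID file with the start time of the process. If the process started within a certain delta of the PID file's creation time, we can be fairly certain that the command/daemon is running as expected.

Is there a standard way to ratify the authenticity of a process/PID file, beyond the presence of a process running with that PID? That is, I (as the system) want to know whether you (the process) are running and whether you are who I think you are (A and not B).

Assuming we elect to implement the second solution proposed above, what confidence interval/delta between the PID file creation time and the process start time is reasonable? Here, reasonable means an acceptable compromise between type 1 and type 2 errors.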
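As a rough illustration of the second proposal, the following Python sketch compares a PID file's modification time with the process start time derived from /proc. All function names are hypothetical. It assumes a Linux /proc layout, uses the file's mtime as a stand-in for its creation time, and the 5-second default tolerance is an arbitrary placeholder, not a recommended value.

```python
import os

def parse_starttime_ticks(stat_line: str) -> int:
    """Extract field 22 (starttime, in clock ticks since boot) from /proc/PID/stat."""
    # The comm field (2) is wrapped in parentheses and may itself contain
    # spaces or parentheses, so split after the last ')'.
    after_comm = stat_line.rsplit(")", 1)[1].split()
    return int(after_comm[19])  # after_comm[0] is field 3, so field 22 is index 19

def start_epoch(starttime_ticks: int, btime: int, ticks_per_sec: int) -> float:
    """Convert a boot-relative start time to seconds since the Unix epoch."""
    return btime + starttime_ticks / ticks_per_sec

def started_near(pidfile_mtime: float, proc_start: float, tolerance: float) -> bool:
    """True if the process started within `tolerance` seconds of the PID file."""
    return abs(pidfile_mtime - proc_start) <= tolerance

def pidfile_matches(pidfile: str, tolerance: float = 5.0) -> bool:
    """Compare a PID file's mtime with the start time of the PID it names."""
    with open(pidfile) as f:
        pid = int(f.read().split()[0])
    try:
        with open(f"/proc/{pid}/stat") as f:
            ticks = parse_starttime_ticks(f.read())
    except FileNotFoundError:
        return False  # no process with that PID
    with open("/proc/stat") as f:
        # The "btime" line holds the boot time in seconds since the epoch.
        btime = next(int(line.split()[1]) for line in f if line.startswith("btime"))
    started = start_epoch(ticks, btime, os.sysconf("SC_CLK_TCK"))
    return started_near(os.stat(pidfile).st_mtime, started, tolerance)
```

A wider tolerance accepts more impostors that reused the PID shortly after the original died; a narrower one rejects daemons that are slow to write their PID file. The right trade-off depends on how the daemon starts up.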

[1] CentOS/RHEL
[2] Bash

Answer 1

The contents of the file

/proc/{PID}/cmdline

are the command line that was used to start the process. Is that what you need?
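One detail worth knowing when reading that file: the arguments are separated by NUL bytes rather than spaces. A minimal Python sketch of reading it follows; the function names are hypothetical, and it assumes a Linux /proc filesystem.

```python
def parse_cmdline(raw: bytes) -> list:
    """Split the NUL-separated contents of /proc/PID/cmdline into arguments."""
    if not raw:
        return []  # kernel threads and zombies expose an empty cmdline
    return [arg.decode(errors="replace") for arg in raw.rstrip(b"\0").split(b"\0")]

def read_cmdline(pid: int) -> list:
    """Return the argument list of a running process, or [] if it is gone."""
    try:
        with open(f"/proc/{pid}/cmdline", "rb") as f:
            return parse_cmdline(f.read())
    except FileNotFoundError:
        return []
```

A status check could then compare `read_cmdline(pid)[0]` against the expected daemon path, keeping in mind that a second instance of the same daemon would pass this comparison too.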

Answer 2

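My solution was to capture the command (via /proc/PID/cmdline) as well as the relative start time. Using the absolute start time (via ps -p PID -o lstart=) seems like it would work, but you will get confusing results if your system clock changes (e.g. from NTP updates or daylight saving time).

Here is my implementation: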

# Prints enough detail to confirm a PID still refers to the same process.
# In other words, even if a PID is recycled by a call to the same process the
# output of this command should still be different. This is not guaranteed
# across reboots.
proc_detail() {
  local pid=${1:?Must specify PID}
  # the process' command line, if it's running
  # ensures a non-existent PID will never have the same output as a running
  # process, and helps debugging
  cat "/proc/$pid/cmdline" 2> /dev/null && echo
  # this is the number of seconds after boot that the process started
  # https://unix.stackexchange.com/a/274722/19157
  # in theory this could collide if the same process were restarted in the same
  # second and assigned the same PID, but PIDs are assigned in order so this
  # seems acceptably unlikely for now.
  echo "$(($(cut -d. -f1 < /proc/uptime) - \
           $(ps -p "$pid" -o etimes= 2> /dev/null || echo "0")))"
}

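I also decided to store this output in /dev/shm so that it is automatically cleared on shutdown. There are other viable options (such as an @reboot cronjob), but for my use case writing to a tmpfs was easy.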
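The store-and-compare step could look roughly like the following Python sketch. The function names are hypothetical, and `detail` stands for whatever identifying string is captured at start-up, such as the output of `proc_detail` above.

```python
def save_detail(path: str, detail: str) -> None:
    """Record the identifying detail of a freshly started process."""
    with open(path, "w") as f:
        f.write(detail)

def is_same_process(path: str, current_detail: str) -> bool:
    """True if the detail captured now equals the one saved at start-up."""
    try:
        with open(path) as f:
            return f.read() == current_detail
    except FileNotFoundError:
        return False  # nothing recorded, e.g. a reboot cleared the tmpfs
```

For example, an init script's start action might call `save_detail("/dev/shm/mydaemon.detail", detail)` (the path is an illustrative assumption), and its status action would call `is_same_process` with freshly captured detail for the recorded PID.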

Answer 3

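I was looking for an answer to the question of how to make sure a process is still the same process, and came across the two solutions from this question, i.e. whether a process can be uniquely identified by the tuple (pid, command) or (pid, process start time). Unfortunately, neither option seems to be sufficient.

(pid, command) is not sufficient because of PID reuse: the original process may have been killed, leaving its PID free for reuse, and another process with the same command line could then be started and receive that same PID.
(pid, process start time) seems to have the problem that the reported start time sometimes varies by small amounts.

Another option is to change the process title. For example, we can add a nonce to the process title and store that nonce together with the PID in the pidfile. Then, when we want to check whether the process is still the same (e.g. before killing it), we check whether the title of the process with the pidfile's PID still starts with the nonce from the pidfile.

For illustration, consider the following short Python snippet; similar functionality should be available through libraries in other languages: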

#!/usr/bin/env python3
import os
import setproctitle  # third-party package

nonce = os.urandom(8).hex()                           # create hex nonce
setproctitle.setproctitle(nonce + " " + setproctitle.getproctitle()) # set title
with open("run.pid", "w") as f:
    f.write(f"{os.getpid()} {nonce}")                 # store pid and nonce in pidfile

This goes together with the following shell script, which kills the process only if it is still the same one.

#!/bin/sh
PID=$(cut -f1 -d" " < run.pid)         # get pid from pidfile
NONCE1=$(cut -f2- -d" " < run.pid)     # get nonce from pidfile
NONCE2="$(ps -p "$PID" -o command= 2> /dev/null | cut -f1 -d" ")" # get nonce from process title
# an empty nonce (missing or malformed pidfile) must never count as a match
if [ -n "$NONCE1" ] && [ "$NONCE1" = "$NONCE2" ]; then # if nonces equal
  kill "$PID"                          # kill process
  echo "killed"
else                                   # otherwise the process you wanted to kill
  echo "was already dead"              # is already gone
fi
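The comparison at the heart of that script can also be expressed in Python, which may be convenient if the supervising code is not a shell script. This is only a sketch under the same assumptions as above (pidfile contents of the form "PID NONCE", nonce as the first word of the process title); both function names are hypothetical.

```python
def parse_pidfile(text: str) -> tuple:
    """Split 'PID NONCE' pidfile contents into (pid, nonce)."""
    pid, nonce = text.split(maxsplit=1)
    return int(pid), nonce.strip()

def title_has_nonce(title: str, nonce: str) -> bool:
    """True if the first word of the process title is exactly the stored nonce."""
    first_word = title.split(" ", 1)[0]
    # An empty nonce never matches, mirroring the guard in the shell script.
    return bool(nonce) and first_word == nonce
```

The title itself would be obtained the same way the shell script does it, for instance from the output of ps -p PID -o command=.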

How can I ensure that a running process is the process I expect?
