Question

On an NFS client, how can a Bash shell script detect that a mount point is no longer available, or that the server end is dead?

Normally I do this:

if ls '/var/data' 2>&1 | grep 'Stale file handle';
then
   echo "failing";
else
   echo "ok";
fi

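The problem is that when the NFS server is completely dead or stopped, even an ls of that directory hangs or dies on the client, so the script above no longer works.

Is there a way to detect this case as well?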

Answer 1

The stat command is a somewhat cleaner way:

statresult=`stat /my/mountpoint 2>&1 | grep -i "stale"`
if [ "${statresult}" != "" ]; then
  #result not empty: mountpoint is stale; remove it
  umount -f /my/mountpoint
fi

In addition, you can use rpcinfo to check whether the remote NFS share is available:

rpcinfo -t remote.system.net nfs > /dev/null 2>&1
if [ $? -eq 0 ]; then
  echo Remote NFS share available.
fi
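
Since rpcinfo can itself block for a long time when the server is unreachable, one option is to put an upper bound on the probe. The following is only a sketch: it swaps in the coreutils timeout command, which is not part of the original answer, and the function name server_check is made up for illustration.

```shell
#!/usr/bin/env bash
# Run any probe command under coreutils `timeout` and classify the result.
# server_check is a hypothetical helper name, not an existing tool.
server_check() {
    local limit=$1
    shift
    # Discard the probe's output; only its exit status matters here.
    timeout "$limit" "$@" >/dev/null 2>&1
    case $? in
        0)   echo "available" ;;    # probe succeeded within the limit
        124) echo "timed out" ;;    # timeout had to stop the probe
        *)   echo "unavailable" ;;  # probe failed on its own
    esac
}

# Intended use (the host name is the placeholder from the answer above):
#   server_check 5 rpcinfo -t remote.system.net nfs
```

The 124 exit status is how timeout reports that the limit was reached, so a slow server can be told apart from one that answers with an error.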

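Added 2013-07-15T14:31:18-05:00:

I looked into this further, as I am also working on a script that needs to recognize stale mount points. Inspired by one of the replies to "Is there a good way to detect a stale NFS mount", I think the following may be the most reliable way to check a specific mount point for staleness in bash: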

read -t1 < <(stat -t "/my/mountpoint")
# read returns non-zero both on EOF (stat printed nothing) and on timeout
if [ $? -ne 0 ]; then
   echo "NFS mount stale. Removing..."
   umount -f -l /my/mountpoint
fi

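If the stat command hangs for some reason, the -t1 option makes read give up after one second, so the script does not block along with the subshell.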
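
To see what read -t actually reports, here is a minimal sketch that classifies a probe by read's exit status. It assumes bash 4 or later, where a timed-out read returns a status greater than 128. The function name probe_status is hypothetical, and sleep and false are only stand-ins for a stat that hangs or fails.

```shell
#!/usr/bin/env bash
# Classify a probe command by how `read -t` ends.
# probe_status is a hypothetical helper; pass the probe as its arguments.
probe_status() {
    local line rc
    # The probe's stderr is discarded; its stdout feeds read through
    # process substitution.
    read -r -t 1 line < <("$@" 2>/dev/null)
    rc=$?
    if [ "$rc" -eq 0 ]; then
        echo "ok"          # the probe printed a line within one second
    elif [ "$rc" -gt 128 ]; then
        echo "timeout"     # read gave up waiting (bash 4 or later)
    else
        echo "no-output"   # the probe exited without printing a line
    fi
}

probe_status stat -t /    # a directory that should respond at once
probe_status sleep 2      # stands in for a stat that hangs
probe_status false        # stands in for a stat that fails immediately
```

Note that read timing out does not stop the probe itself; a hung stat may stay around in the background until the kernel lets it return.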

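Added 2013-07-17T12:03:23-05:00:

Although read -t1 < <(stat -t "/my/mountpoint") works, there seems to be no way to mute its error output when the mount point is stale. Adding > /dev/null 2>&1 either inside the subshell or at the end of the command line breaks it. A simple test, if [ -d /path/to/mountpoint ] ; then ... fi, also works and may be the better choice in a script. After much testing, that is what I ended up using.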

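Added 2013-07-19T13:51:27-05:00:

The replies to my question "How can I use read timeouts with stat?" provided more detail on muting the output of stat (or rpcinfo) when the target is unavailable and the command hangs for a few minutes before timing out on its own. While [ -d /some/mountpoint ] can be used to detect a stale mount point, there is no similar alternative for rpcinfo, so the read -t1 redirection is the best option there. The output of the subshell can be muted with 2>&-. Here is the example from CodeMonkey's response: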

mountpoint="/my/mountpoint"
read -t1 < <(stat -t "$mountpoint" 2>&-)
if [[ -n "$REPLY" ]]; then
  echo "NFS mount stale. Removing..."
  umount -f -l "$mountpoint"
fi

Perhaps now this question has been fully answered :).

Answer 2

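The final answer given by Ville and CodeMonkey is almost correct. I'm not sure how no one noticed this, but a $REPLY string with content means success, not failure. An empty $REPLY string therefore indicates that the mount is stale, so the conditional should use -z, not -n: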

mountpoint="/my/mountpoint"
read -t1 < <(stat -t "$mountpoint" 2>&-)
if [ -z "$REPLY" ] ; then
  echo "NFS mount stale. Removing..."
  umount -f -l "$mountpoint"
fi

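I ran this several times with valid and invalid mount points and it works. The -n check gave me the opposite result, echoing that the mount was stale when it was perfectly valid.

Also, the double brackets are not needed for a simple string check.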
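
For anyone who wants to convince themselves of the -z logic without a broken NFS server at hand, here is a small sketch with stand-in commands. The function name check_reply is made up, and clearing REPLY first is my own precaution against a value left over from an earlier read, not something from the answers above.

```shell
#!/usr/bin/env bash
# Show that a non-empty REPLY means the probe succeeded.
# check_reply is a hypothetical helper; pass the probe as its arguments.
check_reply() {
    REPLY=""    # precaution: drop any value left by an earlier read
    read -r -t 1 < <("$@" 2>/dev/null)
    if [ -z "$REPLY" ]; then
        echo "stale"    # nothing arrived: the probe failed or hung
    else
        echo "ok"       # the probe printed something in time
    fi
}

check_reply echo "some stat output"   # stands in for a healthy mount
check_reply false                     # stands in for a stale mount
check_reply sleep 2                   # stands in for a hung server
```

Keep in mind that a probe which is merely slow (more than one second) also ends up as "stale" with this approach.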

Answer 3

With "-z" the mount is reported as stale, but that is completely wrong: I can access it and read and write files.
