Extremely slow /usr/bin/env invocation

Time: 2012-12-17 08:52:45

Tags: linux cluster-computing

I'm working on a compute cluster and I'm seeing some very strange behavior from /usr/bin/env... in short, it runs extremely slowly. On the head node:

$ time /usr/bin/env which
<which output>

real    0m0.025s
user    0m0.001s
sys     0m0.001s

On a compute node:

$ qsub -I
qsub: waiting for job 176620.scyld.localdomain to start
qsub: job 176620.scyld.localdomain ready

-bash-3.2$ time which
<which output>

real    0m0.003s
user    0m0.000s
sys     0m0.003s

-bash-3.2$ time /usr/bin/env /usr/bin/which
<which output>

real    0m0.003s
user    0m0.000s
sys     0m0.003s

-bash-3.2$ time /usr/bin/env which
<which output>

real    5m0.003s
user    0m0.001s
sys     0m0.001s

ps ax reports this:

12884 pts/3    S+     0:00 /usr/bin/env which

It takes 5 minutes just to print the usage banner. Any ideas why this is happening?

Edit 1:

Additional information about which:

-bash-3.2$ type -a which
which is /usr/bin/which
-bash-3.2$ file /usr/bin/which
/usr/bin/which: ELF 64-bit LSB executable, AMD x86-64, version 1 (SYSV), for GNU/Linux 2.6.9, dynamically linked (uses shared libs), stripped
-bash-3.2$ echo $PATH
/bin:/usr/bin:/home/gusev/.rvm/bin:/home/gusev/bin

Edit 2:

I ran strace on /usr/bin/env which, and it gets stuck at

execve("/bin/which", ["which"], [/* 47 vars */]

Now, running a plain

/bin/which

also hangs, but this file does not exist:

-bash-3.2$ ls /bin/which
ls: /bin/which: No such file or directory

/bin is mounted over NFS:

-bash-3.2$ mount | grep bin
10.54.0.1:/bin on /bin type nfs (nolock,nonfatal)
10.54.0.1:/usr/bin on /usr/bin type nfs (nolock,nonfatal)

So this may be a network problem...

Edit 3:

which which works perfectly:

-bash-3.2$ time which which
/usr/bin/which

real    0m0.002s
user    0m0.000s
sys     0m0.002s

The output of strace -e trace=execve /usr/bin/env which is:

execve("/usr/bin/env", ["/usr/bin/env", "which"], [/* 47 vars */]) = 0
execve("/bin/which", ["which"], [/* 47 vars */]) = -1 ENOENT (No such file or directory)
execve("/usr/bin/which", ["which"], [/* 47 vars */]) = 0
<which output>

Edit 4:

The hang always lasts 5 minutes. It looks like some kind of default timeout.

3 Answers:

Answer 0 (score: 0)

The problem may be caused by which rather than by env, since you get very different results from

time /usr/bin/env /usr/bin/which

versus

time /usr/bin/env which

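You may have another which command in your $PATH, perhaps in /usr/local/bin or $HOME/bin. What does type -a which tell you? What does your $PATH look like?

Note that which can be either a shell script or a binary executable. If it is a shell script, try making a copy of it and adding set -x to see what it is doing.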

Answer 1 (score: 0)

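This problem, and the one described in your previous question, appear to be caused by execve taking a long time to return on the compute nodes. The fact that the directories in your path are NFS-mounted may be a contributing factor.

Running the command through strace, we can see that env probes each directory in the path for the command by calling execve repeatedly: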

[me@home]$ echo $PATH
/home/me/bin:/usr/lib/lightdm/lightdm:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/home/me/work/bin

[me@home]$ strace -e execve /usr/bin/env which
execve("/usr/bin/env", ["/usr/bin/env", "which"], [/* 53 vars */]) = 0
execve("/home/me/bin/which", ["which"], [/* 53 vars */]) = -1 ENOENT (No such file or directory)
execve("/usr/lib/lightdm/lightdm/which", ["which"], [/* 53 vars */]) = -1 ENOENT (No such file or directory)
execve("/usr/local/sbin/which", ["which"], [/* 53 vars */]) = -1 ENOENT (No such file or directory)
execve("/usr/local/bin/which", ["which"], [/* 53 vars */]) = -1 ENOENT (No such file or directory)
execve("/usr/sbin/which", ["which"], [/* 53 vars */]) = -1 ENOENT (No such file or directory)
execve("/usr/bin/which", ["which"], [/* 53 vars */]) = 0
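To narrow down which PATH directory the delay comes from, one option is to repeat env's execve probe one directory at a time. The sketch below is a hypothetical bash helper (time_env_probe is not an existing tool). It relies on env applying a PATH=... argument to its own environment before searching for the command, and it assumes the probed command is harmless to run with no arguments, because in the directory where it exists it will actually run:

```bash
# time_env_probe NAME: hypothetical diagnostic helper (not an existing tool).
# For each PATH directory, run env with a one-entry PATH so that env's
# execve probing touches only that directory, and report how long it took.
# Assumption: NAME is harmless to run with no arguments (e.g. which).
time_env_probe() {
  local name=$1 dir
  local -a dirs
  local TIMEFORMAT
  IFS=: read -r -a dirs <<< "$PATH"      # split PATH on colons, no globbing
  for dir in "${dirs[@]}"; do
    [ -n "$dir" ] || dir=.               # an empty PATH entry means "."
    TIMEFORMAT="$dir: %3R s"             # label bash's `time` report (printed to stderr)
    # The command's own output is discarded; the timing report is not.
    time /usr/bin/env PATH="$dir" "$name" </dev/null >/dev/null 2>&1
  done
}
```

For example, time_env_probe which prints one line per PATH directory; a line showing roughly 300 s would point at the directory where execve stalls.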

As you confirmed in the comments above, which which does not run into the same problem, because it probes the path with stat instead of execve:

[me@home]$ strace -e execve,stat /usr/bin/which which
execve("/usr/bin/which", ["/usr/bin/which", "which"], [/* 53 vars */]) = 0
stat("/home/me", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
stat(".", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
stat("/home/me/bin/which", 0x7fff79ae8760) = -1 ENOENT (No such file or directory)
stat("/usr/lib/lightdm/lightdm/which", 0x7fff79ae8760) = -1 ENOENT (No such file or directory)
stat("/usr/local/sbin/which", 0x7fff79ae8760) = -1 ENOENT (No such file or directory)
stat("/usr/local/bin/which", 0x7fff79ae8760) = -1 ENOENT (No such file or directory)
stat("/usr/sbin/which", 0x7fff79ae8760) = -1 ENOENT (No such file or directory)
stat("/usr/bin/which", {st_mode=S_IFREG|0755, st_size=946, ...}) = 0
/usr/bin/which
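For comparison, a stat-style lookup can be sketched in a few lines of bash. This is a simplified, hypothetical stand-in (stat_lookup is an invented name, not GNU which's actual code): it only inspects file metadata with [ -f ] and [ -x ] and never tries to execve a candidate, which is the property that kept which which fast here:

```bash
# stat_lookup NAME: hypothetical, simplified stand-in for a stat-style PATH
# lookup like the one `which` performs (not GNU which's real code). It only
# checks file metadata with [ -f ] and [ -x ]; it never execs a candidate.
stat_lookup() {
  local name=$1 dir
  local -a dirs
  IFS=: read -r -a dirs <<< "$PATH"      # split PATH on colons, no globbing
  for dir in "${dirs[@]}"; do
    [ -n "$dir" ] || dir=.               # an empty PATH entry means "."
    if [ -f "$dir/$name" ] && [ -x "$dir/$name" ]; then
      printf '%s\n' "$dir/$name"         # first match wins, as in a PATH search
      return 0
    fi
  done
  return 1                               # not found in any PATH directory
}
```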

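I'm afraid I can't suggest a fix for the underlying problem, but in the meantime you can work around it by:

  1. Using the full path to the command instead of letting env resolve it for you.
  2. If you really want to use env, reordering $PATH where possible to minimize the search, e.g.: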

    PATH=/usr/bin:$PATH /usr/bin/env which   # place most likely path first
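Workaround 1 can also be scripted. The sketch below is a hypothetical bash helper (run_via_env is not an existing tool) that resolves the command with bash's own PATH lookup, via the type -P builtin, and then hands env an absolute path. Because the name then contains a slash, env should execute it directly instead of probing each PATH directory with execve. In the question, bash's own lookup (time which) was fast on the compute node, so this assumes the slow part is only env's execve probing:

```bash
# run_via_env NAME [ARGS...]: hypothetical helper. Resolve NAME with bash's
# own PATH search (type -P prints the path of the file on disk), then pass
# env an absolute path so env does not search PATH itself.
run_via_env() {
  local name=$1 path
  shift
  if ! path=$(type -P "$name"); then
    printf 'run_via_env: %s: not found in PATH\n' "$name" >&2
    return 127                           # the status shells use for "command not found"
  fi
  /usr/bin/env "$path" "$@"
}
```

For example, time run_via_env which times the same call as in the question, minus env's own PATH search.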
    

Answer 2 (score: 0)

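In the end, I found that I had a very long PATH environment variable, and it probably somehow affected the execve calls that went to the NFS share.

So I moved a bunch of executables into a single directory and replaced many of the entries in PATH with that one directory. I haven't had any problems since.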
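A related, less invasive step is to drop PATH entries that cannot contribute anything, so that env has fewer directories to probe. This is a hypothetical bash helper (prune_path is not a standard tool), not what the poster actually did. Note that it also drops empty entries, which execvp and shells treat as the current directory, and that its [ -d ] check itself touches every directory once:

```bash
# prune_path PATHSTRING: hypothetical helper. Rebuild a colon-separated PATH,
# dropping empty entries, duplicates, and directories that do not exist,
# while preserving the original order.
prune_path() {
  local dir out= seen=:
  local -a dirs
  IFS=: read -r -a dirs <<< "$1"         # split on colons, no globbing
  for dir in "${dirs[@]}"; do
    [ -n "$dir" ] && [ -d "$dir" ] || continue   # skip empty or missing entries
    case $seen in *":$dir:"*) continue ;; esac   # skip entries already kept
    seen=$seen$dir:
    out=${out:+$out:}$dir                # append with a colon separator
  done
  printf '%s\n' "$out"
}
```

Usage: PATH=$(prune_path "$PATH")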