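I'm running Monte Carlo simulations on multiple processors, but they hang a lot. So I put together this Perl code to kill the iteration in which the Monte Carlo run hangs and move on to the next iteration. But I get some errors that I haven't figured out yet. I think it sleeps too long and deletes the out.mt0 file before it can be looked up. Here is the code: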
my $pid = fork();
die "Could not fork\n" if not defined $pid;
if ($pid==0){
    print "In child\n";
    system("hspice -i mont_read.sp -o out -mt 4");wait;
    sleep(.8);wait;
    exit(0);
}
print "In parent \n";
$i = 0;
$mont_number = $j - 1;
out: while (1){
    $res=waitpid($pid, WNOHANG);
    if ($res == -1) {
        print "Successful Exit Process Detected\n";
        system("mv out.mt0 mont_read.mt0");wait;
        sleep(1);wait;
        system("perl monte_stat.pl > rel_out.txt"); wait ;
        system("cat stat_result.txt rel_out.txt > stat_result.tmp"); wait;
        system("mv stat_result.tmp stat_result.txt");wait;
        print "\nSim #$mont_number complete\n";wait;
        last out;
    }
    if($res != -1){
        if($i>=$timeout){
            $hang_count = $hang_count+1;
            system("killall hspice");wait;
            sleep(1);
            print("time_out complete\n");wait;
            print "\nSim #$mont_number complete\n";wait;
            last out;
        }
        if($i<$timeout){
            sleep $slept;wait;
        }
        $i=$i+1;
    }
}
Here are the errors:
Illegal division by zero at monte_stat.pl line 73, <INHSPOUT> line 2.
mv: cannot stat `out.mt0': No such file or directory
Illegal division by zero at monte_stat.pl line 73, <INHSPOUT> line 1.
mv: cannot stat `out.mt0': No such file or directory
Illegal division by zero at monte_stat.pl line 73, <INHSPOUT> line 1.
mv: cannot stat `out.mt0': No such file or directory
Illegal division by zero at monte_stat.pl line 73.
mv: cannot stat `out.mt0': No such file or directory
Illegal division by zero at monte_stat.pl line 73.
mv: cannot stat `out.mt0': No such file or directory
mv: cannot stat `out.mt0': No such file or directory
mv: cannot stat `out.mt0': No such file or directory
Illegal division by zero at monte_stat.pl line 73, <INHSPOUT> line 3.
mv: cannot stat `out.mt0': No such file or directory
Illegal division by zero at monte_stat.pl line 73, <INHSPOUT> line 1.
mv: cannot stat `out.mt0': No such file or directory
Can anyone tell me where to start debugging this? Thanks.
Answer 0 (score: 3)
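Judging by the errors, your hspice run is crashing. But there are other problems as well.
Here is first a working example, kept as close to your code as possible.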
use warnings;
use strict;
use feature 'say';
use POSIX qw(:sys_wait_h);
$| = 1;
my ($timeout, $duration, $sleep_time) = (5, 10, 1);
my $pid = fork // die "Can't fork: $!";
if ($pid == 0)
{
    exec "echo JOB STARTS; sleep $duration; echo JOB DONE";
    die "exec shouldn't return: $!";
}
say "Started $pid";
sleep 1;
my $tot_sec;
while (1)
{
    my $ret = waitpid $pid, WNOHANG;
    if ($ret > 0) { say "Child $ret exited with: $?"; last; }
    elsif ($ret < 0) { say "\nNo such process ($ret)"; last; }
    else { print " . " }
    sleep $sleep_time;
    if (($tot_sec += $sleep_time) > $timeout) {
        say "\nTimeout. Send 15 (SIGTERM) signal to the process.";
        kill 15, $pid;
        last;
    }
}
With $duration (of the job) set to 3, shorter than $timeout, we get
Started 16848 JOB STARTS . . . JOB DONE Child (JOB) 16848 exited with: 0
With $duration set to 10, we get
Started 16550 JOB STARTS . . . . . Timeout. Send 15 (SIGTERM) signal to the process.
and the job is killed (wait another 5 seconds to confirm: JOB DONE should not show up).
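Comments on the code in the question
If you fork just to run the job, there is no reason for system. Just exec the program.
There is no need for wait after system, and it is wrong there. system includes the wait.
wait does not belong after print and sleep either, and it is wrong there too.
There is no need to reach for killall to kill the process. After exec the job runs under the PID returned by fork, so kill with that PID is enough.
If you do end up using system, the program will run in a new process with a different PID. Then more work is needed to find that PID and kill it. See Proc::ProcessTable and this post, for example.
The code above still needs a check that the process was indeed killed.
Substitute your command line for echo ..., and add checks on it as needed.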
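As a minimal sketch of the two notes above (substituting a real command and confirming that the kill took effect), the polling loop can be wrapped in a helper. The name run_with_timeout, its return values, and the three-second grace period before SIGKILL are assumptions made for illustration; they are not part of the original answer.

```perl
use strict;
use warnings;
use POSIX qw(:sys_wait_h);

# Hypothetical helper: run @$cmd in a child, poll once per second, and
# terminate the child if it is still running after about $timeout seconds.
# Returns ('done', $status) or ('killed', $status); $status is the raw $?.
sub run_with_timeout {
    my ($cmd, $timeout) = @_;

    my $pid = fork // die "Can't fork: $!";
    if ($pid == 0) {
        # Indirect-object form of exec never goes through a shell,
        # so $pid is the PID of the job itself.
        exec { $cmd->[0] } @$cmd
            or do { warn "exec failed: $!"; POSIX::_exit(127) };
    }

    my $waited = 0;
    while ($waited <= $timeout) {
        my $ret = waitpid $pid, WNOHANG;
        return ('done', $?) if $ret == $pid;   # child exited and was reaped
        die "waitpid failed: $!" if $ret < 0;
        sleep 1;
        $waited++;
    }

    # Timed out: send SIGTERM, then confirm the child is really gone.
    kill 'TERM', $pid;
    for (1 .. 3) {                             # assumed grace period
        return ('killed', $?) if waitpid($pid, WNOHANG) == $pid;
        sleep 1;
    }
    kill 'KILL', $pid;                         # SIGTERM had no effect
    waitpid $pid, 0;                           # blocking wait reaps the child
    return ('killed', $?);
}
```

With the command line from the question, a call might look like run_with_timeout(['hspice', '-i', 'mont_read.sp', '-o', 'out', '-mt', '4'], $timeout). Moving out.mt0 and running monte_stat.pl only when the first return value is 'done' and the status is 0 would avoid post-processing a run that never produced its output file.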
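Another option is to simply sleep for the $timeout period and then check whether the job is done (the child has exited). However, with your approach you can do other things while polling.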
Another option is to use alarm.
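A rough sketch of the alarm variant, under the assumption that the parent has nothing else to do while the job runs: it blocks in waitpid and lets SIGALRM interrupt the wait. The helper name wait_with_alarm and its return values are hypothetical, chosen only for this illustration.

```perl
use strict;
use warnings;
use POSIX ();

# Hypothetical helper: run @$cmd in a child and block in waitpid;
# an alarm interrupts the wait after $timeout seconds.
# Returns ('done', $status) or ('killed', $status); $status is the raw $?.
sub wait_with_alarm {
    my ($cmd, $timeout) = @_;

    my $pid = fork // die "Can't fork: $!";
    if ($pid == 0) {
        exec { $cmd->[0] } @$cmd
            or do { warn "exec failed: $!"; POSIX::_exit(127) };
    }

    my $timed_out = 0;
    eval {
        local $SIG{ALRM} = sub { die "timeout\n" };
        alarm $timeout;
        waitpid $pid, 0;      # blocks until the child exits or the alarm fires
        alarm 0;              # child finished in time: cancel the alarm
        1;
    } or do {
        alarm 0;
        die $@ unless $@ eq "timeout\n";   # rethrow anything unexpected
        $timed_out = 1;
        kill 'TERM', $pid;
        waitpid $pid, 0;      # reap; blocks forever if the job ignores SIGTERM
    };

    return ($timed_out ? 'killed' : 'done', $?);
}
```

Compared with polling, this needs no sleep loop, but the parent cannot do other work while it waits, and a job that ignores SIGTERM would still need a follow-up SIGKILL.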