
Question

I have a Perl CGI script for an online concordance application. It searches a text for instances of a word and prints the sorted output.

#!/usr/bin/perl -wT

# middle.pl - a simple concordance



# require
use strict;
use diagnostics;
use CGI;


# ensure all fatals go to browser during debugging and set-up
# comment this BEGIN block out on production code for security
BEGIN {
    $|=1;
    print "Content-type: text/html\n\n";
    use CGI::Carp('fatalsToBrowser');
}

# sanity check
my $q = new CGI;
my $target = $q->param("keyword");
my $radius = $q->param("span");
my $ordinal = $q->param("ord");
my $width = 2*$radius;
my $file    = 'concordanceText.txt';
if ( ! $file or ! $target ) {

    print "Usage: $0 <file> <target>\n";
    exit;

}

# initialize
my $count   = 0;
my @lines   = ();
$/          = ""; # Paragraph read mode

# open the file, and process each line in it
open(FILE, " < $file") or die("Can not open $file ($!).\n");
while(<FILE>){

    # re-initialize
    my $extract = '';

    # normalize the data
    chomp;
    s/\n/ /g;        # Replace new lines with spaces
    s/\b--\b/ -- /g; # Add spaces around dashes

    # process each item if the target is found
    while ( $_ =~ /\b$target\b/gi ){

        # find start position
        my $match = $1;
        my $pos   = pos;
        my $start = $pos - $radius - length($match);

        # extract the snippets
        if ($start < 0){
            $extract = substr($_, 0, $width+$start+length($match));
            $extract = (" " x -$start) . $extract;
        }else{
            $extract = substr($_, $start, $width+length($match));
            my $deficit = $width+length($match) - length($extract);
            if ($deficit > 0) {
                $extract .= (" " x $deficit);
            }

        }

        # add the extracted text to the list of lines, and increment
        $lines[$count] = $extract;
        ++$count;

    }

}

sub removePunctuation {
    my $string = $_[0];
    $string = lc($string); # Convert to lowercase
    $string =~ s/[^-a-z ]//g; # Remove non-alphabetic characters
    $string =~ s/--+/ /g;     # Replace 2+ hyphens with a space
    $string =~ s/-//g;        # Remove remaining hyphens
    $string =~ s/\s+/ /g;     # Collapse runs of whitespace
    return($string);

}

sub onLeft {
    #USAGE: $word = onLeft($string, $radius, $ordinal);
    my $left = substr($_[0], 0, $_[1]);
    $left = removePunctuation($left);
    my @word = split(/\s+/, $left);
    return($word[-$_[2]]);
}

sub byLeftWords {
    my $left_a = onLeft($a, $radius, $ordinal);
    my $left_b = onLeft($b, $radius, $ordinal);
    lc($left_a) cmp lc($left_b);
}


# process each line in the list of lines

print "Content-type: text/plain\n\n";
my $line_number = 0;
foreach my $x (sort byLeftWords @lines){
    ++$line_number;
    printf "%5d",$line_number;
    print " $x\n\n";
}

# done
exit;

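The Perl script produces the expected results in the terminal (command line), but the CGI script for the online application produces unexpected output. I cannot figure out what mistake I am making in the CGI script. Ideally, the CGI script should produce the same output as the command-line script. Any suggestions would be very helpful.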

[Image: command-line output]

[Image: CGI output]

Answer 1

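The BEGIN block executes before anything else, and therefore before

my $q = new CGI;

The output goes to the server process's stdout rather than to the HTTP stream, so you get the default text/plain, as you can see in the CGI output.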

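Once you have fixed that, you will find that the output still looks like an ugly block, because you need to format and send a valid HTML page, not just a big chunk of text. You cannot dump a pile of text into the browser and expect it to do anything clever with it. You have to create a complete HTML page with tags to lay out your content, and possibly CSS as well.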
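As a minimal sketch of what such a page could look like, the following wraps the concordance lines in a pre element so that the column alignment survives, and escapes HTML metacharacters first. The helper names escape_html and render_page, the sample lines, and the page title are assumptions made for illustration; none of them come from the original script.

```perl
use strict;
use warnings;

# Hypothetical helper: escape the characters that are special in HTML.
sub escape_html {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;    # must run first so later entities are not re-escaped
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    $text =~ s/"/&quot;/g;
    return $text;
}

# Hypothetical helper: build the full response (one header plus an HTML page).
sub render_page {
    my (@lines) = @_;
    my $body = '';
    my $n    = 0;
    foreach my $line (@lines) {
        ++$n;
        $body .= sprintf("%5d %s\n", $n, escape_html($line));
    }
    return "Content-type: text/html\n\n"
         . "<!DOCTYPE html>\n"
         . "<html>\n<head><title>Concordance</title></head>\n"
         . "<body>\n<pre>\n$body</pre>\n</body>\n</html>\n";
}

# Sample data standing in for the sorted @lines of the script.
print render_page('the cat & the dog', 'a <b> tag');
```

The pre element is only one possible layout; a table or styled list would work equally well, and that choice is left to the author of the page.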

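In other words, the output you need is quite different from the output you produce when writing only to a terminal. How you structure it is up to you, and explaining how to do that is beyond the scope of StackOverflow.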

Answer 2

As the other answer states, the BEGIN block is executed at the very start of your program.

BEGIN {
    $|=1;
    print "Content-type: text/html\n\n";
    use CGI::Carp('fatalsToBrowser');
}

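There, you output the HTTP header Content-type: text/html\n\n. The browser sees that first and treats all of your output as HTML. But you only have text. Whitespace in an HTML page is collapsed into single spaces, so all of your \n line breaks disappear.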

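Later, you print another header. The browser can no longer treat it as a header, because you have already sent one and terminated the header block with the two newlines \n\n. By then it is too late to switch back to text/plain.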
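The effect can be illustrated with a small sketch that splits a simulated response at the first blank line, which is where the header block ends. The helper name split_response and the sample text are assumptions for illustration, and the sketch assumes bare \n line endings like the ones the script prints.

```perl
use strict;
use warnings;

# Hypothetical helper: split a raw CGI response into header block and body.
# Simplification: assumes bare "\n" line endings, as in the script's prints.
sub split_response {
    my ($raw) = @_;
    # The limit of 2 keeps everything after the first blank line in the body.
    my ($headers, $body) = split /\n\n/, $raw, 2;
    return ($headers, $body);
}

my $raw = "Content-type: text/html\n\n"      # printed by the BEGIN block
        . "Content-type: text/plain\n\n"     # printed later, too late
        . "    1 first sorted line\n\n";

my ($headers, $body) = split_response($raw);
print "Header block: $headers\n";
print "Body begins:  ", (split /\n/, $body)[0], "\n";
```

Only the first Content-type line ends up in the header block. The second one is part of the body, so it is shown as ordinary text.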

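It is perfectly fine for a CGI program to return text/plain and have the browser display plain text without markup: no colors, links, or tables. For some use cases that makes a lot of sense, even though it loses the "hyper" in hypertext. But that is not really what you are doing.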

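Your BEGIN block serves a purpose, but you are overdoing it. You are trying to make sure that when an error occurs, it is printed nicely in the browser, so that you do not have to dig through the server log during development.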

The CGI::Carp module and its fatalsToBrowser functionality bring their own mechanism for this. You do not have to do it yourself.

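You can safely remove the BEGIN block and put use CGI::Carp at the top of the script with all the other use statements. They run first anyway, because use is executed at compile time, while the rest of the code is executed at run time.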
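The difference between compile time and run time can be seen in a minimal sketch like the one below. The array name @order and the recorded labels are made up for illustration. A use statement is executed at the same phase as the BEGIN block shown here.

```perl
use strict;
use warnings;

our @order;    # records the order in which the pieces of code execute

push @order, 'run time: first statement';

# Appears second in the file, but is executed while the file is compiled.
BEGIN { push @order, 'compile time: BEGIN block' }

push @order, 'run time: second statement';

print "$_\n" for @order;
```

The BEGIN entry is recorded first even though it is not the first statement in the file, which is why a header printed inside a BEGIN block is sent before anything else the script prints.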

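If you like, you can keep the $|=1, which turns off buffering for the STDOUT handle. Output is then flushed immediately: every time you print, the output goes straight to the browser instead of being collected until there is enough of it or a newline arrives. If your process runs for a long time, this makes it easier for the user to see that something is happening, which is useful in production too.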

The top of your program should now look like this.

#!/usr/bin/perl -T

# middle.pl - a simple concordance
use strict;
use warnings;
use diagnostics;
use CGI;
use CGI::Carp('fatalsToBrowser');

$|=1;

my $q = CGI->new;

Finally, a few quick words about the other parts I removed from there.

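Your comment require above the use statements is misleading. Those are use statements, not require. As said above, use runs at compile time. require, on the other hand, runs at run time and can be done conditionally. Misleading comments make it harder for others (or for you) to maintain the code later.
I removed the -w flag from the shebang (#!/usr/bin/perl) and put in the use warnings pragma instead. That is the more modern way to enable warnings, because the shebang is sometimes ignored.
The use diagnostics pragma gives you additional explanations. That is useful, but also extra slow. You can use it during development, but remove it for production.
The comment sanity check should be moved down, below the CGI instantiation.
Please use the method-call form of new to instantiate CGI and any other class. The -> syntax takes care of inheritance properly, while the old new CGI cannot do this.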
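To illustrate the arrow form of a constructor call, here is a minimal sketch with two made-up classes. My::Base and My::Derived are hypothetical names that have nothing to do with CGI. The sketch only shows that a call written as Class->new is resolved through the inheritance chain.

```perl
use strict;
use warnings;

# Hypothetical base class with a generic constructor.
package My::Base;

sub new {
    my ($class, %args) = @_;
    # Bless into whatever class the method was invoked on.
    return bless { %args }, $class;
}

# Hypothetical subclass that inherits new() from My::Base.
package My::Derived;
our @ISA = ('My::Base');

package main;

# The arrow form looks up new() in My::Derived, then in its parents.
my $obj = My::Derived->new(name => 'example');
print ref($obj), "\n";
```

The same call shape, CGI->new, is what the rewritten top of the script uses.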

Answer 3

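I ran your CGI. The BEGIN block runs no matter what, and you print a content-type header in it; at that point you have explicitly asked for HTML. Then you try to print another header for plain text. That is why you can see the header text (which had no effect) at the beginning of the text in the browser window.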

Perl CGI produces unexpected output
