Question

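I'm using proc_open to pipe some text to a perl script for faster processing. The text includes url-encoded strings as well as literal spaces. When a url-encoded space appears in the raw text, it seems to be decoded into a literal space by the time it reaches the perl script. In the perl script I rely on the positioning of the literal spaces, so these unwanted spaces mess up my output.

Why is this happening, and is there a way to prevent it?

Relevant code snippet: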

$descriptorspec = array(
    0 => array("pipe", "r"),
    1 => array("pipe", "w"),
);
$cmd = "perl script.pl";
$process = proc_open($cmd, $descriptorspec, $pipes);
$output = "";

if (is_resource($process)) {
    fwrite($pipes[0], $raw_string);
    fclose($pipes[0]);
    while (!feof($pipes[1])) {
        $output .= fgets($pipes[1]);
    }
    fclose($pipes[1]);
    proc_close($process);
}

And a line of raw text input looks like this:

key url\tvalue1\tvalue2\tvalue3

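I can avoid the problem by converting the input format, but for various reasons that is undesirable, and it circumvents rather than solves the key problem.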

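Furthermore, I know the problem occurs between the php script and the perl script, because I examined the raw text (with echo) immediately before writing it to the perl script's STDIN pipe, and I have tested my perl script directly on url-encoded raw strings.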
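One way to make that kind of check less dependent on eyeballing echo output is to count literal spaces and encoded spaces on the receiving end and compare the totals with the same counts taken on the sending side. The following is only a sketch: `count_spaces` is a hypothetical helper, the sample line is made up, and it assumes encoded spaces appear as `%20` (not as `+`).

```perl
use strict;
use warnings;

# Hypothetical helper: counts literal spaces and "%20" sequences read from
# a filehandle, so the totals can be compared with those computed on the
# sending side before the data is written to the pipe.
sub count_spaces {
    my ($fh) = @_;
    my ($literal, $encoded) = (0, 0);
    while (my $line = <$fh>) {
        $literal += ($line =~ tr/ //);       # tr counts without modifying
        my $n = () = $line =~ /%20/g;        # number of "%20" matches
        $encoded += $n;
    }
    return ($literal, $encoded);
}

# Demo on made-up in-memory data; in a real check the handle would be \*STDIN.
my $sample = "key http://a.example/x%20y\t1\t2\n";
open my $fh, '<', \$sample or die "cannot open in-memory handle: $!";
my ($literal, $encoded) = count_spaces($fh);
close $fh;
print "literal=$literal encoded=$encoded\n";
```

If the two sides report the same totals, the pipe is not altering the data and the unexpected spaces are already present in the input.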

I've now added the perl script below. It basically boils down to a mini map-reduce job.

use strict;

my %rows;
while(<STDIN>) {
    chomp;
    my @line = split(/\t/);
    my $key = $line[0];
    if (exists $rows{$key}) {
        for my $i (1..$#line) {
            $rows{$key}->[$i-1] += $line[$i];
        }
    } else {
        my @new_row;
        for my $i (1..$#line) {
            push(@new_row, $line[$i]);
        }
        $rows{$key} = [ @new_row ];
    }
}

my %newrows;
for my $key (keys %rows) {
    my @temparray = split(/ /, $key);
    pop(@temparray);
    my $newkey = join(" ", @temparray);
    if (exists $newrows{$newkey}) {
        for my $i (0..$#{ $rows{$key}}) {
            $newrows{$newkey}->[$i] += $rows{$key}->[$i] > 0 ? 1 : 0;
        }
    } else {
        my @new_row;
        for my $i (0..$#{ $rows{$key}}) {
            push(@new_row, $rows{$key}->[$i] > 0 ? 1 : 0);
        }
        $newrows{$newkey} = [ @new_row ];
    }
}

for my $key (keys %newrows) {
    print "$key\t", join("\t", @{ $newrows{$key} }), "\n";
}

Answer 1

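Note to self: always check your assumptions. It turns out that somewhere in my hundreds of millions of lines of input there were in fact literal spaces where there should have been url-encoded spaces. It took a while to find them, since there were hundreds of millions of correct literal spaces, but there they were.

Sorry guys!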
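A scan along the following lines can flag such rows among the correct ones. It is only a sketch under an assumption: that a well-formed first tab-separated field contains a known number of literal spaces (here one, between the key and the url). `find_suspect_lines`, `$expected_spaces`, and the sample data are hypothetical and would need adjusting to the real input.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the line numbers whose first tab-separated
# field does not contain the expected number of literal spaces.
# Assumption: a well-formed first field is "key url", i.e. exactly one space.
sub find_suspect_lines {
    my ($fh, $expected_spaces) = @_;
    my @suspects;
    while (my $line = <$fh>) {
        chomp $line;
        my ($first_field) = split /\t/, $line, 2;
        next unless defined $first_field;        # skip empty lines
        my $spaces = ($first_field =~ tr/ //);   # count literal spaces
        push @suspects, $. if $spaces != $expected_spaces;
    }
    return @suspects;
}

# Demo on in-memory sample data (made up for illustration):
# line 1 has an encoded space in the url, line 2 a stray literal one.
my $sample = "k1 http://a.example/x%20y\t1\t2\n"
           . "k2 http://a.example/x y\t3\t4\n";
open my $fh, '<', \$sample or die "cannot open in-memory handle: $!";
my @bad = find_suspect_lines($fh, 1);
close $fh;
print "suspect line: $_\n" for @bad;
```

For real input the handle would be `\*STDIN` or a file opened for reading, and the reported line numbers can then be inspected by hand.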
