改进我的Perl算法以合并postscript show命令

时间:2012-11-22 16:45:50

标签: perl matlab postscript

Matlab R2007b的Postscript输出值得怀疑。我发现文本字符串在postscript输出(simprintdiag)中的许多“moveto”和“show”命令中被分开。这在排版到PDF时会导致问题,因为额外的空格有时会插入到标签中(因此您无法双击它们,也无法在搜索中找到它们。)。

为了避免这个问题,我编写了一个Perl脚本,将这些拆分'show'命令重新加入,然而,它有一些问题,我需要一些帮助。

  1. 显示“(0)s”之类的命令不能正确重复,并显示在下一个块中。
  2. 即使不需要进行任何更改,脚本也会始终修改输入的postscript文件。
  3. 在开始时有一个黑客可以绕过连续的show命令。
  4. 它不是很快,并且鉴于一些项目有> 2000 postscript文件,欢迎任何速度改进。
  5. 下面我的代码中的DATA有四个mt和s命令中的分割文本字符串示例。我已经包含了最终输出应该在最后的结果。该脚本使用的事实是我们的文本是从左到右,或在postscript中写的,带有移动的X线和固定的Y线。因此,得出结论,具有相同Y字符串的连续mt命令是相同的文本字符串。

    感激不尽的任何帮助。

    谢谢:)

    我的Perl脚本:

    use strict;
    use warnings;
    
    my $debug=1;
    
    #
    ## Slurp the input file into a variable
    my $ps_in;
    while(<DATA>) {
       $ps_in .= $_;     # Take a copy of input file
    }
    
    
    #
    ## HACK
    ## The main PS fix algorithm only works with show commands on a single
    ## line!  Fix the input contents now by joining all show commands that 
    ## occur over multiple lines.  Examples of this are:
    ##  272   63 mt 
    ## (main is an externally linked function of the ACC feature ru\
    ## nning every ) s
    ##  991   63 mt
    ## (100) s
    my $buf;
    my $no_show_split;
    open(my $fh_ps, "<", \$ps_in );
    while(<$fh_ps>) {
       if( /^(.*)\\$/ ) {   # Match on all lines ending with backslash \
          $buf .= $1;
       }
       else {
          if( $buf ) {
             $no_show_split .= $buf;
             undef($buf);
          }
          $no_show_split .= $_;
       }
    }
    close $fh_ps;
    
    #
    ## Reopen our ps input, now the show splits have been removed
    open($fh_ps,"<",\$no_show_split );
    
    my $moveto_line = qr/^\s*\d+\s+(\d+)\s+(mt|moveto)/;  # Example '2831  738 mt'
    my $show_line   = qr/^\((.+)\)\s+(s|show)/;           # Example '(chris) s'
    my $ycrd;      # Y-axis cords
    my $pstxt;     # Text to display
    my $mtl;       # Moveto line
    my $print_text;
    my $fixes=0;
    my $ps_condensed;
    
    while(<$fh_ps>) {
    
        if( $print_text ) {
            $ps_condensed .= "$mtl\n";
            $ps_condensed .= "($pstxt) s\n";
            print "($pstxt) s\n====================\n" if $debug;
            undef($ycrd);
            undef($pstxt);
            $print_text=0;
            ++$fixes;
        }
    
        if( /$moveto_line/ ) {
            chomp;
    
            if( !$ycrd ) {
                $mtl=$_;       # Store this line for print later
                $ycrd=$1;      # Match on y-axis value
                redo;          # Redo this iteration so we can read the show line in
            }
            elsif( $1 == $ycrd ) {
                <$fh_ps> =~ /$show_line/;  # Read in the show line
                $pstxt .= $1;              # Built up string we want
                print " $mtl -->$1<--\n" if $debug;
            }
            else {
                $print_text=1; # Dropped out matching on y-cord so force a print
                redo;          # Need to redo this line again
            }
        }
        else {
            if( $pstxt ) {     # Print if we have something in buffer
                $print_text=1;
                redo;
            }
            $ps_condensed .= $_;
        }
    
    } # End While Loop
    close $fh_ps;
    
    print $ps_condensed;
    
    
    __DATA__
    %%IncludeResource: font Helvetica
    /Helvetica /WindowsLatin1Encoding 60 FMSR
    
    11214 11653 mt 
    (0) s
    4.5 w
    156 0 2204 19229 2 MP stroke
    156 0 2204 19084 2 MP stroke
    
    %%IncludeResource: font Helvetica
    /Helvetica /WindowsLatin1Encoding 120 FMSR
    
    8913 14971 mt 
    (Function) s
    9405 14971 mt 
    (-) s
    9441 14971 mt 
    (Call) s
    9009 15127 mt 
    (Generator) s
    6 w
    
    
    %%IncludeResource: font Helvetica
    /Helvetica /WindowsLatin1Encoding 120 FMSR
    
    4962 4747 mt 
    (trigger) s
    5322 4747 mt 
    (_) s
    5394 4747 mt 
    (scheduler) s
    5934 4747 mt 
    (_) s
    6006 4747 mt 
    (100) s
    6222 4747 mt 
    (ms) s
    6378 4747 mt 
    (_) s
    6450 4747 mt 
    (task) s
    6654 4747 mt 
    (_) s
    6726 4747 mt 
    (06) s
    6 w
    gr
    
    24 10 10 24 0 4 -10 24 -24 10 5806 11736 14 MP stroke
    %%IncludeResource: font Helvetica
    /Helvetica /WindowsLatin1Encoding 120 FMSR
    
    5454 11947 mt 
    (Chris_\
    did_this_example_) s
    5874 11947 mt 
    (to_test) s
    5946 11947 mt 
    (_out) s
    6 w
    

    最终的“精简”后记应该是什么样的:

    %%IncludeResource: font Helvetica
    /Helvetica /WindowsLatin1Encoding 60 FMSR
    
    11214 11653 mt 
    (0) s
    4.5 w
    156 0 2204 19229 2 MP stroke
    156 0 2204 19084 2 MP stroke
    
    %%IncludeResource: font Helvetica
    /Helvetica /WindowsLatin1Encoding 120 FMSR
    
    8913 14971 mt 
    (Function-Call) s
    9009 15127 mt 
    (Generator) s
    6 w
    
    
    %%IncludeResource: font Helvetica
    /Helvetica /WindowsLatin1Encoding 120 FMSR
    
    4962 4747 mt 
    (trigger_scheduler_100ms_task_06) s
    6 w
    gr
    
    24 10 10 24 0 4 -10 24 -24 10 5806 11736 14 MP stroke
    %%IncludeResource: font Helvetica
    /Helvetica /WindowsLatin1Encoding 120 FMSR
    
    5454 11947 mt 
    (Chris_did_this_example_to_test_out) s
    6 w
    

2 个答案:

答案 0 :(得分:2)

我认为以下内容适合您。

注意:

  • 使用成语:do { local $/; <DATA> };
  • 覆盖所有数据
  • 使用单个正则表达式
  • 修复行尾的反斜杠

use strict;
use warnings;

my $data = do { local $/; <DATA> };
$data =~ s,\\\n,,g;

my $out = "";
my $s = "";    
my $y;

for my $line (split("\n", $data)) {
  if (defined($y) && $line =~ m/^\((.*)\)\s+s\s*$/) {
    $s .= $1;
    next;
  } elsif ($line =~ m/^(\d+)\s+(\d+)\s+mt\s*$/) {
    if (defined($y) && $y == $2) {
      next;
    } else {
      $y = $2;
    }
  } else {
    $y = undef;
  }
  if (length($s)) {
    $out .= "($s) s\n";
    $s = "";
  }
  $out .= "$line\n";
}

print $out;

答案 1 :(得分:1)

我没有看到这方面的一般方法。 但一系列特殊情况似乎有效。这里的弱点是增加越来越多的特殊情况并不是一个可以很好地扩展的模型。但如果这是问题的完整列表,那么这应该有效。

#!/usr/bin/perl -Tw

use strict;
use warnings;

my %regex_for = (
    a => qr{
        \( ( \w+ ) \)     \s s  \s+  # (Function) s
        \d+ \s+ \d+       \s mt \s+  # 9405 14971 mt
        \( ( [-_]|ms ) \) \s s  \s+  # (-) s
        \d+ \s+ \d+       \s mt \s+  # 9441 14971 mt
        \( ( \w+ ) \)     \s s  \s+  # (Call) s
    }xmsi,
    b => qr{
        \( ( \w+ ) \\ \s* ( \w+ ) \)  # (Chris_\
    }xms,    #  did_this_example_)
    c => qr{
        \( ( \w+ _ ) \) \s s  \s+  # (Chris_did_this_example_) s
        \d+ \s+ \d+     \s mt \s+  # 5874 11947 mt
        \( ( \w+ ) \)   \s s  \s+  # (to_test) s
    }xms,
    d => qr{
        \( ( \w+ ) \)   \s s  \s+  # (to_test) s
        \d+ \s+ \d+     \s mt \s+  # 5946 11947 mt
        \( ( _ \w+ ) \) \s s  \s+  # (_out) s
    }xms,
);

my $ps = do { local $/; <DATA> };

REGSUB:
{
    my $a = $ps =~ s{ $regex_for{a} }{($1$2$3) s\n}xmsg;
    my $b = $ps =~ s{ $regex_for{b} }{($1$2)}xmsg;
    my $c = $ps =~ s{ $regex_for{c} }{($1$2) s\n}xmsg;
    my $d = $ps =~ s{ $regex_for{d} }{($1$2) s\n}xmsg;

    redo REGSUB
        if $a || $b || $c || $d;
}

print $ps;

__DATA__
%%IncludeResource: font Helvetica
/Helvetica /WindowsLatin1Encoding 60 FMSR

11214 11653 mt
(0) s
4.5 w
156 0 2204 19229 2 MP stroke
156 0 2204 19084 2 MP stroke

%%IncludeResource: font Helvetica
/Helvetica /WindowsLatin1Encoding 120 FMSR

8913 14971 mt
(Function) s
9405 14971 mt
(-) s
9441 14971 mt
(Call) s
9009 15127 mt
(Generator) s
6 w


%%IncludeResource: font Helvetica
/Helvetica /WindowsLatin1Encoding 120 FMSR

4962 4747 mt
(trigger) s
5322 4747 mt
(_) s
5394 4747 mt
(scheduler) s
5934 4747 mt
(_) s
6006 4747 mt
(100) s
6222 4747 mt
(ms) s
6378 4747 mt
(_) s
6450 4747 mt
(task) s
6654 4747 mt
(_) s
6726 4747 mt
(06) s
6 w
gr

24 10 10 24 0 4 -10 24 -24 10 5806 11736 14 MP stroke
%%IncludeResource: font Helvetica
/Helvetica /WindowsLatin1Encoding 120 FMSR

5454 11947 mt
(Chris_\
did_this_example_) s
5874 11947 mt
(to_test) s
5946 11947 mt
(_out) s
6 w