Perl更改括号内的文本文件

时间:2013-02-18 17:31:13

标签: perl parsing matching

希望你们可以提供帮助,我会承认我是Perl的新用户并做了一些搜索,但实际上并不理解那里的选项。希望你们能够更好地向我解释并帮助我开始。所以我已经能够让Perl打开文本文件,将其读出到一个数组,然后编写新文件就好了。以下是我的文本文件的缩写示例:

Composition {
  CurrentTime = 0,
  OutputClips = {
    "",
  },
  Tools = {
      Text3 = Text3D {
          NameSet = true,
          Inputs  = {
            Size       = Input { Value = 0.6282723, },
            Font       = Input { Value = "KG Shadow of the Day", },
            StyledText = Input { Value = "Your Text Goes Here 3", },
      },
      ShadowDensity = BrightnessContrast {
          NameSet = true,
          Inputs  = {
            Alpha = Input { Value = 1, },
            Gain  = Input { Value = 0.5, },
            Input = Input {
                SourceOp = "Loader2",
                Source   = "Output",
            },
          },
          ViewInfo = OperatorInfo { Pos = { -220, 82.5, }, },
      },
  },
}

我需要能够更改Text3 'StyledText = Input'中的值以及ShadowDensity 'Alpha = Input'值。而且我不能只做一个正常的表达式来查找'Alpha = Input',因为在数组中有其他嵌套项,它们在不同的工具下具有相同的名称。与文本部分相同如果我有多个文本工具,它将找不到正确的。欢迎任何帮助和建议。感谢

2 个答案:

答案 0 :(得分:1)

这是使用Marpa::R2和重载对象的解决方案。结果比预期的要长,但看起来往返兼容。

标题很简单:

use strict; use warnings; use feature 'say';
use Marpa::R2;

use constant DEBUG => 0;

exit main();

这将需要Perl5,版本10或更高版本。接下来是parse子例程。这将执行标记化,并调用解析器。大多数令牌都被指定为数据(非显式代码),因此可以轻松扩展它们。

$print_diag是一个匿名子。它会关闭$string$last_pos,因此可以打印类似于die的相应错误消息。它将使用HERE-->箭头指出标记化问题的上下文。

$match如果有类似的关闭。它遍历所有可用的令牌并返回匹配的令牌,或者失败时返回false值。它使用m/\G.../gc正则表达式。这些类似于s/^...//,但不破坏字符串。 \G断言将在pos($string)匹配。 /c选项可确保失败不会改变pos

手动匹配字符串标记。您可能想要处理转义。我添加了对一些常用转义符的支持(\\\"\n\t和行继续反斜杠。

TOKEN循环拉取令牌并将其填充到识别器中。它包含很少的代码和很多错误处理。

最后,我们采用第一个可能的$parse树(可能有多个),并检查它是否成功。如果是这样,我们返回数据结构:

my $grammar; # filled later in INIT block

sub parse {
    my ($string) = @_;
    my ($last_pos, $length) = (0, length $string);
    my $rec = Marpa::R2::Recognizer->new({ grammar => $grammar });

    my $print_diag = sub {
        my ($problem) = @_;
        my ($behind, $ahead) = (15, 30);
        my $start = $last_pos > $behind ? $last_pos - $behind : 0;
        say STDERR "$problem at ", map ">>$_<<", join " HERE-->",
            substr($string, $start,    $behind),
            substr($string, $last_pos, $ahead );
        exit 1;
    };

    my @capture_token = (
        [qr/true|false/     => 'Bool'],     # bool must come before ident
        [qr/-?\d+(?:\.\d+)?/=> 'Number'],   # number must come before ident
        [qr/\w+/            => 'Ident'],
    );
    my @non_capture_token  = (
        [qr/\{/     => 'LCurly'],
        [qr/\}/     => 'RCurly'],
        [qr/=/      => 'Equal'],
        [qr/,/      => 'Comma'],
    );

    my $match = sub {
        # try String manually here:
        if ($string =~ m/\G"( (?: [^"]++ | \\. )*+ )"/gcxs) {
            my $str = $1;
            my %escapes = ( n => "\n", t => "\t", "\n" => '' );
            $str =~ s{\\(.)}{ $escapes{$1} // $1 }esg;
            return String => $str;
        }
        for (@non_capture_token) {
            my ($re, $type) = @$_;
            return $type if $string =~ m/\G$re/gc;
        }
        for (@capture_token) {
            my ($re, $type) = @$_;
            return $type, $1 if $string =~ m/\G($re)/gc;
        }
        return;
    };

    pos $string = $last_pos; # set match start for \G assertion to beginning

    TOKEN: while ($last_pos < $length) {
        next TOKEN if $string =~ m/\G\s+/gc;
        next TOKEN if $string =~ m/\G\#\N+/gc; # skip comments if you have such

        if (my @token = $match->()) {
            say STDERR "Token [@token]" if DEBUG;
            my $ok = $rec->read(@token);
            unless (defined $ok) {
                $print_diag->("Token [@token] rejected");
            }
        } else {
            $print_diag->("Can't understand input");
        }
    } continue {
        $last_pos = pos $string;
    }

    my $parse = $rec->value;
    unless ($parse) {
        say STDERR "Could not parse input";
        say STDERR "The Progress so far:";
        say STDERR $rec->show_progress;
        exit 1;
    }
    return $$parse;
}

现在我们指定语法。 Marpa可以通过我在这里使用的类似BNF的符号来处理。我主要是低级方法的语法糖。我可以指定操作(我稍后会写),并且可以通过将它们放入parens来决定捕获令牌。在这个阶段,我只能使用令牌类型,而不能使用令牌的值。在指定语法后,我必须使用$grammar->precompute编译它。

INIT {
    $grammar = Marpa::R2::Grammar->new({
        actions         => "MyActions", # a package name
        default_action  => 'first_arg',
        source          => \(<<'END_OF_GRAMMAR'),
        :start  ::= Value

        Value   ::= Bool            action => doBool
                |   Number          # use auto-action
                |   String          # use auto-action
                ||  Array
                ||  Struct

        Struct  ::= Ident (LCurly) PairList (RCurly)    action => doStruct
                |         (LCurly) PairList (RCurly)    action => doStruct1

        Array   ::= Ident (LCurly) ItemList (RCurly)    action => doArray
                |         (LCurly) ItemList (RCurly)    action => doArray1


        ItemList::= Value +         separator => Comma  action => doList
        PairList::= Pair +          separator => Comma  action => doList
        Pair    ::= Ident (Equal) Value                 action => doPair
END_OF_GRAMMAR
    });
    $grammar->precompute;
}

以上是在INIT块中,因此它将在parse完成之前执行。

现在来吧。将使用操作对象作为第一个参数调用每个操作,这是我们不需要的(它对更高级的解析技术很有帮助)。其他参数是匹配的标记/规则的值(不是类型)。其中大多数丢弃或打包参数,或将数据放在后面定义的对象中。

sub MyActions::first_arg {
    say STDERR "rule default action" if DEBUG;
    my (undef, $first) = @_;
    return $first;
}

sub MyActions::doStruct {
    say STDERR "rule Struct" if DEBUG;
    my (undef, $ident, $pair_list) = @_;
    my %hash;
    for (@$pair_list) {
        my ($k, $v) = @$_;
        $hash{$k} = $v;
    }
    return MyHash->new($ident, \%hash);
}

sub MyActions::doStruct1 {
    say STDERR "rule Struct sans Ident" if DEBUG;
    my (undef, $pair_list) = @_;
    return MyActions::doStruct(undef, undef, $pair_list);
}

sub MyActions::doArray {
    say STDERR "rule Array" if DEBUG;
    my (undef, $ident, $items) = @_;
    return MyArray->new($ident, $items);
}

sub MyActions::doArray1 {
    say STDERR "rule Array sans Ident" if DEBUG;
    my (undef, $items) = @_;
    MyActions::doArray(undef, undef, $items);
}

sub MyActions::doList {
    say STDERR "List" if DEBUG;
    my (undef, @list) = @_;
    return \@list;
}

sub MyActions::doPair {
    say STDERR "Pair" if DEBUG;
    my (undef, $key, $value) = @_;
    return [$key, $value];
}

sub MyActions::doBool {
    say STDERR "Bool" if DEBUG;
    my (undef, $bool) = @_;
    return MyBool->new($bool);
}

这是相当不引人注目的。我们需要这些特殊的对象,因为(a)它们稍后会将自己串联成正确的形式,并且(b)这样我就可以在curlies之前将类型或任何不完整的名称关联起来。 (和(c),Perl没有布尔类型,我必须覆盖它。)

首先是两个助手:$My::Indent设置打印输出缩进的空格数。 My::stringifyHelper只是确保将对象强制转换为字符串表示形式,并且字符串(其他所有不是数字的字符串)都用引号括起来。

INIT{ $My::Indent = 4 }
sub My::stringifyHelper {
    my (@objects) = @_;
    for (@objects) {
        if (ref $_) {
            $_ = "$_";
        } elsif ( not /\A-?\d+(?:\.\d+)?\z/) {
            $_ = qq("$_");
        }
    }
    return @objects;
}

这里是MyHash类型。字符串化代码是丑陋的,但它似乎工作→偶然编程。

{
    package MyHash;
    sub new {
        my ($class, $type, $hashref) = @_;
        bless [$type, $hashref] => $class;
    }
    sub type {
        my ($self) = @_;
        return $self->[0];
    }
    sub hash {
        my ($self) = @_;
        return $self->[1];
    }
    sub asString {
        my ($self) = @_;
        my @keys = sort keys %{ $self->hash };
        my @vals =
            map { s/\n\K/" "x$My::Indent/meg; $_ }
            My::stringifyHelper @{ $self->hash }{@keys};
        my $string = "";
        for my $i (0 .. $#keys) {
            $string .= (" "x$My::Indent) . "$keys[$i] = $vals[$i],\n";
        }
        return +($self->type // "") . "{\n$string}";
    }
    use overload
        '""'        => \&asString,
        '%{}'       => \&hash,
        fallback    => 1;
}

这实现了MyArray。字符串化稍微不那么难看,但我将对象表示为哈希。我对overload并不熟练,以确保在访问实际数组时不会以其他方式递归。

{
    package MyArray;
    sub new {
        my ($class, $type, $aryref) = @_;
        bless { type => $type, array => $aryref } => $class;
    }
    sub type {
        my ($self) = @_;
        return $self->{type};
    }
    sub array {
        my ($self) = @_;
        no overload;
        return $self->{array};
    }
    sub asString {
        my ($self) = @_;
        my @els = My::stringifyHelper @{$self->array};
        my $string = $self->type // "";
        if (@els <= 1) {
            $string .=  "{ @els, }";
        } else {
            my $els = join '', map "$_,\n", @els;
            $els =~ s/^/" "x$My::Indent/meg;
            $string .= "{\n$els}";
        }
        return $string;
    }
    use overload
        '""'        => \&asString,
        '@{}'       => \&array,
        fallback    => 1;
}

现在是小MyBool实现。它应该像布尔值一样工作:)

{
    package MyBool;
    sub new {
        my ($class, $str) = @_;
        my $bool;
        if ('true' eq lc $str)      { $bool = 1     }
        elsif ('false' eq lc $str)  { $bool = undef }
        else { die "Don't know if $str is true or false" }
        bless \$bool => $class;
    }
    use overload
        'bool' => sub {
            my ($self) = @_;
            return $$self;
        },
        '""' => sub {
            my ($self) = @_;
            $$self ? 'true' : 'false';
        },
        fallback => 1;
}

现在我们差不多完成了。这是main

sub main {
    local $/;
    my $data = <DATA>;
    my $dsc = parse($data);

    say "/:";
    say $dsc;

    say "/Tools:";
    say $dsc->{Tools};

    say "/Tools/ShadowDensity/:";
    say $dsc->{Tools}{ShadowDensity};

    say "/Tools/ShadowDensity/Inputs/:";
    say $dsc->{Tools}{ShadowDensity}{Inputs};

    return 0;
}

这会加载并解析数据。然后它打印出整个对象,只打印出某些部分。到目前为止,这似乎有效。

注意:如果对作为输入提供的数据运行解析器,它将失败。标记化成功,但你忘记了某个闭合支撑。在确定之后,它应该可以工作。

TODO:

  • 某些部分使用exit 1,应该抛出异常。
  • 以上访问有效,但其他路径失败(返回undef)。某处有一个错误,需要修复,但我不知道。
  • 更好的错误消息会更好,调试级别也会更加多样化。

答案 1 :(得分:1)

我认为它是一个结构化文件,带有你可能想要的“事件” 处理。所以我创建了一个结构化路径“event”类/对象和一个处理程序mux 类/对象。

use strict;
use warnings;

{   package LineEvent;  # our "event" class
    use strict;
    use warnings;

    # use overload to create a default stringification for the class/object
    use overload '""' => 'as_string', fallback => 1;

    # Create new path-tracking object    
    sub new { 
        my $self   = bless {}, shift;
        my %params = @_ % 2 ? ( base => @_ ) : @_;
        for ( qw<base delim verbose> ) { 
            $self->{$_} = $params{ $_ };
        }
        $self->{base}  ||= '';
        $self->{delim} ||= '.';
        return $self;
    }

    # pop back to larger named scope
    sub pop { 
        my $self  = shift;
        my $ref   = \$self->{base};
        my $pos   = rindex( $$ref, $self->{delim} );
        if ( $pos == -1 ) { 
            $self->{current} = '!Close';
        }
        else { 
            my $node = substr( $$ref, $pos + 1 );
            substr( $$ref, $pos ) = '';
            $self->{current} = "$node.!Close";
        }
        say qq{After pop, now "$self".} if $self->{verbose};
        return $self;
    }

    # push a new name as the current scope of the path
    sub push { 
        my ( $self, $level ) = @_;
        return unless $level;
        $self->{current} = '!Open';
        my $delim        = $self->{delim};
        $self->{base}
            .= ( substr( $level, 0, length( $delim )) eq $delim ? '' : $delim ) 
            .  $level
            ;
        say qq{After push, now "$self".} if $self->{verbose};
        return $self;
    }

    # push the temporary name sitting as current onto our base
    sub push_current { 
        return $_[0]->push( $_[0]->{current} ); 
    }

    # set a temporary name to identify the current line.
    sub update { 
        my ( $self, $tip ) = @_;
        $self->{current} = $tip // '';
        say qq{After update, now: "$self".} if $self->{verbose};
        return $self;
    }

    sub null_current { delete $_[0]->{current}; }

    # used in overload
    sub as_string {
        my $self  = shift;
        return join( $self->{delim}, grep {; length } @{ $self }{ qw<base current> } );
    }
};

sub pair_up {
    return map { [ @_[ $_, $_ + 1 ] ] } grep { $_ % 2 == 0 } 0..$#_;
}

{   package PathProcessor; # our mux class

    # create a event list and handler, by splitting them into pairs.
    sub new { 
        my $self = bless [], shift;
        @$self   = &::pair_up;
        return $self;
    }

    # process the current path
    sub process_path { 
        my ( $self, $path ) = @_;
        foreach my $pair ( @$self ) {
            my ( $test, $func ) = @$pair;
            next unless ref( $test ) 
                    ? $path =~ /$test/ 
                    : substr( $path, - length( $test )) eq $test
                    ;
            my $v = $func->( $path );
            return $v || !defined( $v );
        }
        return 1;
    }
}

my $path  = LineEvent->new( base => 'x' );

my $processor  
    = PathProcessor->new( 
      '.Text3.Inputs.StyledText' => sub { s/\bText\b/_Styled_ Text/ || 1; }
    , '.ShadowDensity.Inputs.Alpha' => sub { 
          s/(Value \s+ = \s+ )\K(\d+(?:\.\d+)?)/0.5/x || 1;
      }
    #, '.!Close' => sub { 
    #    say 'Closed!';
    #  }
    );

# We only handle a couple of conditions...
while ( <DATA> ) { 
    chomp;
    # ... If there is a keyword as the first thing in line
    if ( m/^ \s* ( \p{IsUpper} \w+ \b )/gcx ) {
        $path->update( $1 );
        # ... if it is followed by a equals sign, an optional name and
        # and open-bracket
        if ( m/\G \s+ = \s+ (?: \p{IsUpper} \w+ \s+ )? [{] \s* $/gcx ) {
            $path->push_current;
        }
    }
    # ... if it's a closing brace with an optional comma. 
    elsif ( m/^ \s* [}] ,? \s* $/x ) { 
        $path->pop;
    }
    else {
        $path->null_current;
    }
    say $path;
    # you can omit a line by passing back a false value
    say if $processor->process_path( $path );
}

__DATA__
Composition {
  CurrentTime = 0,
  OutputClips = {
    "",
  },
  Tools = {
      Text3 = Text3D {
          NameSet = true,
          Inputs  = {
            Size       = Input { Value = 0.6282723, },
            Font       = Input { Value = "KG Shadow of the Day", },
            StyledText = Input { Value = "Your Text Goes Here 3", },
          },
      },
      ShadowDensity = BrightnessContrast {
          NameSet = true,
          Inputs  = {
            Alpha = Input { Value = 1, },
            Gain  = Input { Value = 0.5, },
            Input = Input {
                SourceOp = "Loader2",
                Source   = "Output",
            },
          },
          ViewInfo = OperatorInfo { Pos = { -220, 82.5, }, },
      },
  },
}

输出结果为:

x.Composition
Composition {
x.CurrentTime
  CurrentTime = 0,
x.OutputClips.!Open
  OutputClips = {
x.OutputClips.!Text.1
    "",
x.OutputClips.!Close
  },
x.Tools.!Open
  Tools = {
x.Tools.Text3.!Open
      Text3 = Text3D {
x.Tools.Text3.NameSet
          NameSet = true,
x.Tools.Text3.Inputs.!Open
          Inputs  = {
x.Tools.Text3.Inputs.Size
            Size       = Input { Value = 0.6282723, },
x.Tools.Text3.Inputs.Font
            Font       = Input { Value = "KG Shadow of the Day", },
x.Tools.Text3.Inputs.StyledText
            StyledText = Input { Value = "Your _Styled_ Text Goes Here 3", },
x.Tools.Text3.Inputs.!Close
          },
x.Tools.Text3.!Close
      },
x.Tools.ShadowDensity.!Open
      ShadowDensity = BrightnessContrast {
x.Tools.ShadowDensity.NameSet
          NameSet = true,
x.Tools.ShadowDensity.Inputs.!Open
          Inputs  = {
x.Tools.ShadowDensity.Inputs.Alpha
            Alpha = Input { Value = 0.5, },
x.Tools.ShadowDensity.Inputs.Gain
            Gain  = Input { Value = 0.5, },
x.Tools.ShadowDensity.Inputs.Input.!Open
            Input = Input {
x.Tools.ShadowDensity.Inputs.Input.SourceOp
                SourceOp = "Loader2",
x.Tools.ShadowDensity.Inputs.Input.Source
                Source   = "Output",
x.Tools.ShadowDensity.Inputs.Input.!Close
            },
x.Tools.ShadowDensity.Inputs.!Close
          },
x.Tools.ShadowDensity.ViewInfo
          ViewInfo = OperatorInfo { Pos = { -220, 82.5, }, },
x.Tools.ShadowDensity.!Close
      },
x.Tools.!Close
  },
x.!Close
}