How to delete a block of text with a regular expression using sed, perl or awk?

Date: 2013-11-18 11:36:42

Tags: regex perl sed awk

I have a PHP file:

<?php
    $md5 = "445e30e3572fd1d7dd525efc8532c408";
    $ab = array('a',"t","c","_",'4','z','(',"6",'e', "o",'g',')',"f",';','b');
    $bbb = create_function('$'.'v',$ab[8].$ab[12]...);
    $bbb('DZZF0oRqEkWX0...');
?>
<?php
    //SOME PHP CODE
?>

I want to use perl / sed / awk to delete the first code block (the one that starts with <?php and ends with ?>).

I tried the following regular expression, as used in PHP:

<\?php\n\$md5[\s\S]*?\?> 

but it doesn't work with perl or sed. Any suggestions as to what I'm doing wrong?

3 answers:

Answer 0 (score: 2)

cat in.txt

<?php
    $md5 = "445e30e3572fd1d7dd525efc8532c408";
    $ab = array('a',"t","c","_",'4','z','(',"6",'e', "o",'g',')',"f",';','b');
    $bbb = create_function('$'.'v',$ab[8].$ab[12]...);
    $bbb('DZZF0oRqEkWX0...');
?>
<?php
    //SOME PHP CODE
?>

Using sed:

sed '/<?php/,/<?php/d' in.txt

Output:

    //SOME PHP CODE
?>
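The range /<?php/,/<?php/ ends at the second opening tag, so that tag is deleted too. That is why the output above begins at the comment line. If the block to remove starts on line 1, you can end the range at the first line that starts with ?> instead, which keeps the second block intact. A minimal sketch under that assumption:

```shell
# Recreate the sample file from the question.
cat > in.txt <<'EOF'
<?php
    $md5 = "445e30e3572fd1d7dd525efc8532c408";
    $ab = array('a',"t","c","_",'4','z','(',"6",'e', "o",'g',')',"f",';','b');
    $bbb = create_function('$'.'v',$ab[8].$ab[12]...);
    $bbb('DZZF0oRqEkWX0...');
?>
<?php
    //SOME PHP CODE
?>
EOF

# Delete from line 1 through the first line that starts with "?>".
# In a basic regular expression "?" is an ordinary character, so /^?>/ needs no escaping.
sed '1,/^?>/d' in.txt
```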

Answer 1 (score: 2)

Maybe this helps:

awk '/^\?>/{if(!f){f=1;next}}f' file

Output:

<?php
    //SOME PHP CODE
?>
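The awk one-liner above drops every line up to and including the first line that starts with ?>, whatever that block contains. The sketch below keys on the $md5 signature instead, like the regex in the question. It buffers each block and prints it only if the block does not contain $md5. It assumes <?php and ?> each sit on their own line; the variable names buf and inblk are only illustrative:

```shell
# Recreate the sample file from the question.
cat > in.txt <<'EOF'
<?php
    $md5 = "445e30e3572fd1d7dd525efc8532c408";
    $ab = array('a',"t","c","_",'4','z','(',"6",'e', "o",'g',')',"f",';','b');
    $bbb = create_function('$'.'v',$ab[8].$ab[12]...);
    $bbb('DZZF0oRqEkWX0...');
?>
<?php
    //SOME PHP CODE
?>
EOF

awk '
/^<\?php/        { buf = ""; inblk = 1 }   # an opening tag starts a new buffer
inblk            { buf = buf $0 "\n" }     # collect every line of the current block
inblk && /^\?>/  { inblk = 0               # closing tag: decide whether to keep the block
                   if (buf !~ /\$md5/) printf "%s", buf
                   next }
!inblk           { print }                 # lines outside any block pass through unchanged
' in.txt
```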

Answer 2 (score: 0)

If you want to avoid matching a hypothetical ?> inside quotes or inside heredoc/nowdoc syntax, you can use this (somewhat long) pattern:

#!/usr/bin/perl 
use strict;
use warnings;
my $string = <<'END';
<?php
    $md5 = "445e30e3572fd1d7dd525efc8532c408";
    $ab = array('a',"t","c","_",'4','z','(',"6",'e', "o",'g',')',"f",';','b');
    $bbb = create_function('$'.'v',$ab[8].$ab[12]...);
    $bbb('DZZF0oRqEkWX0...');
?>
<?php
    //SOME PHP CODE
?>
END

my $pattern = qr/
    <\?php\s+\$md5
    (?> [^"'?<]++                         # all characters except " ' < ?
      | \?(?!>)                           # ? not followed by >
      | "(?>[^\\"]++|\\{2}|\\.)*"         # string inside double quotes
      | '(?>[^\\']++|\\{2}|\\.)*'         # string inside simple quotes
      | <(?!<<\'?\w)                      # < that is not the start of a heredoc declaration
      | <<<(\'?)(\w++)\1\R.*?(?<=\n)\2;?\R # heredoc or nowdoc (the closing label may be followed by ;)
    )*
   \?>
 /xs;

$string =~ s/$pattern//g; # to remove only the first occurrence, drop the g
print $string;

(Sorry, it's not a one-liner.)