Replace newlines in quoted strings with \n

Date: 2011-06-19 14:07:38

Tags: regex linux perl scripting awk

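I need to write a quick (due tomorrow) filter script that replaces line breaks (LF or CRLF) found inside double-quoted strings with the escaped newline \n. The content is a (broken) JavaScript program, so I need to allow for escape sequences within a string, such as "ab\"cd" and "ab\\"cd"ef".

I understand that sed is not well suited to the job because it works line by line, so I turned to perl, about which I know nothing :)

I wrote this regex: "(((\\.)|[^"\\\n])*\n?)*" and tested it with http://regex.powertoy.org. It does match quoted strings that contain line breaks, but perl -p -e 's/"(((\\.)|[^"\\\n])*(\n)?)*"/TEST/g' does not.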

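So my questions are:

  1. How do I make perl match line breaks?
  2. How do I write the "replacement" part so that it keeps the original string and replaces only the newlines?

This similar question has an awk solution, but it is not quite what I need.

NOTE: I usually don't ask "please do this for me" questions, but I really don't feel like learning perl/awk by tomorrow... :)

EDIT: sample data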

    "abc\"def" - matches as one string
    "abc\\"def"xy" - match "abc\\" and "xy"
    "ab
    cd
    ef" - is replaced by "ab\ncd\nef"
    

4 answers:

Answer 0 (score: 2)

Here is a simple Perl solution:

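s{
    \G                        # match from the start of the string or the end of the last match
    ([^"]*+)                  # everything up to the next quote
    "((?:[^"\\]++|\\.)*+)"    # the whole quoted string, honoring backslash escapes
}{
    my ($before, $quoted) = ($1, $2);   # copy first: the inner s/// resets $1 and $2
    $quoted =~ s/\r?\n/\\n/g;           # replace what you want inside the quotes
    "$before\"$quoted\"";
}gex;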
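
The substitution above only helps if the whole input is in one string: perl -p hands the script one line at a time, so a quoted string that spans several lines is never seen in one piece. The following is a minimal sketch, not the answerer's code, that wraps the same pattern in a function and runs it on the sample data from the question. The name escape_newlines_in_quotes and the $sample heredoc are illustrative assumptions.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper name; the pattern is the one from the answer.
sub escape_newlines_in_quotes {
    my ($text) = @_;
    $text =~ s{
        \G ([^"]*+)                  # text before the next quoted string
        " ((?:[^"\\]++|\\.)*+) "     # body of one quoted string
    }{
        my ($before, $quoted) = ($1, $2);
        $quoted =~ s/\r?\n/\\n/g;    # only line breaks inside the quotes
        qq{$before"$quoted"};
    }gex;
    return $text;
}

# Sample data from the question, held in a single string.
my $sample = <<'END';
"abc\"def"
"abc\\"def"xy"
"ab
cd
ef"
END

print escape_newlines_in_quotes($sample);
```

To use this as a filter, the input would have to be read in one go, for example with local $/ before reading STDIN, or with perl -0777 -pe on the command line, where -0777 makes -p treat the whole file as a single record.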

If you don't want to use /e and would rather use a single regex, here is another solution:

use strict;

$_=<<'_quote_';
hai xtest "aa xx aax" baix "xx"
x "axa\"x\\" xa "x\\\\\"x" ax
xbai!x
_quote_

print "Original:\n", $_, "\n";

s/
(
    (?:
        # at the beginning of the string match till inside the quotes
        ^(?&outside_quote) "
        # or continue from last match which always stops inside quotes
        | (?!^)\G
    )
    (?&inside_quote)  # eat things up till we find what we want
)
x   # the thing we want to replace
(
    (?&inside_quote)  # eat more possibly till end of quote
    # if going out of quote make sure the match stops inside them
    # or at the end of string
    (?: " (?&outside_quote) (?:"|\z) )?
)

(?(DEFINE)
    (?<outside_quote> [^"]*+ ) # just eat everything till quoting starts
    (?<inside_quote> (?:[^"\\x]++|\\.)*+ ) # handle escapes
)
/$1Y$2/xg;

print "Replaced:\n", $_, "\n";

Output:

Original:
hai xtest "aa xx aax" baix "xx"
x "axa\"x\\" xa "x\\\\\"x" ax
xbai!x

Replaced:
hai xtest "aa YY aaY" baix "YY"
x "aYa\"Y\\" xa "Y\\\\\"Y" ax
xbai!x

To use line breaks instead of x, just replace it in the regex like this:

s/
(
    (?:
        # at the beginning of the string match till inside the quotes
        ^(?&outside_quote) "
        # or continue from last match which always stops inside quotes
        | (?!^)\G
    )
    (?&inside_quote)  # eat things up till we find what we want
)
\r?\n # the thing we want to replace
(
    (?&inside_quote)  # eat more possibly till end of quote
    # if going out of quote make sure the match stops inside them
    # or at the end of string
    (?: " (?&outside_quote) (?:"|\z) )?
)

(?(DEFINE)
    (?<outside_quote> [^"]*+ ) # just eat everything till quoting starts
    (?<inside_quote> (?:[^"\\\r\n]++|\\.)*+ ) # handle escapes
)
/$1\\n$2/xg;

Answer 1 (score: 1)

Until the OP posts some sample content to test with, try adding the "m" (and possibly the "s") flag to the end of the regex; from perldoc perlreref (reference):

m  Multiline mode - ^ and $ match internal lines
s  match as a Single line - . matches \n
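
As a rough illustration of what these two flags change (a sketch with made-up sample text, not taken from the answer): /m affects where ^ and $ may match, and /s affects whether . may match a newline. Both only matter once the string being matched actually contains newlines.

```perl
use strict;
use warnings;

my $text = "first line\nsecond line\n";

# Without /m, ^ anchors only at the start of the string.
my @plain = $text =~ /^(\w+)/g;       # ("first")

# With /m, ^ also matches right after each embedded newline.
my @multi = $text =~ /^(\w+)/mg;      # ("first", "second")

# Without /s, . does not match a newline, so .* cannot span two lines.
my $dot_plain  = $text =~ /first.*second/  ? 1 : 0;   # 0
# With /s, . matches any character, including newline.
my $dot_single = $text =~ /first.*second/s ? 1 : 0;   # 1

print "without /m: @plain\n";
print "with /m:    @multi\n";
print "without /s: $dot_plain\n";
print "with /s:    $dot_single\n";
```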

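For testing, you may also find it useful to add the command-line argument "-i.bak", which keeps a backup of the original file (now with the extension ".bak").

Note too that if you want to group something without storing it, you can use (?:PATTERN) instead of (PATTERN). Once you have captured content, use $1 through $9 to access the stored matches from the matching section.

For more information, see the related links as well as perldoc perlretut (tutorial) and perldoc perlre (full-ish documentation).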
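
A small sketch of the difference, using an invented key=value line as sample input: with a capturing group the alternation takes up $1, while with (?:...) it is only grouped and the value becomes $1.

```perl
use strict;
use warnings;

my $line = 'name=value';   # made-up sample input

# Capturing group: the alternation is stored in $1, the value in $2.
my ($key, $val) = $line =~ /^(key|name)=(\w+)$/;

# Non-capturing group: the alternation is only grouped, so the value is $1.
my ($only) = $line =~ /^(?:key|name)=(\w+)$/;

print "capturing:     key=$key value=$val\n";
print "non-capturing: value=$only\n";
```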

Answer 2 (score: 1)

#!/usr/bin/perl
use warnings;
use strict;
use Regexp::Common;

$_ = '"abc\"def"' . '"abc\\\\"def"xy"' . qq("ab\ncd\nef");

print "before: {{$_}}\n";
s{($RE{quoted})}
 {  (my $x=$1) =~ s/\n/\\n/g;
    $x
 }ge;
print "after: {{$_}}\n";

Answer 3 (score: 1)

With Perl 5.14.0 (installed with perlbrew), you can do this:

#!/usr/bin/env perl

use strict;
use warnings;

use 5.14.0;

use Regexp::Common qw/delimited/;

my $data = <<'END';
"abc\"def"
"abc\\"def"xy"
"ab
cd
ef"
END

my $output = $data =~ s/$RE{delimited}{-delim=>'"'}{-keep}/$1=~s!\n!\\n!rg/egr;

print $output;

I need 5.14.0 for the /r flag on the inner substitution. If anyone knows how to avoid that, please let me know.
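
One possible way to avoid /r on older Perls is the copy-then-substitute idiom: assign the value to a new variable inside parentheses and bind the substitution to that copy. The sketch below assumes a hand-written pattern in $quoted as a stand-in for $RE{delimited}{-delim=>'"'}, so that it runs without Regexp::Common; it is not equivalent to the module's pattern in every case.

```perl
use strict;
use warnings;

my $data = <<'END';
"abc\"def"
"ab
cd
ef"
END

# Hand-written stand-in for $RE{delimited}{-delim=>'"'} (an assumption,
# so the sketch runs without Regexp::Common).
my $quoted = qr/"(?:[^"\\]|\\.)*"/;

# Outer /r replaced by copying $data into $output and editing the copy.
(my $output = $data) =~ s{($quoted)}{
    (my $copy = $1) =~ s/\n/\\n/g;   # inner /r replaced the same way
    $copy;                           # last value of the block is the replacement
}ge;

print $output;
```

$data is left unchanged, just as with /r, because both substitutions operate on copies.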