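I need to write a quick (by tomorrow) filter script that replaces line breaks (LF or CRLF) found inside double-quoted strings with the escaped newline \n. The content is a (broken) JavaScript program, so I need to allow for escape sequences within a string, such as "ab\"cd" and "ab\\"cd"ef".
I understand that sed is not well suited to the job because it works line by line, so I am turning to Perl, about which I know nothing. :)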
I wrote this regex: "(((\\.)|[^"\\\n])*\n?)*" and tested it with http://regex.powertoy.org. There it does match quoted strings containing line breaks, but perl -p -e 's/"(((\\.)|[^"\\\n])*(\n)?)*"/TEST/g' does not.
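So my questions are:
There is a similar question with an awk solution, but it is not quite what I need.
NOTE: I usually don't ask "please do this for me" questions, but I really don't feel like learning perl/awk by tomorrow... :)
EDIT: sample data
"abc\"def" - matches as one string
"abc\\"def"xy" - matches "abc\\" and "xy"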
"ab
cd
ef" - is replaced by "ab\ncd\nef"
Answer 0 (score: 2)
Here is a simple Perl solution:
s{
    \G                      # match from the beginning of the string or the last match
    ([^"]*+)                # till we get to a quote
    "((?:[^"\\]++|\\.)*+)"  # match the whole quote
}{
    my ($before, $quoted) = ($1, $2);
    $quoted =~ s/\r?\n/\\n/g;   # replace what you want inside the quote
    "$before\"$quoted\"";
}gex;
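The substitution above works on whatever string it is applied to, so the quoted string has to be present in one piece. Below is a minimal sketch of how it might be wrapped and tried on the sample data from the question. The sub name escape_newlines_in_quotes is made up for illustration, and the delimiters are written as braces so the snippet does not depend on the file encoding.

```perl
use strict;
use warnings;

# Hypothetical wrapper name; the pattern is the one from the answer.
sub escape_newlines_in_quotes {
    my ($text) = @_;
    $text =~ s{
        \G                       # continue where the previous match ended
        ([^"]*+)                 # text outside of any quoted string
        "((?:[^"\\]++|\\.)*+)"   # one complete double-quoted string
    }{
        my ($before, $quoted) = ($1, $2);
        $quoted =~ s/\r?\n/\\n/g;    # LF or CRLF becomes a literal \n
        "$before\"$quoted\"";
    }gex;
    return $text;
}

# Sample data from the question.
my $sample = <<'END';
"abc\"def"
"abc\\"def"xy"
"ab
cd
ef"
END

print escape_newlines_in_quotes($sample);
```

To use this as a filter, the whole input would need to be read at once (for example with local $/; my $source = <STDIN>;), because a line-by-line loop never sees a quoted string that spans several lines.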
If you don't want to use /e and would rather use a single regex, here is another solution:
use strict;
$_=<<'_quote_';
hai xtest "aa xx aax" baix "xx"
x "axa\"x\\" xa "x\\\\\"x" ax
xbai!x
_quote_
print "Original:\n", $_, "\n";
s/
(
(?:
# at the beginning of the string match till inside the quotes
^(?&outside_quote) "
# or continue from last match which always stops inside quotes
| (?!^)\G
)
(?&inside_quote) # eat things up till we find what we want
)
x # the thing we want to replace
(
(?&inside_quote) # eat more possibly till end of quote
# if going out of quote make sure the match stops inside them
# or at the end of string
(?: " (?&outside_quote) (?:"|\z) )?
)
(?(DEFINE)
(?<outside_quote> [^"]*+ ) # just eat everything till quoting starts
(?<inside_quote> (?:[^"\\x]++|\\.)*+ ) # handle escapes
)
/$1Y$2/xg;
print "Replaced:\n", $_, "\n";
Output:
Original:
hai xtest "aa xx aax" baix "xx"
x "axa\"x\\" xa "x\\\\\"x" ax
xbai!x
Replaced:
hai xtest "aa YY aaY" baix "YY"
x "aYa\"Y\\" xa "Y\\\\\"Y" ax
xbai!x
To replace newlines instead of x, just substitute it in the regex, like so:
s/
(
(?:
# at the beginning of the string match till inside the quotes
^(?&outside_quote) "
# or continue from last match which always stops inside quotes
| (?!^)\G
)
(?&inside_quote) # eat things up till we find what we want
)
\r?\n # the thing we want to replace
(
(?&inside_quote) # eat more possibly till end of quote
# if going out of quote make sure the match stops inside them
# or at the end of string
(?: " (?&outside_quote) (?:"|\z) )?
)
(?(DEFINE)
(?<outside_quote> [^"]*+ ) # just eat everything till quoting starts
(?<inside_quote> (?:[^"\\\r\n]++|\\.)*+ ) # handle escapes
)
/$1\\n$2/xg;
Answer 1 (score: 1)
Until the OP posts some sample content to test with, try adding the "m" (and possibly the "s") flag to the end of your regex; from perldoc perlreref (reference):
m Multiline mode - ^ and $ match internal lines
s match as a Single line - . matches \n
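As a rough illustration of what these two flags change, here is a small sketch; the sample string is made up for the example.

```perl
use strict;
use warnings;

my $text = "first\nsecond\n";   # made-up sample input

# Without /m, ^ only matches at the very start of the string.
my @plain = $text =~ /^(\w+)/g;

# With /m, ^ also matches right after an internal newline.
my @multi = $text =~ /^(\w+)/mg;

# Without /s, . does not match a newline; with /s it does.
my $no_s   = $text =~ /first.second/  ? 1 : 0;
my $with_s = $text =~ /first.second/s ? 1 : 0;

print "plain: @plain\n";
print "multi: @multi\n";
print "no_s: $no_s, with_s: $with_s\n";
```

Note that neither flag helps if the input reaches the regex one line at a time, as it does under a plain perl -p; the multi-line string has to be in $_ in one piece first (perlrun documents -0777 for reading a whole file at once).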
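For testing, you may also find it useful to add the command-line argument "-i.bak", so that a backup of the original file is kept (now with the extension ".bak").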
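Also note that if you want to group something without storing it, you can use (?:PATTERN) instead of (PATTERN). Once you have captured content, use $1 through $9 to access the matches stored by the capturing groups.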
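A short sketch of the difference between the two kinds of group, using a made-up input string:

```perl
use strict;
use warnings;

my $line = 'key=value';   # made-up sample input

# Both groups capture: the first is the key, the second the value.
my ($k, $v) = $line =~ /(\w+)=(\w+)/;

# (?:...) groups without capturing, so the value is now the first capture.
my ($only) = $line =~ /(?:\w+)=(\w+)/;

print "$k $v $only\n";
```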
For more information, see the link above, as well as perldoc perlretut (tutorial) and perldoc perlre (full-ish documentation).
Answer 2 (score: 1)
#!/usr/bin/perl
use warnings;
use strict;
use Regexp::Common;
$_ = '"abc\"def"' . '"abc\\\\"def"xy"' . qq("ab\ncd\nef");
print "before: {{$_}}\n";
s{($RE{quoted})}
{ (my $x=$1) =~ s/\r?\n/\\n/g;
$x
}ge;
print "after: {{$_}}\n";
Answer 3 (score: 1)
With Perl 5.14.0 (installed with perlbrew), it can be done like this:
#!/usr/bin/env perl
use strict;
use warnings;
use 5.14.0;
use Regexp::Common qw/delimited/;
my $data = <<'END';
"abc\"def"
"abc\\"def"xy"
"ab
cd
ef"
END
my $output = $data =~ s/$RE{delimited}{-delim=>'"'}{-keep}/$1=~s!\n!\\n!rg/egr;
print $output;
I need 5.14.0 for the /r flag on the inner substitution. If anyone knows how to avoid that, please let me know.
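One possible way to avoid /r is to copy the value into a lexical first and run the substitution on the copy, which is the idiom the Regexp::Common answer above already uses. A minimal sketch, assuming a hand-written quoted-string pattern (borrowed from the first answer) in place of $RE{delimited}, and a made-up sub name:

```perl
use strict;
use warnings;

# Hypothetical helper; the pattern is hand-written, not Regexp::Common's.
sub escape_quoted_newlines_no_r {
    my ($data) = @_;
    # Copy first, then substitute on the copy: no /r on the outer s///.
    (my $output = $data) =~ s{("(?:[^"\\]++|\\.)*+")}{
        # Same trick inside: copy $1, edit the copy, return it.
        (my $quoted = $1) =~ s/\r?\n/\\n/g;
        $quoted;
    }ge;
    return $output;
}

print escape_quoted_newlines_no_r(qq{"ab\ncd\nef"\n});
```

The possessive quantifiers in the pattern still require Perl 5.10 or later.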