Removing text between two quotes in Perl?

Time: 2012-03-06 18:36:07

Tags: regex string perl

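I thought I had this figured out, but I want to find every occurrence in a file where some text sits between two double quotes and delete it.

I need to find a match first, then take everything from the first double quote up to the match, plus all the text from the match to the second double quote, and delete it. I don't want to simply grab whatever is between two double quotes, because that might not be something I want removed from the file.

I used something like this: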

perl -p -i.bak -e s/bar/foo/g bar.xml

to do a find and replace first, which worked. Then I went with:

perl -p -i.bak -e s/..\/..\/bar\//g bar.xml

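and that removed everything up to bar, but I need to keep going all the way to the second double quote, and I don't know how to do that in Perl.

I assume it will take some combination of regular expressions, but nothing I've tried has worked. The part up to bar will always be the same. The text after that point varies, but it always ends at the second double quote of the part I want to remove. After that there is more text.

3 Answers:

Answer 0: (score: 5)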

s/"[^"]*foo[^"]*"//g

works if there are no escaped quotes between the actual quotes and you want to remove the whole quoted string that contains foo:

"      # Match a quote
[^"]*  # Match any number of characters except quotes
foo    # Match foo
[^"]*  # Match any number of characters except quotes
"      # Match another quote

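As a quick check of the pattern above, here is a minimal sketch run against made-up input. The sample strings and variable names are assumptions for illustration, not taken from the asker's file. The second substitution is a variant, based on an assumption about what the question is after: it deletes only from the fixed `../../bar/` prefix up to the closing quote and leaves the quotes in place.

```perl
use strict;
use warnings;

# Hypothetical sample line; not taken from the asker's file.
my $line = 'keep "a/foo/b" this and "other" that';

# Remove every quoted string that contains foo (quotes included).
(my $whole = $line) =~ s/"[^"]*foo[^"]*"//g;
print "$whole\n";

# Variant (assumption): delete from a fixed prefix up to,
# but not including, the closing quote.
my $attr = 'path="../../bar/some/dir" name="x"';
(my $tail = $attr) =~ s{\.\./\.\./bar/[^"]*}{}g;
print "$tail\n";
```

Note that the pattern does not track which quote opens a string and which closes it, so an unquoted foo sitting between two quoted strings would also match.
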
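Answer 1: (score: 2)

Some people are asking about escaped quotes. There are a couple of tricks here. You want to ignore escaped quotes such as \", but not quote characters preceded by an escaped escape, such as \\". To ignore the first, I use a negative lookbehind. To avoid ignoring the second, I temporarily change every \\ to 😺 (SMILING CAT FACE WITH OPEN MOUTH). If that character appears in your data, choose something else.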

use v5.14;
use utf8;
use charnames qw(:full);

my $regex = qr/
    (?<!\\) "  # a quote not preceded by a \ escape
    (.*?)      # anything, non greedily
    (?<!\\) "  # a quote not preceded by a \ escape
    /x;

while( <DATA> ) {
    # encode the escaped escapes for now
    s/(?:\\){2}/\N{SMILING CAT FACE WITH OPEN MOUTH}/g;
    print "$.: ", $_;

    while( m/$regex/g ) {
        my $match = $1;
        # decode the escaped escapes
        $match =~ s/\N{SMILING CAT FACE WITH OPEN MOUTH}/\\\\/g;
        say "\tfound → $match";
        }
    }

__DATA__
"One group" and "another group"
This has "words between quotes" and words outside
This line has "an \" escaped quote" and other stuff
Start with \" then "quoted" and "quoted again"
Start with \" then "quoted \" with escape" and \" and "quoted again"
Start with \" then "quoted \\" with escape"
Start with \" then \\\\"quoted \\" with escape\\"

The output is:

1: "One group" and "another group"
    found → One group
    found → another group
2: This has "words between quotes" and words outside
    found → words between quotes
3: This line has "an \" escaped quote" and other stuff
    found → an \" escaped quote
4: Start with \" then "quoted" and "quoted again"
    found → quoted
    found → quoted again
5: Start with \" then "quoted \" with escape" and \" and "quoted again"
    found → quoted \" with escape
    found → quoted again
6: Start with \" then "quoted 😺" with escape"
    found → quoted \\
7: Start with \" then 😺😺"quoted 😺" with escape😺"
    found → quoted \\
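
An alternative to the placeholder trick, offered as a sketch and not as this answer's method, is to let the regex consume backslash-escaped pairs itself, both inside and outside the quotes. `quoted_strings` is a hypothetical helper name and the sample text is made up.

```perl
use strict;
use warnings;

# Return the contents of each double-quoted string in $text, treating a
# backslash as an escape character both inside and outside the quotes.
sub quoted_strings {
    my ($text) = @_;
    my @found;
    # First alternative skips an escaped character outside quotes;
    # second alternative captures a quoted string with escapes allowed.
    while ( $text =~ m/ \\. | "( (?: [^"\\] | \\. )* )" /gsx ) {
        push @found, $1 if defined $1;
    }
    return @found;
}

# In a single-quoted Perl string, \" stays as is and \\\\ is two backslashes.
my @got = quoted_strings(
    'Start with \" then "quoted \" with escape" and "quoted \\\\" again'
);
print "found: $_\n" for @got;
```

Because a backslash is always consumed together with the character after it, an escaped escape followed by a quote closes the string without any temporary substitution.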

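Answer 2: (score: 0)

You've said your file is .xml - so I'm going to say what I usually do.

Use an XML parser - I like XML::Twig because I think it's easier to get to grips with at first. XML::LibXML is good too.

Now, based on the question you've asked, it looks like you're trying to rewrite a file path inside an XML attribute.

So: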

#!/usr/bin/env perl

use strict;
use warnings;

use XML::Twig;

#my $twig = XML::Twig -> parsefile ( 'test.xml');
my $twig = XML::Twig -> parse ( \*DATA );

foreach my $element ( $twig -> get_xpath('element[@path]') ) {
   my $path_att = $element -> att('path');
   $path_att =~ s,/\.\./\.\./bar/,,g;
   $element -> set_att('path', $path_att);
}

$twig -> set_pretty_print('indented_a');
$twig -> print;
__DATA__
<root>
   <element name="test" path="/path/to/dir/../../bar/some_dir">
   </element>
   <element name="test2" nopath="here" />
   <element path="/some_path">content</element>
</root>

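XML::Twig also, quite usefully, supports parsefile_inplace, which modifies a file "sed style". The above is an illustration of the concept using some sample XML - with a clearer example of what you're trying to do, I should be able to improve it.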