Question

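I have the string below, which contains text separated by newlines (\n). I want to use a regular expression to match the XML content, remove all whitespace and \n characters, and turn it into a single line. I used the following regex: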

my $string = "this contains the text which I pasted below in before section";
$string=~ m/(^.*)(<[a-zA-Z]*>)/;
$extractedXml = $2;

Why doesn't the code above capture the XML content?

Before:

G11N/Locale=en_USY:/default/main/test1/test/test2/test4/test5/default.site
G11N/Localizable=true
TeamSite/Assocation/Version=1
TeamSite/LiveSite/DeploymentAudit=<?xml version="1.0" encoding="UTF-8"?>
<Deployments>
    <test>hello</test>
</Deployments>

After:

Y:/default/main/test1/test/test2/test4/test5/default.site
G11N/Locale=en_US
G11N/Localizable=true
TeamSite/Assocation/Version=1
TeamSite/LiveSite/DeploymentAudit=<?xml version="1.0" encoding="UTF-8"?><Deployments><test>hello</test></Deployments>

http://regex101.com/r/zZ0wB8
You can check that it works here, but in the actual code it only matches the first line.

Answer 1

For your example, the following solution works:

my $string = <<"FOO";
G11N/Locale=en_USY:/default/main/test1/test/test2/test4/test5/default.site
G11N/Localizable=true
TeamSite/Assocation/Version=1
TeamSite/LiveSite/DeploymentAudit=<?xml version="1.0" encoding="UTF-8"?>
<Deployments>
    <test>hello</test>
</Deployments>
FOO

$string =~ s/^\s+(<.+$)/$1/gm;
$string =~ s/>\n/>/gm;

print $string;

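It first strips the leading whitespace from any line that begins with whitespace followed by an XML tag, and then removes the newline at the end of any line that ends with an XML tag.

This is a very pragmatic approach and most likely will not work in every case. Because it matches only \n, it works only for files with Unix line endings.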
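If the input might use Windows line endings, a variant along these lines may be more tolerant. This is only a sketch: the helper name flatten_xml_lines and the sample input are made up for illustration, and it assumes Perl 5.10 or newer, where \R matches \n, \r\n, and other line-break sequences.

```perl
use strict;
use warnings;
use 5.010;    # \R (generic line break) needs Perl 5.10+

# Hypothetical helper: joins lines that are split between XML tags.
sub flatten_xml_lines {
    my ($text) = @_;

    # Drop spaces and tabs in front of any line that starts with a tag.
    $text =~ s/^[ \t]+(?=<)//gm;

    # Drop a line break that sits between one tag's '>' and the next '<'.
    $text =~ s/>\R(?=<)/>/g;

    return $text;
}

# Illustrative input with CRLF line endings.
my $input = "Version=1\r\n"
          . "Audit=<?xml version=\"1.0\"?>\r\n"
          . "<Deployments>\r\n"
          . "    <test>hello</test>\r\n"
          . "</Deployments>\r\n";

print flatten_xml_lines($input);
```

Unlike the two substitutions above, this version keeps the line break after the final closing tag, because the lookahead only removes a break when another tag follows it.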

Answer 2

You can use this:

my ($xml) = $string =~ m!(<Deployments>.*?</Deployments>)!gis;
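The s flag is the part that matters here. Without it, . does not match a line break, and without m, ^ matches only at the very start of the string, so the pattern from the question can only find a tag on the first line. The sketch below illustrates this on a shortened, made-up copy of the input and then collapses the extracted block to one line; the $flat variable and the final substitution are additions for illustration, and they assume that whitespace between tags carries no meaning.

```perl
use strict;
use warnings;

# Sample input modelled on the question; shortened for illustration.
my $string = <<'END';
G11N/Localizable=true
TeamSite/LiveSite/DeploymentAudit=<?xml version="1.0" encoding="UTF-8"?>
<Deployments>
    <test>hello</test>
</Deployments>
END

# The question's pattern: without /s, '.' cannot cross "\n", and without /m,
# '^' matches only at the start of the string, so a tag must sit on line 1.
my $original_matches = ($string =~ m/(^.*)(<[a-zA-Z]*>)/) ? 1 : 0;
print "original pattern matched: $original_matches\n";    # 0 for this input

# With /s, '.' also matches line breaks, so the block can span lines.
my ($xml) = $string =~ m!(<Deployments>.*?</Deployments>)!is;

# Remove whitespace (including line breaks) between adjacent tags.
(my $flat = $xml) =~ s/>\s+</></g;
print "$flat\n";    # <Deployments><test>hello</test></Deployments>
```

Note that this captures only the Deployments element, not the XML declaration that precedes it. For anything beyond a fixed, simple layout like this one, a real XML parser is likely to be more robust than a regex.\n</Deployments>";
die "flatten failed" unless $flat eq '<Deployments><test>hello</test></Deployments>';
</test>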

Regards.

Separating XML content from a string using a Perl regex
