Question

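I'm trying to create an array of field names that I can use later in my script. Regular expressions are kicking my butt; I haven't written code in a long time. The field names are embedded in XML tags, so I figured I could pull them out of the closing tags on the first line of data. I can't seem to populate the array correctly... can anyone shed some light for me?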

my $firstLineOfXMLFile = <record>DEFECT000179<\record><state>Approved<\state><title>Something is broken<\title>

my @fieldNames = $firstLineOfXMLFile =~ m(<\(.*)>)g; #problem, can't seem to grab the text within the end tags.

print @fieldNames;

Thanks a lot! -Matt

Answer 1

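Your sample data isn't XML: your slashes are backwards. Assuming it is XML you're trying to parse, the answer is 'don't use regular expressions'.

They simply can't cope with recursion and nesting to the degree necessary.

So, with that in mind, and assuming your sample data is actually well-formed XML and the backslashes were a typo, something like XML::Twig is very handy: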

#!/usr/bin/env perl
use strict;
use warnings;

use XML::Twig;

my $twig = XML::Twig -> parse ( \*DATA );

#extract a single field value
print $twig -> root -> first_child_text('title'),"\n";
#get a field name
print $twig -> root -> first_child -> tag,"\n";
#can also use att() if you have attributes


print "Field names:\n";
#children() returns all the children of the current (in this case root) node
#We use map to access all, and tag to read their 'name'. 
#att or trimmed_text would do other parts of the XML. 
print join ( "\n", map { $_ -> tag } $twig -> root -> children );

__DATA__
<XML>
<record>DEFECT000179</record><state>Approved</state><title>Something is broken</title>
</XML>

Prints:

Something is broken
record
Field names:
record
state
title

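There are many other very useful tools as well: pretty_print for formatting output XML, twig_handlers that let you manipulate the XML as it is parsed (especially handy combined with purge), cut and paste for moving nodes around, and get_xpath, which lets you use XPath expressions to find elements by path and attributes.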
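A rough sketch of a few of those tools, assuming XML::Twig is installed. The input is hypothetical: the item wrapper element and the field/name attribute layout are invented for illustration and are not from the question. The first twig collects each title in a twig_handlers callback and then calls purge to free what has been parsed so far; the second uses pretty_print and a get_xpath lookup with an attribute predicate.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

use XML::Twig;

# Hypothetical input: several records, each wrapped in an <item> element.
my $xml = <<'END_XML';
<XML>
<item><record>DEFECT000179</record><state>Approved</state><title>Something is broken</title></item>
<item><record>DEFECT000180</record><state>Open</state><title>Something else</title></item>
</XML>
END_XML

my @titles;

# twig_handlers: the callback fires once each <item> has been completely parsed.
my $twig = XML::Twig->new(
    twig_handlers => {
        item => sub {
            my ( $t, $item ) = @_;
            push @titles, $item->first_child_text('title');
            # Discard everything parsed so far, keeping memory use flat on big files.
            $t->purge;
            return 1;
        },
    },
);
$twig->parse($xml);
print "$_\n" for @titles;

# get_xpath with an attribute predicate (hypothetical <field name="..."> layout),
# plus pretty_print to indent the output of print().
my $doc = XML::Twig->new( pretty_print => 'indented' );
$doc->parse('<XML><field name="state">Approved</field><field name="title">Something is broken</field></XML>');
my ($state) = $doc->get_xpath('//field[@name="state"]');
print $state->text, "\n";
$doc->print;
```

Purging inside the handler is what makes twig_handlers worthwhile on large files: only the current record has to be held in memory at any one time.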

Edit: Based on the comments, if you really do want to extract the data from:

</something>

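What's going wrong is that .* is greedy: it matches as much as it can, so it runs on to the last > on the line instead of stopping at the first one. You either need to use a negated match, something like:

m,</([^>]+)>,g

or a non-greedy match: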

m,</(.*?)>,g
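To make the greedy vs. non-greedy difference concrete, here is a small sketch run against the sample line with the slashes corrected (an assumption; the question's data has backslashes). Note that the negated character class needs a quantifier (+) and a capture group to return whole tag names.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Sample line with forward slashes in the closing tags.
my $line = '<record>DEFECT000179</record><state>Approved</state><title>Something is broken</title>';

# Greedy: .* runs to the LAST '>' on the line, giving one oversized capture.
my @greedy = $line =~ m{</(.*)>}g;

# Non-greedy: .*? stops at the FIRST '>' after each '</'.
my @lazy = $line =~ m{</(.*?)>}g;

# Negated character class: [^>]+ can never run past a '>'.
my @negated = $line =~ m{</([^>]+)>}g;

print "greedy:  @greedy\n";
print "lazy:    @lazy\n";
print "negated: @negated\n";
```

The greedy version returns a single element starting with record> and running to the final title, while the other two both return record, state and title.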

Oh, and given you have a backslash, you need to escape it:

my $firstLineOfXMLFile = '<record>DEFECT000179<\record><state>Approved<\state><title>Something is broken<\title>';
# \\ matches a literal backslash; (.*?) captures lazily up to the next '>'
my @fieldNames = $firstLineOfXMLFile =~ m(<\\(.*?)>)g;
print join( "\n", @fieldNames ), "\n";

That will do the trick. (But seriously: deliberately creating something that looks like XML but isn't is a very bad thing.)

Populating an array from XML end tags
