LibXML: reporting an "xmlns" attribute that is not in the XML input file

Date: 2019-06-13 12:43:39

Tags: xml perl xml-parsing namespaces xml-namespaces

I have the following XML file, sheetX.xml (taken from an Excel XML worksheet file):

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<worksheet xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main" 
           xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships"
           xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006"
           xmlns:x14ac="http://schemas.microsoft.com/office/spreadsheetml/2009/9/ac"
           xmlns:xr="http://schemas.microsoft.com/office/spreadsheetml/2014/revision"
           xmlns:xr2="http://schemas.microsoft.com/office/spreadsheetml/2015/revision2"
           xmlns:xr3="http://schemas.microsoft.com/office/spreadsheetml/2016/revision3"
           mc:Ignorable="x14ac xr xr2 xr3"
           xr:uid="{109BF357-4A9A-4969-B57D-8A2B0130DC3F}">
  <dimension ref="A1"/>
  <sheetViews>
    <sheetView tabSelected="1" topLeftCell="M1" workbookViewId="0">
      <selection activeCell="A1" sqref="A1"/>
    </sheetView>
  </sheetViews>
  <sheetFormatPr defaultRowHeight="15" x14ac:dyDescent="0.25"/>  
  <sheetData/>
  <pageMargins left="0.7" right="0.7" top="0.75" bottom="0.75" header="0.3" footer="0.3"/>
</worksheet>

I am reading the file with the XML::LibXML Perl module:

use strict;
use warnings;
use XML::LibXML;
use XML::LibXML::Reader;

my $reader = XML::LibXML::Reader->new( location => 'sheetX.xml' );
$reader->read();
while($NERROR1==0){
    my $doc = $reader->copyCurrentNode(1);
    if(!defined $doc){
        $NERROR1=-1;
    } else {
        if($reader->attributeCount()>0){
            print "tag name:" . $reader->name() . "\n";
            my @attributelist = $doc->attributes();
            for my $iAtt (0 .. scalar @attributelist-1){
                print "Att name:" . $attributelist[$iAtt]->nodeName() . "\n";
                print "Att value:" . $attributelist[$iAtt]->value . "\n";
            }
        }
        $reader->nextElement();
    }
}
$reader->close();

The output of the Perl script for some of the tags is:

tag name:worksheet
Att name:mc:Ignorable
Att value:x14ac xr xr2 xr3
Att name:xr:uid
Att value:{00000000-0001-0000-0400-000000000000}
Att name:xmlns
Att value:http://schemas.openxmlformats.org/spreadsheetml/2006/main
Att name:xmlns:mc
Att value:http://schemas.openxmlformats.org/markup-compatibility/2006
Att name:xmlns:r
Att value:http://schemas.openxmlformats.org/officeDocument/2006/relationships
Att name:xmlns:x14ac
Att value:http://schemas.microsoft.com/office/spreadsheetml/2009/9/ac
Att name:xmlns:xr
Att value:http://schemas.microsoft.com/office/spreadsheetml/2014/revision
Att name:xmlns:xr2
Att value:http://schemas.microsoft.com/office/spreadsheetml/2015/revision2
Att name:xmlns:xr3
Att value:http://schemas.microsoft.com/office/spreadsheetml/2016/revision3

tag name:sheetView
Att name:tabSelected
Att value:1
Att name:topLeftCell
Att value:M1
Att name:workbookViewId
Att value:0
Att name:xmlns
Att value:http://schemas.openxmlformats.org/spreadsheetml/2006/main

tag name:sheetFormatPr
Att name:defaultRowHeight
Att value:15
Att name:x14ac:dyDescent
Att value:0.25
Att name:xmlns
Att value:http://schemas.openxmlformats.org/spreadsheetml/2006/main
Att name:xmlns:x14ac
Att value:http://schemas.microsoft.com/office/spreadsheetml/2009/9/ac

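So basically, the code reports that the sheetView and sheetFormatPr tags have xmlns attributes, which do not appear on those tags in the XML file. I want all the attributes that appear in the file, and no extra ones.

At some stage I will need to rebuild the XML file from the data produced by my Perl program (which also prints tags, values, and so on). So my question is: is there a way to make my Perl program print only the attributes that appear in the XML file, and not others that do not appear there?

1 answer:

Answer 0 (score: 3)

This is the smallest set of changes I know of that excludes the xmlns attributes. Note the changed lines, marked with ###. I'm not sure what your other code that uses $NERROR1 might do; for simplicity I have removed it here. Most of this is adapted from the docs.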

use strict;
use warnings;
use XML::LibXML;
use XML::LibXML::Reader;

my $reader = XML::LibXML::Reader->new( location => 'foo.xml' );
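$reader->read();             ### Note: combined with the read() in the loop condition below, this skips the root <worksheet> element.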

###my $NERROR1;              # Needed to add this because of `use strict`
###while($NERROR1==0){
while($reader->read) {       ### Per the docs.
    my $node = $reader->copyCurrentNode(1);    ### Might not be a document, so $node instead of $doc
###    if(!defined $doc){
###        $NERROR1=-1;
###    } else {
        if($reader->attributeCount>0){
            print "tag name:" . $reader->name . "\n";
###            my @attributelist = $doc->attributes();
###            for my $iAtt (0 .. scalar @attributelist-1){
            for my $att ($node->attributes) {           ### Simpler form of the loop --- don't need the indices.
                next if $att->nodeName =~ /^xmlns\b/;   ### <== The key - skip to the next attribute if this one starts with "xmlns"
                print "Att name:" . $att->nodeName . "\n";
                print "Att value:" . $att->value . "\n";
            }
        }
###        $reader->nextElement();
###    }
}
$reader->close();

Output

tag name:dimension
Att name:ref
Att value:A1
tag name:sheetView
Att name:tabSelected
Att value:1
Att name:topLeftCell
Att value:M1
Att name:workbookViewId
Att value:0
tag name:selection
Att name:activeCell
Att value:A1
Att name:sqref
Att value:A1
tag name:sheetFormatPr
Att name:defaultRowHeight
Att value:15
Att name:x14ac:dyDescent
Att value:0.25
tag name:pageMargins
Att name:left
Att value:0.7
Att name:right
Att value:0.7
Att name:top
Att value:0.75
Att name:bottom
Att value:0.75
Att name:header
Att value:0.3
Att name:footer
Att value:0.3
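
To make the filter on the `next if` line more concrete, here is a minimal sketch of what the pattern /^xmlns\b/ keeps and skips. The name list is made up for illustration; in particular, 'xmlnsFoo' is a hypothetical attribute name that is not in the worksheet file.

```perl
use strict;
use warnings;

# Hypothetical attribute names; only some of them occur in the worksheet file.
my @names = ('xmlns', 'xmlns:x14ac', 'x14ac:dyDescent', 'xmlnsFoo', 'tabSelected');

# \b requires a word boundary right after "xmlns", so "xmlns" and "xmlns:prefix"
# match (and are dropped), while a name that merely starts with those letters,
# such as "xmlnsFoo", does not match and is kept.
my @kept = grep { $_ !~ /^xmlns\b/ } @names;

print "kept: @kept\n";
```

Note that this filter drops every namespace declaration, including the ones that really are written on <worksheet> in the file.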

Explanation

I found a PerlMonks thread that links to RFC 4918, p. 40, which clarifies this:

Since the "xmlns" attribute does not contain a prefix, the namespace applies by default to all enclosed elements.
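
The quoted rule can be checked with a small sketch using the XML::LibXML DOM API. The document and the example.com namespace URIs below are invented stand-ins, not the real worksheet; the point is only that a child element reports the inherited default namespace even though no xmlns attribute is written on it.

```perl
use strict;
use warnings;
use XML::LibXML;

# Invented miniature document; the URIs are placeholders, not the real schema URIs.
my $xml = <<'XML';
<worksheet xmlns="http://example.com/main" xmlns:x="http://example.com/ext">
  <sheetFormatPr defaultRowHeight="15" x:dyDescent="0.25"/>
</worksheet>
XML

my $doc     = XML::LibXML->load_xml(string => $xml);
my $root    = $doc->documentElement;
my ($child) = $root->getChildrenByLocalName('sheetFormatPr');

# The child carries no xmlns attribute of its own, yet it is in the default
# namespace declared on its ancestor.
my $inherited_uri = $child->namespaceURI;
print "namespace of <sheetFormatPr>: $inherited_uri\n";

# attributes() returns ordinary attributes and namespace declarations together;
# declarations are XML::LibXML::Namespace objects, so they can be told apart by class.
my @child_items = map  { $_->nodeName } $child->attributes;
my @root_decls  = map  { $_->nodeName }
                  grep { $_->isa('XML::LibXML::Namespace') } $root->attributes;

print "items on <sheetFormatPr>: @child_items\n";
print "declarations on <worksheet>: @root_decls\n";
```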

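In this case, the <worksheet> tag declares the default namespace xmlns="http://schemas...2006/main". This applies to the enclosed elements, so the <sheetView> and <sheetFormatPr> tags inside <worksheet> also have that default namespace. XML::LibXML::Reader gives you access to that information by reporting the xmlns attribute on those nodes.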
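
If the goal is to list only what is written on each tag, including the declarations that really are on <worksheet>, one possible alternative is to walk the attributes with the reader's own cursor (moveToFirstAttribute / moveToNextAttribute) instead of copying the node. This is a sketch, not the approach used above, and it rests on the assumption that the reader's attribute cursor visits only the attributes and declarations attached to the current element. The inline document and its example.com URIs are placeholders.

```perl
use strict;
use warnings;
use XML::LibXML::Reader qw(:types);   # :types provides XML_READER_TYPE_ELEMENT

# Invented miniature document; the URIs are placeholders, not the real schema URIs.
my $xml = <<'XML';
<worksheet xmlns="http://example.com/main" xmlns:x="http://example.com/ext">
  <sheetFormatPr defaultRowHeight="15" x:dyDescent="0.25"/>
</worksheet>
XML

my $reader = XML::LibXML::Reader->new(string => $xml)
    or die "cannot create reader\n";

my %seen;   # element name => names reported by the reader's attribute cursor
while ($reader->read) {
    # Skip text, whitespace and end-element nodes.
    next unless $reader->nodeType == XML_READER_TYPE_ELEMENT;

    my $element = $reader->name;
    $seen{$element} = [];

    # moveToFirstAttribute returns 1 when the element has at least one attribute.
    if ($reader->moveToFirstAttribute == 1) {
        do {
            push @{ $seen{$element} }, $reader->name;
            print "$element: ", $reader->name, " = ", $reader->value, "\n";
        } while ($reader->moveToNextAttribute == 1);
        $reader->moveToElement;   # put the cursor back on the element itself
    }
}
$reader->close;
```

With this loop there is no extra read() before the while, so the root element is visited as well.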