Error during XML parsing - Perl

Time: 2012-08-23 16:34:22

Tags: xml perl

I am trying to use XML::Tidy to indent an XML file:

use strict;
use warnings;
use XML::Tidy;

sub reformatXML {
    #
    # the only argument to this function is the file name
    #
    my $file = $_[ 0 ];
    #
    # create a new XML::Tidy object from $file
    #
    my $tidy = XML::Tidy->new( 'filename' => $file );
    #
    # tidy up the indenting
    #
    $tidy->tidy();
    #
    # write the changes back to the file
    #
    $tidy->write();
    print "$file was reformatted.\n";
    return;
}

sub main(){
    #
    # get the directory the program is
    # currently running in
    #
    #my $current_dir = getcwd;
    #iterateDir( $current_dir );
    my $file = "/path/to/xml/file/autotest.xml";
    reformatXML( $file );
}

It is as simple as that. But when I call the main() function, I get:

501 Protocol scheme 'd' is not supported d:/UDU/r/tc10.0.0.2012080100_buildA/src/build/kits/tc.dtd
Handler couldn't resolve external entity at line 2, column 29, byte 73
error in processing external entity reference at line 2, column 29, byte 73:
<?xml version="1.0" encoding="iso-8859-1"?>
<!DOCTYPE kit SYSTEM "tc.dtd">
============================^
<kit>
  <contact/>
 at C:/xampp/perl/site/lib/XML/Parser.pm line 187

I am new to Perl and I don't know why this error occurs. Can someone help me solve it?

The beginning of the XML file is:

<?xml version="1.0" encoding="iso-8859-1"?>
<!DOCTYPE kit SYSTEM "tc.dtd">
<kit>
  <contact/>
  <description>autotest files</description>
  <history>
    <hist>06-May-2005                  Created</hist>
    <hist>17-Jun-2005            Add autotest.jar to rtkit</hist>
    <hist>29-Jun-2005            Remove bits picked up elsewhere</hist>
    <hist>15-Jul-2005            Added acad_add_note_types.</hist>
    <hist>20-Sep-2005            Add ai stuff</hist>
    <hist>31-Oct-2005            DMS BnT fixes</hist>
    <hist>03-Nov-2005            Pander to kitting's obsession about unique filenames</hist>
    <hist>17-Nov-2005            Add ics schema and junit</hist>
    <hist>09-Dec-2005            add gdt_autotest</hist>
    <hist>11-Jan-2006            Merge in P10.0.1.5</hist>
    <hist>16-Jan-2006      Merge</hist>
    <hist>26-Jan-2006      Need inclass.plmxml to pass tceng_util autotest</hist>
    <hist>06-Mar-2006      Add qdiff.pl</hist>
    <hist>09-Mar-2006      Kernel tests need a couple fms client files</hist>
    <hist>10-Mar-2006      Missing dependent library</hist>
    <hist>19-Jan-2006      Merged from timb_gmo</hist>
    <hist>17-Jan-2006      GMO Kernel Autotests Implementati

       

1 answer:

Answer 0 (score: 2)

XML::Tidy (or rather, one of the modules it uses) appears to expect the absolute path of the file to be a valid URL, which it is not. It thinks the URL specified is

 d:/UDU/r/tc10.0.0.2012080100_buildA/src/build/kits/tc.dtd

when it should really be

 file:///d:/UDU/r/tc10.0.0.2012080100_buildA/src/build/kits/tc.dtd
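The misreading can be reproduced without any XML module. The sketch below is only an illustration: `scheme_of` is a hypothetical helper, and its pattern follows the generic URI scheme syntax (a letter, then letters, digits, "+", "-" or ".", then a colon) rather than whatever code the parser actually runs. Under that syntax the drive letter `d:` is indistinguishable from a one-letter protocol scheme, which matches the "Protocol scheme 'd' is not supported" message.

```perl
use strict;
use warnings;

# Hypothetical helper: return the scheme a generic URI parser would see,
# or undef when the string does not start with "<scheme>:".
sub scheme_of {
    my ($string) = @_;
    return $string =~ /\A([A-Za-z][A-Za-z0-9+.\-]*):/ ? lc $1 : undef;
}

for my $candidate (
    'd:/UDU/r/tc10.0.0.2012080100_buildA/src/build/kits/tc.dtd',
    'file:///d:/UDU/r/tc10.0.0.2012080100_buildA/src/build/kits/tc.dtd',
) {
    my $scheme = scheme_of($candidate);
    printf "%-6s %s\n", defined $scheme ? $scheme : '(none)', $candidate;
}
```

The first string yields the scheme `d`, the second yields `file`, which is the only one of the two a URL fetcher knows how to handle.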

I don't know how to work around this bug. You could try changing

my $file = "...";
reformatXML($file);

to

use URI::file;
my $file = "...";
my $url = URI::file->new($file);
reformatXML($url);

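That is the immediate error. Beyond that, there is the matter of giving the DTD a relative URL. That is not necessarily wrong, but it is a bit odd. It means tc.dtd must be in the same directory as autotest.xml. Is that really the case?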
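One way to answer that question is to resolve the system identifier against the directory of the XML file and test whether the result exists. This is a simplified sketch using core modules only; `dtd_path_for` is a hypothetical helper, and it assumes the parser resolves relative identifiers against the location of the XML file, which the path in the error message suggests but does not prove.

```perl
use strict;
use warnings;
use File::Basename qw(dirname);
use File::Spec;

# Hypothetical helper: resolve a system identifier such as "tc.dtd"
# against the directory of the XML file that references it.
sub dtd_path_for {
    my ($xml_file, $system_id) = @_;
    # An absolute identifier does not depend on the XML file's location.
    return $system_id if File::Spec->file_name_is_absolute($system_id);
    return File::Spec->catfile(dirname($xml_file), $system_id);
}

my $xml_file = '/path/to/xml/file/autotest.xml';   # placeholder path from the question
my $dtd      = dtd_path_for($xml_file, 'tc.dtd');
my $status   = -e $dtd ? 'found' : 'missing';
print "$status: $dtd\n";
```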


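Some parsers (XML::LibXML, for example) have an option to avoid fetching the DTD. Fetching it is usually unnecessary, and therefore a waste of time, money, CPU and bandwidth. Look for such an option; it may be in the constructor of one of the classes that XML::Tidy inherits from.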
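As an illustration of such an option, here is a minimal sketch that swaps XML::Tidy for XML::LibXML, so it is an alternative approach rather than a fix for XML::Tidy itself. It assumes XML::LibXML is installed. `load_ext_dtd => 0` asks the parser not to load the external DTD, and `no_blanks => 1` drops whitespace-only text nodes so that `toString(1)` can re-indent the tree. The inline document is a shortened stand-in for autotest.xml.

```perl
use strict;
use warnings;
use XML::LibXML;

# Shortened stand-in for autotest.xml; tc.dtd does not need to exist.
my $xml = <<'XML';
<?xml version="1.0" encoding="iso-8859-1"?>
<!DOCTYPE kit SYSTEM "tc.dtd">
<kit><contact/><description>autotest files</description></kit>
XML

my $doc = XML::LibXML->load_xml(
    string       => $xml,
    load_ext_dtd => 0,   # do not fetch the external DTD
    no_blanks    => 1,   # discard whitespace-only nodes before re-indenting
);

# A format argument of 1 asks libxml2 to add indentation on output.
print $doc->toString(1);
```

To work on a real file, `location => $file` can replace `string => $xml`, and `$doc->toFile($file, 1)` writes the indented result back.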