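Parsing XML in Perl

Date: 2015-02-13 07:50:54

Tags: xml perl parsing

I want to parse this XML with Perl. The XML shown here is only part of a larger, more deeply nested document. I tried the usual parsers, and most of them return their output as a hash, which is hard to read and makes it awkward to reach child nodes.

I want to get the elements and read all of their attribute values.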

<?xml version="1.0" encoding="utf-8" standalone="no"?>
<TR name="App.exe" total="573" errors="1" failures="2" not-run="4" inconclusive="2" ignored="4" skipped="0" invalid="0" date="2015-01-12" time="17:43:59">
  <environment version="2" cversion="44" os-version="Microsoft" platform="Win32NT" cwd="" machine-name="" user="me" user-domain="domain" />
  <culture-info current-culture="en-US" current-uiculture="en-US" />
  <TS type="Assembly" name="App.exe" executed="True" result="Failure" success="False" time="22" asserts="0">
    <RS>
      <TS type="Namespace" name="MyAPP" executed="True" result="Failure" success="False" time="2335.164" asserts="0">
        <RS>
          <TS type="Namespace" name="Project" executed="True" result="Failure" success="False" time="2335.164" asserts="0">
            <RS>
              <TS type="Namespace" name="Website" executed="True" result="Failure" success="False" time="2335.164" asserts="0">
                <RS>
                  <TS type="Namespace" name="Service" executed="True" result="Failure" success="False" time="2335.163" asserts="0">
                    <RS>
                      <TS type="SetUpFixture" name="Tests" executed="True" result="Failure" success="False" time="2335.163" asserts="0">
                        <RS>
                          <TS type="Namespace" name="tempt" executed="True" result="Success" success="True" time="8.935" asserts="0">
                            <RS>
                              <TS type="ParameterizedFixture" name="TempAPI" executed="True" result="Success" success="True" time="8.935" asserts="0">
                                <RS>
                                  <TS type="TestFixture" name="Admin" executed="True" result="Success" success="True" time="3.306" asserts="2">
                                    <RS>
                                      <TC name="testName1" executed="True" result="Success" success="True" time="0.352" asserts="0" />
                                      <TC name="testName2" executed="True" result="Success" success="True" time="0.005" asserts="0" />
                                    </RS>
                                  </TS>
                                  <TS type="TestFixture" name="Client" executed="True" result="Success" success="True" time="2.620" asserts="1">
                                    <RS>
                                      <TC name="testName3" executed="True" result="Success" success="True" time="0.319" asserts="0" />
                                      <TC name="testName4" executed="True" result="Success" success="True" time="0.000" asserts="0" />
                                    </RS>
                                  </TS>
                                  <TS type="TestFixture" name="Employee" executed="True" result="Success" success="True" time="3.007" asserts="1">
                                    <RS>
                                      <TC name="testName5" executed="True" result="Success" success="True" time="0.290" asserts="0" />
                                      <TC name="testName6" executed="True" result="Success" success="True" time="0.000" asserts="0" />
                                    </RS>
                                  </TS>
                                </RS>
                              </TS>
                            </RS>
                          </TS>
                        </RS>
                      </TS>
                    </RS>
                  </TS>
                </RS>
              </TS>
            </RS>
          </TS>
        </RS>
      </TS>
    </RS>
  </TS>
</TR>

I tried the following and, as I said, it produces hash output that is hard to read and hard to pull details out of.

use XML::Simple;
use Data::Dumper;
use File::Slurp qw(write_file);

my $list = XMLin('F:\Sample.xml', KeepRoot => 1);

#print $list->{TS}[0]{name};
print Dumper($list);
write_file 'F:\mydump.log', Dumper($list);

I need advice on a parser whose output is easier to read than a hash.

With XML::Simple I get the following format:

$VAR1 = {
          'TR' => {
                  'failures' => '2',
                  'TS' => {
                          'asserts' => '0',
                          'success' => 'False',
                          'time' => '22',
                          'name' => 'App.exe',
                          'executed' => 'True',
                          'type' => 'Assembly',
                          'RS' => {
                                  'TS' => {
                                          'asserts' => '0',
                                          'success' => 'False',
                                          'time' => '2335.164',
                                          'name' => 'MyAPP',
                                          'executed' => 'True',
                                          'type' => 'Namespace',
                                          'RS' => {
                                                  'TS' => {
                                                          'asserts' => '0',
                                                          'success' => 'False',
                                                          'time' => '2335.164',
                                                          'name' => 'Project',
                                                          'executed' => 'True',
                                                          'type' => 'Namespace',
                                                          'RS' => {
                                                                  'TS' => {
                                                                          'asserts' => '0',
                                                                          'success' => 'False',
                                                                          'time' => '2335.164',
                                                                          'name' => 'Web',
                                                                          'executed' => 'True',
                                                                          'type' => 'Namespace',
                                                                          'RS' => {
                                                                                  'TS' => {
                                                                                          'asserts' => '0',
                                                                                          'success' => 'False',
                                                                                          'time' => '2335.163',
                                                                                          'name' => 'Server',
                                                                                          'executed' => 'True',
                                                                                          'type' => 'Namespace',
                                                                                          'RS' => {
                                                                                                  'TS' => {
                                                                                                          'asserts' => '0',
                                                                                                          'success' => 'False',
                                                                                                          'time' => '2335.163',
                                                                                                          'name' => 'Tests',

                                                                                                                                                          'Client' => {
                                                                                                                                                                      'success' => 'True',
                                                                                                                                                                      'asserts' => '1',
                                                                                                                                                                      'time' => '2.620',
                                                                                                                                                                      'executed' => 'True',
                                                                                                                                                                      'type' => 'TestFixture',
                                                                                                                                                                      'RS' => {
                                                                                                                                                                              'TC' => {
                                                                                                                                                                                      'testName3' => {
                                                                                                                                                                                                     'success' => 'True',
                                                                                                                                                                                                     'asserts' => '0',
                                                                                                                                                                                                     'time' => '0.319',
                                                                                                                                                                                                     'executed' => 'True',
                                                                                                                                                                                                     'result' => 'Success'
                                                                                                                                                                                                   },
                                                                                                                                                                                      'testName4' => {
                                                                                                                                                                                                     'success' => 'True',
                                                                                                                                                                                                     'asserts' => '0',
                                                                                                                                                                                                     'time' => '0.000',
                                                                                                                                                                                                     'executed' => 'True',
                                                                                                                                                                                                     'result' => 'Success'
                                                                                                                                                                                                   }
                                                                                                                                                                                    }
                                                                                                                                                                            },
                                                                                                                                                                      'result' => 'Success'
                                                                                                                                                                    },
                                                                                                                                                          'Admin' => {
                                                                                                                                                                     'success' => 'True',
                                                                                                                                                                     'asserts' => '2',
                                                                                                                                                                     'time' => '3.306',
                                                                                                                                                                     'executed' => 'True',
                                                                                                                                                                     'type' => 'TestFixture',
                                                                                                                                                                     'RS' => {
                                                                                                                                                                             'TC' => {
                                                                                                                                                                                     'testName1' => {
                                                                                                                                                                                                    'success' => 'True',
                                                                                                                                                                                                    'asserts' => '0',
                                                                                                                                                                                                    'time' => '0.352',
                                                                                                                                                                                                    'executed' => 'True',
                                                                                                                                                                                                    'result' => 'Success'
                                                                                                                                                                                                  },
                                                                                                                                                                                     'testName2' => {
                                                                                                                                                                                                    'success' => 'True',
                                                                                                                                                                                                    'asserts' => '0',
                                                                                                                                                                                                    'time' => '0.005',
                                                                                                                                                                                                    'executed' => 'True',
                                                                                                                                                                                                    'result' => 'Success'
                                                                                                                                                                                                  }
                                                                                                                                                                                   }
                                                                                                                                                                           },
                                                                                                                                                                     'result' => 'Success'
                                                                                                                                                                   }
                                                                                                                                                        }
                                                                                                                                                },
                                                                                                                                          'result' => 'Success'
                                                                                                                                        }
                                                                                                                                },
                                                                                                                          'result' => 'Success'
                                                                                                                        }
                                                                                                                },
                                                                                                          'result' => 'Failure'
                                                                                                        }
                                                                                                },
                                                                                          'result' => 'Failure'
                                                                                        }
                                                                                },
                                                                          'result' => 'Failure'
                                                                        }
                                                                },
                                                          'result' => 'Failure'
                                                        }
                                                },
                                          'result' => 'Failure'
                                        }
                                },
                          'result' => 'Failure'
                        },
                  'culture-info' => {
                                    'current-culture' => 'en-US',
                                    'current-uiculture' => 'en-US'
                                  },
                  'errors' => '1',
                  'time' => '17:43:59',
                  'date' => '2015-01-12',
                  'not-run' => '4',
                  'name' => 'App.exe',
                  'ignored' => '4',
                  'total' => '573',
                  'skipped' => '0',
                  'environment' => {
                                   'user-domain' => 'domain',
                                   'nunit-version' => '2.6.3.13283',
                                   'os-version' => 'Microsoft Windows NT 6.2.9200.0',
                                   'cwd' => '',
                                   'user' => 'me',
                                   'platform' => 'Win32NT',
                                   'clr-version' => '4.0.30319.34014',
                                   'machine-name' => ''
                                 },
                  'inconclusive' => '2',
                  'invalid' => '0'
                }
        };

2 Answers:

Answer 0: (score: 4)

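Please don't use XML::Simple. The name is a misnomer: it is not simple at all, it is for simple XML. From its own documentation:

  The use of this module in new code is discouraged.

Try XML::Twig instead.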

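Part of your problem is simply that you have a deeply nested XML structure. There are only so many ways to "display" that.

What every XML parser does is turn your XML into a Perl data structure, usually a hash. What it will usually also do is let you print that structure back out as "proper" XML.

So for a simple reformatting task, XML::Twig will let you do this: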

#!/usr/bin/perl
use strict;
use warnings;

use XML::Twig;

# Called once for each completed <TC> element; prints all of its attributes.
sub handle_tc {
    my ( $twig, $tc ) = @_;
    foreach my $attr ( keys %{ $tc->atts() } ) {
        print "$attr = " . $tc->att($attr) . "\n";
    }
    print "\n";
}

# Parse the XML source file (not the Data::Dumper log).
my $twig_parser = XML::Twig->new(
    pretty_print  => 'indented',
    twig_handlers => { 'TC' => \&handle_tc },
)->parsefile('F:\Sample.xml');

print "\n\nWhole XML pretty_print\n\n";
$twig_parser->print;

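As it parses, this prints every attribute of each 'TC' element. Each time the parser finishes reading a TC element, the handler is called with that subset of the XML.

For comparison, $twig_parser->print reformats the document according to the 'pretty_print' option and prints it. (Given your source XML, though, that probably won't change it much.)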
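If the real goal is a flat list rather than a nested hash, the same handler idea can collect one record per TC and tag it with the name of the enclosing TS. This is only a sketch under assumptions: the inline XML is a trimmed-down stand-in for the question's file, and the `fixture` key and `@rows` array are names made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

use XML::Twig;

# Trimmed-down sample in the same shape as the question's XML (assumed).
my $xml = <<'XML';
<TR name="App.exe">
  <TS type="TestFixture" name="Admin">
    <RS>
      <TC name="testName1" result="Success" time="0.352" />
      <TC name="testName2" result="Success" time="0.005" />
    </RS>
  </TS>
  <TS type="TestFixture" name="Client">
    <RS>
      <TC name="testName3" result="Success" time="0.319" />
    </RS>
  </TS>
</TR>
XML

my @rows;    # one flat hash per <TC> (hypothetical record layout)

my $twig = XML::Twig->new(
    twig_handlers => {
        TC => sub {
            my ( $t, $tc ) = @_;
            # Nearest enclosing <TS>. Its end tag has not been parsed yet,
            # but the attributes from its start tag are already available.
            my $fixture = $tc->parent('TS');
            push @rows, {
                fixture => $fixture ? $fixture->att('name') : '',
                %{ $tc->atts || {} },
            };
        },
    },
);
$twig->parse($xml);

# Print the rows as simple aligned columns.
printf "%-8s %-10s %-8s %s\n", @{$_}{qw(fixture name result time)} for @rows;
```

For the real file, `$twig->parsefile('F:\Sample.xml')` would take the place of `$twig->parse($xml)`.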

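Answer 1: (score: 1)

Going by the comments, if you only want the TC nodes, you can parse the XML file and iterate over the nodes; whenever a node's tag is TC, extract or print the information you need.

Alternatively, you could use a regular expression while reading the file to capture the TC nodes and then extract the information you need.

What you get from XML parsers is what you dumped, and that is what you should expect to get, so I'm not sure exactly what you want. A flatter structure with no nesting?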
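One way to do that iteration, sketched here with XML::LibXML (a module choice assumed for illustration; the answer does not name one), is to let an XPath expression select the TC nodes and then read each node's attribute list. The inline XML and the `%tests` hash are stand-ins, not part of the original file.

```perl
#!/usr/bin/perl
use strict;
use warnings;

use XML::LibXML;

# Small stand-in for the question's document (assumed shape).
my $xml = <<'XML';
<TR name="App.exe">
  <TS type="TestFixture" name="Admin">
    <RS>
      <TC name="testName1" executed="True" result="Success" time="0.352" />
      <TC name="testName2" executed="True" result="Success" time="0.005" />
    </RS>
  </TS>
</TR>
XML

my $doc = XML::LibXML->load_xml( string => $xml );

my %tests;    # test name => hash of that TC's attributes
for my $tc ( $doc->findnodes('//TC') ) {
    # Turn the attribute nodes into a plain name => value hash.
    my %attrs = map { $_->nodeName => $_->value } $tc->attributes;
    $tests{ $attrs{name} } = \%attrs;
    print join( ', ', map { "$_=$attrs{$_}" } sort keys %attrs ), "\n";
}
```

To read from disk instead, `load_xml( location => 'F:\Sample.xml' )` should work the same way.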
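A rough sketch of the regular-expression route follows. It assumes each TC element sits on a single line, that attribute values are double-quoted, and that they contain no escaped entities or `>` characters; input that breaks those assumptions would need a real parser. The `@lines` array stands in for lines read from the file.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Stand-in for lines read from the XML file (assumed layout: one TC per line).
my @lines = (
    '<RS>',
    '  <TC name="testName5" executed="True" result="Success" time="0.290" />',
    '  <TC name="testName6" executed="True" result="Success" time="0.000" />',
    '</RS>',
);

my @tests;    # one attribute hash per TC line
for my $line (@lines) {
    # Capture the text between "<TC" and the closing "/>" or ">".
    next unless $line =~ m{<TC\b([^>]*?)\s*/?>};
    my $attr_text = $1;

    # Pull out name="value" pairs; attribute names may contain hyphens.
    my %attrs = $attr_text =~ /([\w:-]+)\s*=\s*"([^"]*)"/g;
    push @tests, \%attrs;
}

print "$_->{name}: $_->{result} ($_->{time}s)\n" for @tests;
```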