Extracting data from an XML file

Time: 2015-08-10 13:38:48

Tags: perl

I am new to Perl, and I have the following lines in a file:

<Ss ssId="76536062" handle="AFFY" batchId="52074"
    locSnpId="AFFY_6_1M_SNP_A-8397107" subSnpClass="snp" orient="forward" 
    strand="bottom" molType="genomic" buildId="130" methodClass="hybridize" 
    validated="by-submitter">
    <Sequence>
        <Seq5>TCACCTCTGGGACTGA</Seq5>
        <Observed>C/T</Observed>
        <Seq3>AATTAGGAAGAGCTGG</Seq3>
    </Sequence>
</Ss>
<Ss ssId="104807776" handle="KRIBB_YJKIM" batchId="60510"
    locSnpId="KHS1200112" subSnpClass="snp" orient="forward" strand="bottom"
    molType="genomic" buildId="130" methodClass="hybridize"
    validated="by-submitter">
    <Sequence>
        <Seq5>
             TAGGAACAAGGTACATTCGCGGGATAAATGTGGCCAAGTTTTATCTGCTGCCAGGGCTTTCAAATAGGTTGACCTGACAATGGGTCACCTCTGGGACTGA
       </Seq5>
       <Observed>C/T</Observed>
       <Seq3>
          AATTAGGAAGAGCTGGTACCTAAAATGAAAGATGCCCTTAAATTTCAGATTCACAATTTT
       </Seq3>
   </Sequence>
</Ss>

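I want to print the content between the <Observed> and </Observed> tags, which is C/T. I also want to print 30 bp each for Seq5 and Seq3. Any ideas would help. Thanks in advance.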

1 answer:

Answer 0 (score: 1)

Something like this should do the trick:

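#!/usr/bin/env perl
use strict;
use warnings;
use XML::Twig;

# Parse the XML stored after __DATA__ at the end of this script.
my $twig = XML::Twig->new()->parse( \*DATA );

# Visit every <Sequence> element and print each child's tag and text.
foreach my $sequence ( $twig->get_xpath('//Sequence') ) {
    foreach my $element ( $sequence->children ) {
        # trimmed_text strips the whitespace surrounding the sequence.
        print $element->tag, " => ", $element->trimmed_text, "\n";
    }
}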

__DATA__
<root>
<Ss ssId="76536062" handle="AFFY" batchId="52074"
    locSnpId="AFFY_6_1M_SNP_A-8397107" subSnpClass="snp" orient="forward" 
    strand="bottom" molType="genomic" buildId="130" methodClass="hybridize" 
    validated="by-submitter">
    <Sequence>
        <Seq5>TCACCTCTGGGACTGA</Seq5>
        <Observed>C/T</Observed>
        <Seq3>AATTAGGAAGAGCTGG</Seq3>
    </Sequence>
</Ss>
<Ss ssId="104807776" handle="KRIBB_YJKIM" batchId="60510"
    locSnpId="KHS1200112" subSnpClass="snp" orient="forward" strand="bottom"
    molType="genomic" buildId="130" methodClass="hybridize"
    validated="by-submitter">
    <Sequence>
        <Seq5>
             TAGGAACAAGGTACATTCGCGGGATAAATGTGGCCAAGTTTTATCTGCTGCCAGGGCTTTCAAATAGGTTGACCTGACAATGGGTCACCTCTGGGACTGA
       </Seq5>
       <Observed>C/T</Observed>
       <Seq3>
          AATTAGGAAGAGCTGGTACCTAAAATGAAAGATGCCCTTAAATTTCAGATTCACAATTTT
       </Seq3>
   </Sequence>
</Ss>
</root>
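
Note that the sample is wrapped in a <root> element, because an XML document can only have one top-level element. The snippet above prints the full sequences, so the 30 bp part of the question is still open. Here is a rough sketch of one way to handle it. It assumes that "30 bp" means the 30 bases nearest the variant: the last 30 of Seq5 and the first 30 of Seq3. If you want something else, adjust the helpers. The names flank_5prime, flank_3prime and $FLANK are made up for this example and are not part of XML::Twig.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use XML::Twig;

# Assumption: "30 bp" means the 30 bases closest to the variant.
my $FLANK = 30;

# Hypothetical helper: last $n bases of the 5' flank, or all of it if shorter.
sub flank_5prime {
    my ( $seq, $n ) = @_;
    return length($seq) > $n ? substr( $seq, -$n ) : $seq;
}

# Hypothetical helper: first $n bases of the 3' flank, or all of it if shorter.
sub flank_3prime {
    my ( $seq, $n ) = @_;
    return substr( $seq, 0, $n );
}

# Second record from the question, reduced to two of its attributes.
my $xml = <<'XML';
<root>
<Ss ssId="104807776" handle="KRIBB_YJKIM">
    <Sequence>
        <Seq5>
            TAGGAACAAGGTACATTCGCGGGATAAATGTGGCCAAGTTTTATCTGCTGCCAGGGCTTTCAAATAGGTTGACCTGACAATGGGTCACCTCTGGGACTGA
        </Seq5>
        <Observed>C/T</Observed>
        <Seq3>
            AATTAGGAAGAGCTGGTACCTAAAATGAAAGATGCCCTTAAATTTCAGATTCACAATTTT
        </Seq3>
    </Sequence>
</Ss>
</root>
XML

my $twig = XML::Twig->new()->parse($xml);
foreach my $sequence ( $twig->get_xpath('//Sequence') ) {
    # trimmed_text drops the whitespace around each value.
    my $seq5     = $sequence->first_child('Seq5')->trimmed_text;
    my $observed = $sequence->first_child('Observed')->trimmed_text;
    my $seq3     = $sequence->first_child('Seq3')->trimmed_text;

    # The ssId attribute lives on the enclosing <Ss> element.
    print "ssId     => ", $sequence->parent->att('ssId'), "\n";
    print "Seq5     => ", flank_5prime( $seq5, $FLANK ), "\n";
    print "Observed => ", $observed, "\n";
    print "Seq3     => ", flank_3prime( $seq3, $FLANK ), "\n";
}
```

Flanks shorter than 30 bases, such as the 16-base ones in the first record, are printed in full rather than padded. To read a real file instead of an inline string, XML::Twig also provides parsefile, which takes a file name.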