This is part of an XML file from ENA; the full file contains several ROOT elements:
<?xml version="1.0" encoding="UTF-8"?>
<ROOT request="Taxon:5671&amp;display=xml">
<taxon scientificName="Leishmania infantum" taxId="5671" parentTaxId="38574" rank="species" hidden="true" taxonomicDivision="INV" geneticCode="1" mitochondrialGeneticCode="4" plastIdGeneticCode="11">
<lineage>
<taxon scientificName="Leishmania donovani species complex" taxId="38574" rank="species group" hidden="true"></taxon>
<taxon scientificName="Leishmania" taxId="38568" rank="subgenus" hidden="true"></taxon>
<taxon scientificName="Leishmania" taxId="5658" rank="genus" hidden="false"></taxon>
<taxon scientificName="Leishmaniinae" taxId="1286322" rank="subfamily" hidden="false"></taxon>
<taxon scientificName="Trypanosomatidae" taxId="5654" rank="family" hidden="false"></taxon>
<taxon scientificName="Kinetoplastida" commonName="kinetoplasts" taxId="5653" rank="order" hidden="false"></taxon>
<taxon scientificName="Euglenozoa" taxId="33682" hidden="false"></taxon>
<taxon scientificName="Eukaryota" commonName="eucaryotes" taxId="2759" rank="superkingdom" hidden="false"></taxon>
<taxon scientificName="cellular organisms" taxId="131567" hidden="true"></taxon>
<taxon scientificName="root" taxId="1" hidden="true"></taxon>
</lineage>
<children>
<taxon scientificName="Leishmania infantum JPCM5" taxId="435258"> </taxon>
</children>
<synonym type="synonym" name="Leishmania (Leishmania) infantum"></synonym>
<synonym type="synonym" name="Leishmania donovani infantum"></synonym>
</taxon>
</ROOT>
I parse it in Python 2.6 as follows:
import xml.etree.ElementTree as ET
tree = ET.parse('parsing_ena.xml')
I can get all the taxon names related to the first child using:
root = tree.getroot()
taxa = root.findall("./ROOT/taxon")
first_taxa = [x.attrib["scientificName"] for x in taxa[1].findall("./lineage/taxon")]
How can I iterate over all the children in the XML file?
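One possible approach, as a minimal sketch rather than a definitive answer: loop over every top-level taxon instead of indexing a single one, and collect the lineage names per taxon. The wrapper element name `ROOTS`, the second `ROOT` entry, and the abbreviated lineages below are assumptions made only so the example is self-contained; the real file's wrapper and contents may differ.

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature of the file: a wrapper element (name assumed)
# holding several ROOT elements, each with one top-level taxon.
# The lineages are shortened for illustration.
SAMPLE = """<ROOTS>
<ROOT request="Taxon:5671&amp;display=xml">
<taxon scientificName="Leishmania infantum" taxId="5671">
<lineage>
<taxon scientificName="Leishmania" taxId="5658"></taxon>
<taxon scientificName="root" taxId="1"></taxon>
</lineage>
</taxon>
</ROOT>
<ROOT request="Taxon:5658&amp;display=xml">
<taxon scientificName="Leishmania" taxId="5658">
<lineage>
<taxon scientificName="root" taxId="1"></taxon>
</lineage>
</taxon>
</ROOT>
</ROOTS>"""

root = ET.fromstring(SAMPLE)

# Map each top-level taxon name to the list of names in its lineage.
lineages = {}
for taxon in root.findall("ROOT/taxon"):
    names = [x.attrib["scientificName"]
             for x in taxon.findall("lineage/taxon")]
    lineages[taxon.attrib["scientificName"]] = names
```

With a file on disk, `ET.parse('parsing_ena.xml').getroot()` would take the place of `ET.fromstring(SAMPLE)`. The simple `findall` paths used here stay within the limited path syntax that older ElementTree versions support.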