Question

I have an XML file that looks like this:

<Organism>
 <Name>Bacillus halodurans C-125</Name>
  <Enzyme>M.BhaII</Enzyme>
   <Motif>GGCC</Motif>
  <Enzyme>M1.BhaI</Enzyme>
   <Motif>GCATC</Motif>
  <Enzyme>M2.BhaI</Enzyme>
   <Motif>GCATC</Motif>
</Organism>
<Organism>
 <Name>Bacteroides eggerthii 1_2_48FAA</Name>
</Organism>

I am trying to write it to a CSV file like this:

Bacillus halodurans, GGCC
Bacillus halodurans, GCATC
Bacillus halodurans, GCATC
Bacteriodes,

My approach is to build a list of tuples that pair each organism name with a motif. I tried this with the ElementTree module:

import xml.etree.ElementTree as ET

tree = ET.parse('file.xml')
rebase = tree.getroot()

list = []

for organisms in rebase.findall('Organism'):
        name = organisms.find('Name').text
        for each_organism in organisms.findall('Motif'):
            try:
                motif = organisms.find('Motif').text
                print name, motif
            except AttributeError:
                print name

However, the output I get looks like this:

Bacillus halodurans, GGCC
Bacillus halodurans, GGCC
Bacillus halodurans, GGCC

Only the first motif is recorded. This is my first time working with ElementTree, so it is a bit confusing. Any help would be greatly appreciated.

I don't need help with writing the CSV file.

Answer 1

The only thing you need to fix is to replace:

motif = organisms.find('Motif').text

with:

motif = each_organism.text

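You are already iterating over the Motif nodes inside the Organism, so the each_organism loop variable holds a Motif element on each pass. Calling organisms.find('Motif') inside the loop returns the first Motif every time, which is why GGCC is printed three times.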
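To make the difference concrete, here is a minimal sketch that compares the two calls on a single organism. The inline XML string and the variable names are made up for illustration, based on the sample data in the question.

```python
import xml.etree.ElementTree as ET

# Hypothetical single-organism snippet based on the question's sample data.
xml_text = """<Organism>
  <Name>Bacillus halodurans C-125</Name>
  <Enzyme>M.BhaII</Enzyme>
  <Motif>GGCC</Motif>
  <Enzyme>M1.BhaI</Enzyme>
  <Motif>GCATC</Motif>
  <Enzyme>M2.BhaI</Enzyme>
  <Motif>GCATC</Motif>
</Organism>"""
organism = ET.fromstring(xml_text)

# find() returns the first matching child, no matter how often it is called.
first_each_time = [organism.find('Motif').text
                   for _ in organism.findall('Motif')]

# Iterating over findall() yields each matching child in document order.
each_motif = [motif.text for motif in organism.findall('Motif')]

print(first_each_time)
print(each_motif)
```

The first list repeats the same motif once per loop pass, while the second contains each motif in turn.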

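I would also rename the variables to avoid confusion. Also, I don't think you need the try/except inside the loop over the Motif tags. If the Name tag may be missing, you can follow the "ask for forgiveness, not permission" approach and catch the error: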

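for organism in rebase.findall('Organism'):
    try:
        name = organism.find('Name').text
    except AttributeError:
        # find() returned None: there is no Name tag, so skip this organism
        continue

    # findall() yields every Motif child, not just the first one
    for motif in organism.findall('Motif'):
        print name, motif.text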
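For the list of tuples the question describes, a self-contained sketch might look like the following. It is written for Python 3 (note the print() function), and several things in it are assumptions rather than part of the original post: the `<Rebase>` root element wrapping the organisms, the helper name `organism_motif_pairs`, and the choice to emit an empty motif for organisms that have no Motif tag, so that they still appear in the output as the question requests. It keeps the full organism name and does not shorten it.

```python
import xml.etree.ElementTree as ET

# Assumption: the <Organism> elements sit under one root element.
# The name <Rebase> is made up for this example.
XML_TEXT = """<Rebase>
<Organism>
 <Name>Bacillus halodurans C-125</Name>
  <Enzyme>M.BhaII</Enzyme>
   <Motif>GGCC</Motif>
  <Enzyme>M1.BhaI</Enzyme>
   <Motif>GCATC</Motif>
  <Enzyme>M2.BhaI</Enzyme>
   <Motif>GCATC</Motif>
</Organism>
<Organism>
 <Name>Bacteroides eggerthii 1_2_48FAA</Name>
</Organism>
</Rebase>"""


def organism_motif_pairs(root):
    """Return (name, motif) tuples; motif is '' when an organism has none."""
    pairs = []
    for organism in root.findall('Organism'):
        name_node = organism.find('Name')
        if name_node is None:
            continue  # no <Name> tag: skip this organism
        motifs = organism.findall('Motif')
        if not motifs:
            # Keep the organism in the output with an empty motif column.
            pairs.append((name_node.text, ''))
        for motif in motifs:
            pairs.append((name_node.text, motif.text))
    return pairs


rows = organism_motif_pairs(ET.fromstring(XML_TEXT))
for name, motif in rows:
    print('{}, {}'.format(name, motif))
```

The resulting `rows` list can be passed directly to `csv.writer(...).writerows(rows)` if desired.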

Converting an XML file to CSV
