Segfault when trying to parse with lxml

Asked: 2012-11-12 05:30:27

Tags: xml xml-parsing lxml

I have this simple Python script:

import sys 
from lxml import etree

tree = etree.parse('gdpdefl.xml')

But it segfaults. After some googling, I thought my XML document might be malformed, so I tried this:

import sys
from lxml import etree

parser = etree.HTMLParser()
tree = etree.parse('gdpdefl.xml', parser)

This also segfaulted. Here is a sample of the XML document I am trying to parse:

<?xml version="1.0" encoding="utf-8"?>
<Root xmlns:wb="http://www.worldbank.org">
  <data>
    <record>
      <field name="Country or Area" key="ARB">Arab World</field>
      <field name="Item" key="NY.GDP.DEFL.KD.ZG">Inflation, GDP deflator (annual %)</field>
      <field name="Year">1960</field>
      <field name="Value" />
    </record>
    <record>
      <field name="Country or Area" key="ARB">Arab World</field>
      <field name="Item" key="NY.GDP.DEFL.KD.ZG">Inflation, GDP deflator (annual %)</field>
      <field name="Year">1961</field>
      <field name="Value" />
    </record> 
    <record>
      <field name="Country or Area" key="ZWE">Zimbabwe</field>
      <field name="Item" key="NY.GDP.DEFL.KD.ZG">Inflation, GDP deflator (annual %)</field>
      <field name="Year">2011</field>
      <field name="Value">21.1562931758805</field>
    </record>
  </data>
</Root>

If I do indeed have malformed XML here, what is the best way to pull all the Country, Item, Year, and Value strings out of this file into lists?
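
As a rough sketch of the extraction asked about here, the following uses the standard library's xml.etree.ElementTree instead of lxml. That is a swapped-in parser, chosen only because it avoids the lxml call that crashes; the post does not establish what causes the segfault. The sample shown above looks well-formed, so no recovery mode is used. The helper name extract_records and the inline SAMPLE bytes are hypothetical, made up for illustration; SAMPLE copies the first and last records of the sample.

```python
import xml.etree.ElementTree as ET

# Hypothetical inline copy of two records from the sample document.
SAMPLE = b"""<?xml version="1.0" encoding="utf-8"?>
<Root xmlns:wb="http://www.worldbank.org">
  <data>
    <record>
      <field name="Country or Area" key="ARB">Arab World</field>
      <field name="Item" key="NY.GDP.DEFL.KD.ZG">Inflation, GDP deflator (annual %)</field>
      <field name="Year">1960</field>
      <field name="Value" />
    </record>
    <record>
      <field name="Country or Area" key="ZWE">Zimbabwe</field>
      <field name="Item" key="NY.GDP.DEFL.KD.ZG">Inflation, GDP deflator (annual %)</field>
      <field name="Year">2011</field>
      <field name="Value">21.1562931758805</field>
    </record>
  </data>
</Root>"""


def extract_records(root):
    """Return one dict per <record>, keyed by each field's name attribute."""
    rows = []
    for record in root.iter("record"):
        row = {}
        for field in record.findall("field"):
            # An empty element such as <field name="Value" /> has text None.
            row[field.get("name")] = field.text
        rows.append(row)
    return rows


root = ET.fromstring(SAMPLE)
rows = extract_records(root)

# One list per column, in document order.
countries = [row["Country or Area"] for row in rows]
items = [row["Item"] for row in rows]
years = [row["Year"] for row in rows]
values = [row["Value"] for row in rows]
```

For the real file, the root could presumably be obtained with ET.parse('gdpdefl.xml').getroot() and passed to the same helper. Missing values come back as None rather than an empty string, so they may need filtering before any numeric conversion.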

0 answers:

No answers yet