ElementTree find outputs empty text

Time: 2016-10-15 02:47:00

Tags: python xml elementtree

I'm having a problem extracting text with ElementTree.

My XML file has the following format:

<elecs id = 'elecs'>
    <elec id = "CLM-0001" num = "0001">
        <elec-text> blah blah blah </elec-text>
        <elec-text> blah blah blah </elec-text>
    </elec>
    <elec id = "CLM-0002" num = "0002">
         <elec-text> blah blah blah </elec-text>
         <elec-text> blah blah blah </elec-text>
    </elec>
 </elecs>

I want to extract all of the text inside the tags.

Assume the path to the XML file is stored in the variable xml:

import xml.etree.ElementTree as ET
from lxml import etree
parser = etree.XMLParser(recover = True)
contents = open(xml).read()
tree = ET.fromstring(contents, parser = parser)
elecsN = tree.find('elecs')
for element in elecsN:
    print(element.text)

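The problem is that the code above returns empty strings. I have tried the same code with other tags in my document and it works there. I don't know why it returns empty strings this time.

Is there any way I can solve this?

Thank you very much.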

2 Answers:

Answer 0: (score: 1)

You can look up the elements that directly contain the text by name, e.g. elec-text:

>>> elec_texts = tree.findall('.//elec-text')
>>> for elec_text in elec_texts:
...     print(elec_text.text)
...
 blah blah blah
 blah blah blah
 blah blah blah
 blah blah blah
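
As for why the loop in the question prints what look like empty strings: an element's .text only covers the characters between its opening tag and its first child. For each elec that is just a newline plus indentation, and the visible text belongs to the elec-text children one level down. Here is a minimal sketch of that using only the standard library. It assumes the XML from the question is held in a string; the names xml_text, child_texts, texts and all_text are made up for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the file contents from the question.
xml_text = """<elecs id="elecs">
    <elec id="CLM-0001" num="0001">
        <elec-text> blah blah blah </elec-text>
        <elec-text> blah blah blah </elec-text>
    </elec>
    <elec id="CLM-0002" num="0002">
        <elec-text> blah blah blah </elec-text>
        <elec-text> blah blah blah </elec-text>
    </elec>
</elecs>"""

root = ET.fromstring(xml_text)  # root is the <elecs> element itself

# Iterating over root yields the <elec> children; their .text is only
# the whitespace that precedes the first <elec-text> child.
child_texts = [elec.text for elec in root]
whitespace_only = all(t.strip() == "" for t in child_texts)

# The real text sits one level down, in the <elec-text> elements.
texts = [e.text.strip() for e in root.iter("elec-text")]

# itertext() walks every text node below root; drop the blank ones.
all_text = [t.strip() for t in root.itertext() if t.strip()]

print(whitespace_only)
print(texts)
```

If the tag names are not known in advance, itertext() is probably the more convenient of the two, since it collects text at any depth.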

Answer 1: (score: 0)

If you really do mean "any way", you could use lxml:

>>> from io import StringIO
>>> html = StringIO('''\
... <elecs id = 'elecs'>
...     <elec id = "CLM-0001" num = "0001">
...             <elec-text> blah blah blah </elec-text>
...             <elec-text> blah blah blah </elec-text>            
...     </elec>
...     <elec id = "CLM-0002" num = "0002">    
...          <elec-text> blah blah blah </elec-text>
...          <elec-text> blah blah blah </elec-text>         
...     </elec>
... </elecs>
... '''
... )
>>> from lxml import etree
>>> doc = etree.parse(html)
>>> doc.xpath('//elecs/elec/*/text()')
[' blah blah blah ', ' blah blah blah ', ' blah blah blah ', ' blah blah blah ']
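
If lxml is not an option, roughly the same query can be sketched with the standard library. ElementTree supports only a limited subset of XPath and has no text() step, so the sketch below selects the child elements with a path and then reads .text from each one. The names source and texts are made up for this example.

```python
import xml.etree.ElementTree as ET
from io import StringIO

# Hypothetical in-memory copy of the document from the question.
source = StringIO("""<elecs id="elecs">
    <elec id="CLM-0001" num="0001">
        <elec-text> blah blah blah </elec-text>
        <elec-text> blah blah blah </elec-text>
    </elec>
    <elec id="CLM-0002" num="0002">
        <elec-text> blah blah blah </elec-text>
        <elec-text> blah blah blah </elec-text>
    </elec>
</elecs>""")

doc = ET.parse(source)   # an ElementTree wrapping the <elecs> root
root = doc.getroot()

# "./elec/*" selects every child of every <elec> directly under the root.
texts = [child.text for child in root.findall("./elec/*")]

print(texts)
```

Unlike the lxml version, this has no recover option, so it will raise a ParseError on malformed input rather than trying to repair it.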