Question

I am trying to parse some tags in an XML document, and the script exits with the error AttributeError: '_ElementStringResult' object has no attribute 'text'.

Here is the XML document:

<?xml version='1.0' encoding='ASCII'?>
<Root>
  <Data>
    <FormType>Log</FormType>
    <Submitted>2012-03-19 07:34:07</Submitted>
    <ID>1234</ID>
    <LAST>SJTK4</LAST>
    <Latitude>36.7027777778</Latitude>
    <Longitude>-108.046111111</Longitude>
    <Speed>0.0</Speed>
  </Data>
</Root>

Here is the code I am using:

from lxml import etree
from StringIO import StringIO
import MySQLdb
import glob
import os
import shutil
import logging
import sys

localPath = "C:\data"
xmlFiles = glob.glob1(localPath,"*.xml")
for file in xmlFiles:
    a = os.path.join(localPath,file)
    element = etree.parse(a)

    Data = element.xpath('//Root/Data/node()')
    parsedData = [{field.tag: field.text for field in Data} for action in Data]




print parsedData #AttributeError: '_ElementStringResult' object has no attribute 'text'

Answer 1

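'//Root/Data/node()' returns a list of all child nodes, including the text nodes, which come back as strings and therefore have no text attribute. If you print right after Data = ..., you will see something like ['\n ', <Element FormType at 0x10675fdc0>, '\n ', ...].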
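
A minimal sketch of that behaviour, assuming Python 3 with lxml installed and a trimmed-down copy of the question's document (the variable names are illustrative only). Under Python 3 the string results are instances of a str subclass, so the class named in the error message may differ from the Python 2 one in the question.

```python
from lxml import etree

# Trimmed-down copy of the question's document (assumed sample data).
xml = b"""<Root>
  <Data>
    <FormType>Log</FormType>
    <ID>1234</ID>
  </Data>
</Root>"""

root = etree.fromstring(xml)
nodes = root.xpath('//Root/Data/node()')

# The whitespace between tags comes back as string results, not elements.
kinds = ['text' if isinstance(n, str) else n.tag for n in nodes]
print(kinds)
```

Every second item is a whitespace string, and those are the items that have no .text attribute.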

I would filter first, for example:

Data = [f for f in element.xpath('//Root/Data/node()') if hasattr(f, 'text')]

Then I think the following line can be rewritten as:

parsedData = {field.tag: field.text for field in Data}

That gives a dictionary mapping element tags to their text, which I think is what you want.
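
Put together, the filter and the dictionary comprehension might look like the sketch below. It assumes Python 3 with lxml and uses an inline, shortened sample instead of the files on disk; the names root, data and parsed are illustrative.

```python
from lxml import etree

# Shortened sample of the question's document (assumed data).
xml = b"""<Root>
  <Data>
    <FormType>Log</FormType>
    <ID>1234</ID>
  </Data>
</Root>"""

root = etree.fromstring(xml)

# Keep only real elements; the string results have no .text attribute.
data = [f for f in root.xpath('//Root/Data/node()') if hasattr(f, 'text')]

# One flat dictionary of tag -> text for the remaining elements.
parsed = {field.tag: field.text for field in data}
print(parsed)
```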

Answer 2

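If you only want elements (not text nodes), don't query //Root/Data/node(); query /Root/Data/* instead. (Also, using a single leading / rather than // lets the engine do a cheaper search, since it does not have to look through the whole subtree for additional Root elements.)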
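
A short sketch of the /Root/Data/* query, again assuming Python 3 with lxml and an inline sample document; no filtering step is needed because * matches element nodes only.

```python
from lxml import etree

# Shortened sample of the question's document (assumed data).
xml = b"""<Root>
  <Data>
    <FormType>Log</FormType>
    <ID>1234</ID>
  </Data>
</Root>"""

root = etree.fromstring(xml)

# '*' selects child elements only, so no text nodes appear in the result.
elements = root.xpath('/Root/Data/*')
tags = [el.tag for el in elements]
parsed = {el.tag: el.text for el in elements}
print(tags)
print(parsed)
```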

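Also, are you sure you really want the inner loop to iterate over the entire list of Data children, rather than only over the children of the single Data element selected by the outer loop? I think your logic is broken, although it would only show up with a file that has more than one Data element under Root.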
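
One way to build one dictionary per Data element is sketched below. The two-record document is hypothetical (the question's file has a single Data element), and the code assumes Python 3 with lxml.

```python
from lxml import etree

# Hypothetical file with two Data records, to make the difference visible.
xml = b"""<Root>
  <Data><ID>1234</ID><Speed>0.0</Speed></Data>
  <Data><ID>5678</ID><Speed>3.5</Speed></Data>
</Root>"""

root = etree.fromstring(xml)

# Outer loop: each Data element. Inner loop: only that element's children.
parsed = [
    {field.tag: field.text for field in data.xpath('*')}
    for data in root.xpath('/Root/Data')
]
print(parsed)
```

The relative query data.xpath('*') is evaluated against each Data element in turn, so fields from different records are never merged into the same dictionary.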

Python lxml: items without a .text attribute returned when querying node()
