Retrieving data from XML tags in Python

Time: 2015-06-30 01:56:04

Tags: python xml

I'm trying to use the code below to retrieve the slide number between the 'a:t' tags when type="slidenum", but something isn't working. I should get 1.

Here is the XML:

<a:p><a:fld id="{55FBEE69-CA5C-45C8-BA74-481781281731}" type="slidenum">
<a:rPr lang="en-US" sz="1300" i="0"><a:solidFill><a:srgbClr val="000000"/>
</a:solidFill></a:rPr><a:pPr/><a:t>1</a:t></a:fld><a:endParaRPr lang="en-US" 
sz="1300" i="0"><a:solidFill><a:srgbClr val="000000"/></a:solidFill>
</a:endParaRPr></a:p></p:txBody></p:sp>

Here is my code:

    z = zipfile.ZipFile(pptx_filename)
    for name in z.namelist():
        m = re.match(r'ppt/notesSlides/notesSlide\d+\.xml', name)
        if m is not None:
            f = z.open(name)
            tree = ET.parse(f)
            f.close()
            root = tree.getroot()
            # Find the slide number.
            slide_num = None
            for fld in root.findall('/'.join(['.', '', p.txBody, a.p, a.fld])):
                if fld.get('type', '') == 'slidenum':
                    slide_num = int(fld.find(a.t).text)
                    print(slide_num)

2 Answers:

Answer 0: (score: 0)

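I would remove the namespace prefixes from the XML before parsing it. Then use the XPath fld[@type='slidenum']/t to find every t node that is a child of an fld node whose type attribute is 'slidenum'. Here is a sample showing how that might work: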

from lxml import etree

xml = """
<a:p><a:fld id="{55FBEE69-CA5C-45C8-BA74-481781281731}" type="slidenum">
<a:rPr lang="en-US" sz="1300" i="0"><a:solidFill><a:srgbClr val="000000"/>
</a:solidFill></a:rPr><a:pPr/><a:t>1</a:t></a:fld><a:endParaRPr lang="en-US" 
sz="1300" i="0"><a:solidFill><a:srgbClr val="000000"/></a:solidFill>
</a:endParaRPr></a:p>
"""

tree = etree.fromstring(xml.replace('a:',''))
slidenum = tree.find("fld[@type='slidenum']/t").text
print(slidenum)
# Output: 1
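
One caveat with the approach above: a blanket xml.replace('a:', '') also rewrites any "a:" that happens to appear in text or attribute values. A minimal sketch of an alternative is to parse first and then strip the namespace part from each element tag. It uses the standard-library ElementTree and assumes the fragment declares its namespace (the stdlib parser rejects unbound prefixes). The URI below is a placeholder and strip_namespaces is a hypothetical helper name, not part of any library:

```python
from xml.etree import ElementTree as ET

# Test XML. The namespace URI is a placeholder (assumption), not the real one.
xml = (
    '<a:p xmlns:a="http://example.com">'
    '<a:fld id="{55FBEE69-CA5C-45C8-BA74-481781281731}" type="slidenum">'
    '<a:pPr/><a:t>1</a:t></a:fld></a:p>'
)


def strip_namespaces(root):
    """Drop the '{uri}' part from every element tag, in place (hypothetical helper)."""
    for el in root.iter():
        if isinstance(el.tag, str) and el.tag.startswith('{'):
            el.tag = el.tag.split('}', 1)[1]
    return root


tree = strip_namespaces(ET.fromstring(xml))
slidenum = tree.find("fld[@type='slidenum']/t").text
print(slidenum)  # prints: 1
```

Only element tags are touched here, so text and attribute values stay exactly as they were in the source.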

Answer 1: (score: 0)

Modified from Moxymoo's answer to use namespaces instead of removing them:


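Here is the same example using namespaces. The inline test XML declares a placeholder namespace URI (http://example.com); for a real file, use the namespace URI that the OP's document declares: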

# xml.etree.cElementTree was deprecated in Python 3.3 and removed in 3.9;
# ElementTree picks up the C accelerator automatically when it is available.
from xml.etree import ElementTree as etree

# Our test XML
xml = '''
<a:p xmlns:a="http://example.com"><a:fld id="{55FBEE69-CA5C-45C8-BA74-481781281731}" type="slidenum">
<a:rPr lang="en-US" sz="1300" i="0"><a:solidFill><a:srgbClr val="000000"/>
</a:solidFill></a:rPr><a:pPr/><a:t>1</a:t></a:fld><a:endParaRPr lang="en-US" 
sz="1300" i="0"><a:solidFill><a:srgbClr val="000000"/></a:solidFill>
</a:endParaRPr></a:p>
'''

# Manually specify the namespace. The prefix letter ("a") is arbitrary.
namespaces = {"a":"http://example.com"}

# Parse the XML string
tree = etree.fromstring(xml)

"""
Breaking down the search expression below
  a:fld - Find the fld element prefixed with namespace identifier a:
  [@type='slidenum'] - Match on an attribute type with a value of 'slidenum'
  /a:t - Find the child element t prefixed with namespace identifier a:
"""
slidenums = tree.findall("a:fld[@type='slidenum']/a:t", namespaces)
for slidenum in slidenums:
    print(slidenum.text)
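
Applied to the question's setting, the same namespace-aware search might look like the sketch below. It assumes the standard Office Open XML namespace URIs for DrawingML ("a") and PresentationML ("p"). The function name slide_numbers is hypothetical, and the in-memory archive is a simplified stand-in for a real .pptx, built only so that the example runs on its own:

```python
import io
import re
import zipfile
from xml.etree import ElementTree as ET

# Standard Office Open XML namespace URIs; the prefixes are our own choice.
NAMESPACES = {
    "a": "http://schemas.openxmlformats.org/drawingml/2006/main",
    "p": "http://schemas.openxmlformats.org/presentationml/2006/main",
}


def slide_numbers(pptx_file):
    """Map each notes-slide part name to its slide number (None if absent)."""
    result = {}
    with zipfile.ZipFile(pptx_file) as z:
        for name in z.namelist():
            if re.match(r"ppt/notesSlides/notesSlide\d+\.xml$", name) is None:
                continue
            with z.open(name) as f:
                root = ET.parse(f).getroot()
            # Same predicate as above, but anchored under p:txBody.
            t = root.find(".//p:txBody/a:p/a:fld[@type='slidenum']/a:t", NAMESPACES)
            result[name] = int(t.text) if t is not None and t.text else None
    return result


# Simplified, hypothetical notes slide (not a complete .pptx part).
notes_xml = (
    '<p:notes xmlns:a="' + NAMESPACES["a"] + '" xmlns:p="' + NAMESPACES["p"] + '">'
    '<p:sp><p:txBody><a:p>'
    '<a:fld id="{55FBEE69-CA5C-45C8-BA74-481781281731}" type="slidenum">'
    '<a:t>1</a:t></a:fld>'
    '</a:p></p:txBody></p:sp></p:notes>'
)
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as z:
    z.writestr("ppt/notesSlides/notesSlide1.xml", notes_xml)
buf.seek(0)

numbers = slide_numbers(buf)
print(numbers)  # {'ppt/notesSlides/notesSlide1.xml': 1}
```

With a real presentation, pass the path to the .pptx file instead of the in-memory buffer.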