Find an XSL tag and its children with Python

Time: 2017-04-05 14:52:23

Tags: python xml xslt xpath elementtree

I'm having trouble finding a specific tag in an XSL file and working with its children.

For example:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:soap="http://www.w3.org/2003/05/soap-envelope" xmlns:soapenc="http://schemas.xmlsoap.org/soap/encoding/" xmlns:crossFunction="http://bla.bla.bla/" xmlns:simple-date-format="xalan://java.text.SimpleDateFormat" xmlns:srv="bla.bla.bla1" xmlns:xdt="http://www.w3.org/2005/02/xpath-datatypes" xmlns:date="http://exslt.org/dates-and-times" xmlns:customCoreFunction="http://bla.bla.bla2" xmlns:xalan="http://xml.apache.org/xalan" xmlns:productCoreFunction="http://bla.bla.bla" xmlns:srvesb0="http://esb.original.com.br/HistoricoComentario" xmlns:exsl="http://exslt.org/common" version="1.0" exclude-result-prefixes="xbla.bla.bla">
  <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="no"/>
  <xsl:variable name="uriTokenSeparator" select="';'"/>
  <xsl:variable name="uriKeyValueSeparator" select="'='"/>
  <xsl:template match="/">
    <xsl:variable name="messageContext" select="."/>
    <soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
      <soapenv:Header></soapenv:Header>
      <soapenv:Body>
        <xsl:element name="srvesb0:getHistory">
          <xsl:if test="((/soap:Envelope/soap:Body/*/note[@name='note']/catchId) and ((/soap:Envelope/soap:Body/*/note[@name='note']/catchId!='') or (/soap:Envelope/soap:Body/*/note[@name='note']/catchId/@*)))">
            <xsl:element name="srvesb0:idCapture">
              <xsl:value-of select="/soap:Envelope/soap:Body/*/note[@name='note']/catchId"/>
            </xsl:element>
          </xsl:if>
        </xsl:element>
      </soapenv:Body>
    </soapenv:Envelope>
  </xsl:template>
</xsl:stylesheet>
```

I only need to get the code inside the 'Body' tag:

```xml
<soapenv:Body>
  <xsl:element name="srvesb0:getHistory">
    <xsl:if test="((/soap:Envelope/soap:Body/*/note[@name='note']/catchId) and ((/soap:Envelope/soap:Body/*/note[@name='note']/catchId!='') or (/soap:Envelope/soap:Body/*/note[@name='note']/catchId/@*)))">
      <xsl:element name="srvesb0:idCapture">
        <xsl:value-of select="/soap:Envelope/soap:Body/*/note[@name='note']/catchId"/>
      </xsl:element>
    </xsl:if>
  </xsl:element>
</soapenv:Body>
```

Then I want to iterate over it element by element and get each element's attributes.

But none of the search methods I've tried work: .xpath, .iter, .find.

If I iterate over .getroot(), I do get results:

```python
import lxml.etree as XT

xslt = XT.parse('transformation.xsl')
rootxslt = xslt.getroot()

for child in rootxslt:
    child.tag = child.tag.split('}', 1)[1]  # strip this element's namespace
    print(child.tag, child.attrib, child.text)
    for child2 in child:
        child2.tag = child2.tag.split('}', 1)[1]  # strip this element's namespace
        print(child2.tag, child2.attrib, child2.text)
        for child3 in child2:
            child3.tag = child3.tag.split('}', 1)[1]  # strip this element's namespace
            print(child3.tag, child3.attrib, child3.text)
            for child4 in child3:
                child4.tag = child4.tag.split('}', 1)[1]  # strip this element's namespace
                print(child4.tag, child4.attrib, child4.text)
                for child5 in child4:
                    child5.tag = child5.tag.split('}', 1)[1]  # strip this element's namespace
                    print(child5.tag, child5.attrib, child5.text)
```
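The fixed-depth nested loops above can be collapsed into one recursive walk that reaches every level. The sketch below is an illustration, not the asker's code: it swaps in the standard library's `xml.etree.ElementTree` so it runs without lxml, parses a trimmed copy of the stylesheet from a string, and uses hypothetical helper names (`local_name`, `walk`). It also reads the local name without reassigning `.tag`, so the tree keeps its namespaces for later namespace-aware searches.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the stylesheet from the question (assumption: only the
# namespaces needed for parsing are declared).
XSL = """<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:template match="/">
    <xsl:variable name="messageContext" select="."/>
    <soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/">
      <soapenv:Header/>
      <soapenv:Body>
        <xsl:element name="srvesb0:getHistory">
          <xsl:if test="/soap:Envelope/soap:Body/*/note[@name='note']/catchId">
            <xsl:element name="srvesb0:idCapture">
              <xsl:value-of select="/soap:Envelope/soap:Body/*/note[@name='note']/catchId"/>
            </xsl:element>
          </xsl:if>
        </xsl:element>
      </soapenv:Body>
    </soapenv:Envelope>
  </xsl:template>
</xsl:stylesheet>"""


def local_name(tag):
    """Return the tag without its '{namespace}' prefix; the element is not modified."""
    return tag.rsplit('}', 1)[-1]


def walk(elem, depth=0, out=None):
    """Recursively collect (depth, local tag, attributes) for elem and every descendant."""
    if out is None:
        out = []
    out.append((depth, local_name(elem.tag), dict(elem.attrib)))
    for child in elem:
        walk(child, depth + 1, out)
    return out


root = ET.fromstring(XSL)
for depth, tag, attrib in walk(root):
    print('  ' * depth + tag, attrib)
```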

But if I try to iterate over a specific tag, nothing is printed:

```python
import lxml.etree as XT

xslt = XT.parse('transformation.xsl')
rootxslt = xslt.getroot()

for child in rootxslt.findall("Body"):
    child.tag = child.tag.split('}', 1)[1]  # strip this element's namespace
    print(child.tag, child.attrib, child.text)
    for child2 in child:
        child2.tag = child2.tag.split('}', 1)[1]  # strip this element's namespace
        print(child2.tag, child2.attrib, child2.text)
        for child3 in child2:
            child3.tag = child3.tag.split('}', 1)[1]  # strip this element's namespace
            print(child3.tag, child3.attrib, child3.text)
            for child4 in child3:
                child4.tag = child4.tag.split('}', 1)[1]  # strip this element's namespace
                print(child4.tag, child4.attrib, child4.text)
                for child5 in child4:
                    child5.tag = child5.tag.split('}', 1)[1]  # strip this element's namespace
                    print(child5.tag, child5.attrib, child5.text)
```
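`findall("Body")` comes back empty for two separate reasons: a bare tag name only matches the direct children of the element you call it on (here just `xsl:template`), and it only matches tags with no namespace, whereas the real tag is `{http://schemas.xmlsoap.org/soap/envelope/}Body`. The sketch below demonstrates this with the standard library's `xml.etree.ElementTree` on a trimmed copy of the stylesheet (an assumption for self-containment); lxml's `findall()` accepts the same path syntax and `namespaces` mapping.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the stylesheet from the question.
XSL = """<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:template match="/">
    <xsl:variable name="messageContext" select="."/>
    <soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/">
      <soapenv:Header/>
      <soapenv:Body>
        <xsl:element name="srvesb0:getHistory">
          <xsl:if test="/soap:Envelope/soap:Body/*/note[@name='note']/catchId">
            <xsl:element name="srvesb0:idCapture">
              <xsl:value-of select="/soap:Envelope/soap:Body/*/note[@name='note']/catchId"/>
            </xsl:element>
          </xsl:if>
        </xsl:element>
      </soapenv:Body>
    </soapenv:Envelope>
  </xsl:template>
</xsl:stylesheet>"""

SOAPENV = 'http://schemas.xmlsoap.org/soap/envelope/'
root = ET.fromstring(XSL)

# Bare name: searches only the root's direct children, un-namespaced tags only.
direct = root.findall('Body')

# './/' searches all descendants; the tag carries its namespace URI in braces.
by_uri = root.findall('.//{%s}Body' % SOAPENV)

# Same query with a prefix -> URI mapping; the prefix only matters for this call.
by_prefix = root.findall('.//soapenv:Body', {'soapenv': SOAPENV})

print(len(direct), len(by_uri), len(by_prefix))  # 0 1 1
```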

Does anyone know how to get the subtree under the 'Body' tag?

Thanks

1 answer:

Answer 0 (score: 0)

You can do this with the xpath method.

```python
>>> import lxml.etree as XT
>>> xslt = XT.parse('c:/scratch/sample.xml')
>>> theBody = xslt.xpath('.//soapenv:Body', namespaces={'soapenv': 'http://schemas.xmlsoap.org/soap/envelope/'})
>>> theBody
[<Element {http://schemas.xmlsoap.org/soap/envelope/}Body at 0x5e60c88>]
>>> theBody[0]
<Element {http://schemas.xmlsoap.org/soap/envelope/}Body at 0x5e60c88>
>>> list(theBody[0].iterdescendants())
[<Element {http://www.w3.org/1999/XSL/Transform}element at 0x5e60b08>, <Element {http://www.w3.org/1999/XSL/Transform}if at 0x5e7ea08>, <Element {http://www.w3.org/1999/XSL/Transform}element at 0x5e7ea48>, <Element {http://www.w3.org/1999/XSL/Transform}value-of at 0x5e7ea88>]
```

Once you have found the element that contains the code you want, as shown here, you can iterate over its descendants.
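To get from "here are the descendants" to what the question asks for (each element's name and attributes), a minimal sketch follows. It swaps in the standard library's `xml.etree.ElementTree` so it runs without lxml and parses a trimmed copy of the stylesheet from a string; `body_descendants` is a hypothetical helper name. Because `iter()` yields the starting element first, the helper skips `Body` itself, which matches what lxml's `iterdescendants()` returns.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the stylesheet from the question.
XSL = """<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:template match="/">
    <xsl:variable name="messageContext" select="."/>
    <soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/">
      <soapenv:Header/>
      <soapenv:Body>
        <xsl:element name="srvesb0:getHistory">
          <xsl:if test="/soap:Envelope/soap:Body/*/note[@name='note']/catchId">
            <xsl:element name="srvesb0:idCapture">
              <xsl:value-of select="/soap:Envelope/soap:Body/*/note[@name='note']/catchId"/>
            </xsl:element>
          </xsl:if>
        </xsl:element>
      </soapenv:Body>
    </soapenv:Envelope>
  </xsl:template>
</xsl:stylesheet>"""

NS = {'soapenv': 'http://schemas.xmlsoap.org/soap/envelope/'}


def body_descendants(root):
    """Return (local tag, attributes) for every element below soapenv:Body."""
    body = root.find('.//soapenv:Body', NS)
    if body is None:
        return []
    result = []
    for elem in body.iter():
        if elem is body:  # iter() yields the starting element first; skip it
            continue
        result.append((elem.tag.rsplit('}', 1)[-1], dict(elem.attrib)))
    return result


for tag, attrib in body_descendants(ET.fromstring(XSL)):
    print(tag, attrib)
```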

Edit: another approach.

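Assumptions: (1) the namespace used in the code below is the same across all your XSL documents; (2) the xsl:variable named messageContext always comes immediately before the content you want.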

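Then start by finding the messageContext element that precedes that content. Next, take the element right after it. Finally, iterate over the descendants of that element's Body, or do whatever else you need with it.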

```python
>>> import lxml.etree as XT
>>> xslt = XT.parse('sample.xml')
>>> sibling = xslt.xpath('.//xsl:variable[@name="messageContext"]', namespaces={'xsl': "http://www.w3.org/1999/XSL/Transform"})
>>> envelope = sibling[0].getnext()
>>> list(envelope.iter())
[<Element {http://schemas.xmlsoap.org/soap/envelope/}Envelope at 0x11912c8>, <Element {http://schemas.xmlsoap.org/soap/envelope/}Header at 0x1191448>, <Element {http://schemas.xmlsoap.org/soap/envelope/}Body at 0x1191248>, <Element {http://www.w3.org/1999/XSL/Transform}element at 0x11910c8>, <Element {http://www.w3.org/1999/XSL/Transform}if at 0x1191488>, <Element {http://www.w3.org/1999/XSL/Transform}element at 0x11914c8>, <Element {http://www.w3.org/1999/XSL/Transform}value-of at 0x1191508>]
```
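`getnext()` is an lxml extension; the standard library's `xml.etree.ElementTree` has no sibling navigation. If you are on the standard library instead, one way to emulate the same "element right after messageContext" lookup is to scan each parent's child list, as in this sketch. It relies on the same two assumptions stated above; the trimmed stylesheet string and the helper name `element_after_message_context` are illustrative only.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the stylesheet from the question.
XSL = """<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:template match="/">
    <xsl:variable name="messageContext" select="."/>
    <soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/">
      <soapenv:Header/>
      <soapenv:Body>
        <xsl:element name="srvesb0:getHistory">
          <xsl:if test="/soap:Envelope/soap:Body/*/note[@name='note']/catchId">
            <xsl:element name="srvesb0:idCapture">
              <xsl:value-of select="/soap:Envelope/soap:Body/*/note[@name='note']/catchId"/>
            </xsl:element>
          </xsl:if>
        </xsl:element>
      </soapenv:Body>
    </soapenv:Envelope>
  </xsl:template>
</xsl:stylesheet>"""

XSL_NS = 'http://www.w3.org/1999/XSL/Transform'


def element_after_message_context(root):
    """Return the sibling that immediately follows <xsl:variable name="messageContext">."""
    for parent in root.iter():
        children = list(parent)
        # The last child has no following sibling, so it is never a candidate.
        for i, child in enumerate(children[:-1]):
            if child.tag == '{%s}variable' % XSL_NS and child.get('name') == 'messageContext':
                return children[i + 1]
    return None


envelope = element_after_message_context(ET.fromstring(XSL))
print([elem.tag.rsplit('}', 1)[-1] for elem in envelope.iter()])
```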

Edit 2: another one, using BeautifulSoup. If you pass 'xml' as the second argument to BeautifulSoup, you can easily parse namespaced elements, as I just learned from https://stackoverflow.com/a/35564127/131187.

```python
>>> import bs4
>>> soup = bs4.BeautifulSoup(open('sample.xml').read(), 'xml')
>>> body = soup.find_all('Body')
>>> body
[<Body>
<xsl:element name="srvesb0:getHistory">
<xsl:if test="((/soap:Envelope/soap:Body/*/note[@name='note']/catchId) and ((/soap:Envelope/soap:Body/*/note[@name='note']/catchId!='') or (/soap:Envelope/soap:Body/*/note[@name='note']/catchId/@*)))">
<xsl:element name="srvesb0:idCapture">
<xsl:value-of select="/soap:Envelope/soap:Body/*/note[@name='note']/catchId"/>
</xsl:element>
</xsl:if>
</xsl:element>
</Body>]
>>> body[0].find('value-of')
<xsl:value-of select="/soap:Envelope/soap:Body/*/note[@name='note']/catchId"/>
```
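BeautifulSoup is a third-party package. If you would rather stay in the standard library, `xml.etree.ElementTree` (Python 3.8 or later) offers a similar "match by local name, ignore the namespace" lookup through the `{*}` wildcard in its path syntax. The sketch below is a swapped-in alternative rather than part of the BeautifulSoup approach, and it parses a trimmed copy of the stylesheet from a string for self-containment.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the stylesheet from the question.
XSL = """<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:template match="/">
    <xsl:variable name="messageContext" select="."/>
    <soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/">
      <soapenv:Header/>
      <soapenv:Body>
        <xsl:element name="srvesb0:getHistory">
          <xsl:if test="/soap:Envelope/soap:Body/*/note[@name='note']/catchId">
            <xsl:element name="srvesb0:idCapture">
              <xsl:value-of select="/soap:Envelope/soap:Body/*/note[@name='note']/catchId"/>
            </xsl:element>
          </xsl:if>
        </xsl:element>
      </soapenv:Body>
    </soapenv:Envelope>
  </xsl:template>
</xsl:stylesheet>"""

root = ET.fromstring(XSL)

# '{*}Body' matches a Body tag in any namespace (or none); requires Python 3.8+.
body = root.find('.//{*}Body')
value_of = body.find('.//{*}value-of')
print(value_of.get('select'))
```

Unlike BeautifulSoup, the matched elements keep their full `{namespace}` tags, so you can still tell `soapenv:Body` apart from a `Body` in another namespace if you need to.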