XSLT: Parsing a text document with multiple "xml" declarations

Time: 2017-03-17 01:55:47

Tags: xml xslt elementtree

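(Although this starts from a document containing multiple <?xml ..> declarations, please do not answer simply by stating that it is "not well-formed XML". Read on!)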

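I am still working on the same project outlined in my previous question, "XSLT: choose template, variable length dt_assoc inside elem, building transform for DNS records format", and thanks to good advice from @Tim C I am moving on to the next stage. This involves parsing a text file made up of a series of XML "documents"; that is, the file is structured like this: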

<?xml version='1.0' encoding='UTF-8'?>
<ns2:domain xmlns:ns3="http://www.w3.org/2005/Atom" xmlns:ns2="http://docs.rackspacecloud.com/dns/api/v1.0" xmlns="http://docs.rackspacecloud.com/dns/api/management/v1.0" id="1204245" accountId="414660" name="addressing.com" ttl="300" emailAddress="ipadmin@stabletransit.com" updated="2012-10-10T21:33:36Z" created="2009-07-25T15:05:39Z">
    <ns2:nameservers>
        <ns2:nameserver name="dns1.stabletransit.com" />
        <ns2:nameserver name="dns2.stabletransit.com" />
    </ns2:nameservers>
    <ns2:recordsList totalEntries="5">
        <ns2:record id="A-2542579" type="A" name="addressing.com" data="198.101.155.141" ttl="300" updated="2012-10-10T21:33:35Z" created="2010-02-17T05:02:16Z" />
        <ns2:record id="NS-3093587" type="NS" name="addressing.com" data="dns1.stabletransit.com" ttl="300" updated="2012-10-10T21:33:35Z" created="2010-02-17T05:03:16Z" />
        <ns2:record id="NS-3093589" type="NS" name="addressing.com" data="dns2.stabletransit.com" ttl="300" updated="2012-10-10T21:33:36Z" created="2010-02-17T05:03:16Z" />
        <ns2:record id="CNAME-6051671" type="CNAME" name="vh1.addressing.com" data="vh1.eiotx.net" ttl="300" updated="2012-10-10T21:33:36Z" created="2010-02-17T05:05:09Z" />
        <ns2:record id="CNAME-6051873" type="CNAME" name="www.addressing.com" data="virtual.eiotx.net" ttl="300" updated="2012-10-10T21:33:36Z" created="2010-02-17T05:05:09Z" />
    </ns2:recordsList>
</ns2:domain>
<?xml version='1.0' encoding='UTF-8'?>
<ns2:domain xmlns:ns3="http://www.w3.org/2005/Atom" xmlns:ns2="http://docs.rackspacecloud.com/dns/api/v1.0" xmlns="http://docs.rackspacecloud.com/dns/api/management/v1.0" id="1204245" accountId="414660" name="addressing.com" ttl="300" emailAddress="ipadmin@stabletransit.com" updated="2012-10-10T21:33:36Z" created="2009-07-25T15:05:39Z">
    <ns2:nameservers>
        <ns2:nameserver name="dns1.stabletransit.com" />
        <ns2:nameserver name="dns2.stabletransit.com" />
    </ns2:nameservers>
    <ns2:recordsList totalEntries="5">
        <ns2:record id="A-2542579" type="A" name="addressing.com" data="198.101.155.141" ttl="300" updated="2012-10-10T21:33:35Z" created="2010-02-17T05:02:16Z" />
        <ns2:record id="NS-3093587" type="NS" name="addressing.com" data="dns1.stabletransit.com" ttl="300" updated="2012-10-10T21:33:35Z" created="2010-02-17T05:03:16Z" />
        <ns2:record id="NS-3093589" type="NS" name="addressing.com" data="dns2.stabletransit.com" ttl="300" updated="2012-10-10T21:33:36Z" created="2010-02-17T05:03:16Z" />
        <ns2:record id="CNAME-6051671" type="CNAME" name="vh1.addressing.com" data="vh1.eiotx.net" ttl="300" updated="2012-10-10T21:33:36Z" created="2010-02-17T05:05:09Z" />
        <ns2:record id="CNAME-6051873" type="CNAME" name="www.addressing.com" data="virtual.eiotx.net" ttl="300" updated="2012-10-10T21:33:36Z" created="2010-02-17T05:05:09Z" />
    </ns2:recordsList>
</ns2:domain>

...and so on...
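One way to handle a file laid out like this is to split it at each XML declaration and parse every chunk as its own document. The following is only a minimal sketch under that assumption: the helper name `split_documents` and the second domain name `example.org` are made up for illustration and do not come from the real data.

```python
import re
import xml.etree.ElementTree as ET

# Matches an XML declaration such as <?xml version='1.0' encoding='UTF-8'?>
# (the \s keeps it from matching other instructions like <?xml-stylesheet ...?>).
XML_DECL = re.compile(r"<\?xml\s[^>]*\?>")


def split_documents(text):
    """Return one parsed root element per concatenated XML document."""
    # Splitting on the declaration leaves each document body as its own chunk.
    chunks = [chunk.strip() for chunk in XML_DECL.split(text)]
    return [ET.fromstring(chunk) for chunk in chunks if chunk]


# Shortened stand-in for the real file; "example.org" is a hypothetical name.
sample = (
    "<?xml version='1.0' encoding='UTF-8'?>\n"
    '<ns2:domain xmlns:ns2="http://docs.rackspacecloud.com/dns/api/v1.0" name="addressing.com"/>\n'
    "<?xml version='1.0' encoding='UTF-8'?>\n"
    '<ns2:domain xmlns:ns2="http://docs.rackspacecloud.com/dns/api/v1.0" name="example.org"/>\n'
)

for root in split_documents(sample):
    print(root.get("name"))
```

With this approach each `ns2:domain` is already a separate tree, so no extra wrapper element is needed.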

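I am trying to work out the best way to manage these individual blocks, which now have to be passed one at a time to my XSLT transform and then sent via an API POST to a remote server for processing (into new DNS zone records)...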

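I got a bit stuck trying to use ElementTree. My thinking was that if I added a new "root" around the whole thing, I could build a tree from it and process each ns2:domain element in turn.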

So, after removing all but one of the <?xml ..> declarations, I tried modifying the source like this:

<?xml version='1.0' encoding='UTF-8'?>
<rackspace>
    <ns2:domain xmlns:ns3="http://www.w3.org/2005/Atom" ... >
    ...
    </ns2:domain>
    <ns2:domain xmlns:ns3="http://www.w3.org/2005/Atom" ... >
    ...
    </ns2:domain>
    <ns2:domain xmlns:ns3="http://www.w3.org/2005/Atom" ... >
    ...
    </ns2:domain>
</rackspace>
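The wrapping step itself could be automated rather than done by hand. This is a rough sketch only: `wrap_in_root` is a hypothetical helper, and it assumes the whole file fits comfortably in memory.

```python
import re
import xml.etree.ElementTree as ET

XML_DECL = re.compile(r"<\?xml\s[^>]*\?>")


def wrap_in_root(text, root_name="rackspace"):
    """Drop every XML declaration and enclose what is left in a single root."""
    body = XML_DECL.sub("", text).strip()
    return "<?xml version='1.0' encoding='UTF-8'?>\n<{0}>\n{1}\n</{0}>\n".format(
        root_name, body
    )


# Two tiny concatenated documents standing in for the real file.
concatenated = (
    "<?xml version='1.0' encoding='UTF-8'?>\n"
    '<ns2:domain xmlns:ns2="http://docs.rackspacecloud.com/dns/api/v1.0" name="addressing.com"/>\n'
    "<?xml version='1.0' encoding='UTF-8'?>\n"
    '<ns2:domain xmlns:ns2="http://docs.rackspacecloud.com/dns/api/v1.0" name="addressing.com"/>\n'
)

wrapped = wrap_in_root(concatenated)
# Parse from bytes so the encoding named in the declaration is honoured.
root = ET.fromstring(wrapped.encode("utf-8"))
print(root.tag, len(root))
```

The new root element carries no namespace, while its children keep the namespace declarations they already had.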

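However, I am completely new to ElementTree and cannot seem to do any processing on the "ns2:domain" subtrees, each of which I want to extract whole into a variable to pass to the XSLT transform.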

#!/usr/bin/python2.7

import fileinput
import string
import re
import hashlib

from xml.etree import ElementTree as ET
from xml.etree.ElementTree import Element, SubElement, tostring

ns= {'ns2':'http://docs.rackspacecloud.com/dns/api/v1.0'}

my_outfile='/Users/peterf/Google Drive/2015 Projects-Strategy/Domain Admin/RackspaceDomains.out.txt'
my_infile='//Users/peterf/Google Drive/2015 Projects-Strategy/Domain Admin/XSL_Rackspace_to_OpenSRS/saxon.test.xml'

'''FILE=open(my_infile,"r")
OUTFILE=open(my_outfile,"w")'''

print ("**** Start Reading from Input File ****")

with open(my_infile, 'rt') as f:

     tree = ET.parse(f)

root=tree.getroot()
# ET.dump(root)

domain=SubElement(root,"ns2:domain",ns)
#ET.dump(domain)
recordsList=SubElement(root,"ns2:recordsList",ns)

#parent_map = dict((c, p) for p in tree.getiterator() for c in p)
#print parent_map

for node in recordsList:
     for node in node:
          print node.tag, node.text
          for node in node:
               print node.tag, node.text

I have no doubt there are simple, straightforward steps to achieve this, but I just don't know the syntax!
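For reference, ElementTree looks up existing elements with `find()` / `findall()` plus a prefix-to-URI mapping, whereas `SubElement()` creates a new, empty child. The sketch below is written for Python 3 and uses a shortened copy of the wrapped sample; it shows one possible way to walk the `ns2:domain` subtrees and serialize each one, and is not necessarily the best fit for the full data.

```python
import xml.etree.ElementTree as ET

NS = {"ns2": "http://docs.rackspacecloud.com/dns/api/v1.0"}

# Shortened version of the wrapped file shown earlier in the question.
wrapped = """<rackspace>
  <ns2:domain xmlns:ns2="http://docs.rackspacecloud.com/dns/api/v1.0" name="addressing.com">
    <ns2:recordsList totalEntries="1">
      <ns2:record type="A" name="addressing.com" data="198.101.155.141" ttl="300"/>
    </ns2:recordsList>
  </ns2:domain>
</rackspace>"""

root = ET.fromstring(wrapped)

# findall() returns the existing children that match the namespaced path.
for domain in root.findall("ns2:domain", NS):
    print("Processing", domain.get("name"))
    for record in domain.findall("ns2:recordsList/ns2:record", NS):
        print("  ", record.get("type"), record.get("name"), record.get("data"))
    # tostring() serializes the whole subtree so it can be handed to another tool.
    domain_xml = ET.tostring(domain, encoding="unicode")
```

ElementTree may write the subtree with a generated prefix such as `ns0` instead of `ns2`. The namespace URI is unchanged, so a namespace-aware stylesheet should still match the elements.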

So, the pseudocode might look something like this:

open my_rackspace_file.xml as rackfile
print "Start"
for each ns2:domain in rackfile:
   print "Processing ", ns2:domain/@name
   my_domain=getsubtree(ns2:domain)
   my_new_xml=`java saxon9he.jar net.sf.saxon.Transform -it < $my_domain` #Don't really know how this will work at the moment
   API_POST (my_new_xml)

print "Done"
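The pseudocode above could be fleshed out roughly as follows in Python 3. Several things here are assumptions rather than facts from the project: the stylesheet name `transform.xsl`, the location of `saxon9he.jar`, and the `post` callable standing in for the real API call are all placeholders. The sketch also passes each domain to Saxon as a source file with `-s:` instead of using `-it`, and writes it to a temporary file first.

```python
import os
import subprocess
import tempfile
import xml.etree.ElementTree as ET

NS = {"ns2": "http://docs.rackspacecloud.com/dns/api/v1.0"}


def saxon_command(source_path, stylesheet="transform.xsl", jar="saxon9he.jar"):
    """Build a Saxon command line; the stylesheet and jar paths are placeholders."""
    return ["java", "-cp", jar, "net.sf.saxon.Transform",
            "-s:" + source_path, "-xsl:" + stylesheet]


def run_saxon(domain_xml):
    """Write one domain to a temporary file, transform it, return Saxon's stdout."""
    with tempfile.NamedTemporaryFile("wb", suffix=".xml", delete=False) as tmp:
        tmp.write(domain_xml)
    try:
        return subprocess.check_output(saxon_command(tmp.name))
    finally:
        os.remove(tmp.name)


def process_domains(root, transform=run_saxon, post=print):
    """Transform and post each ns2:domain child of root; return the names handled."""
    handled = []
    for domain in root.findall("ns2:domain", NS):
        name = domain.get("name")
        print("Processing", name)
        # tostring() returns bytes by default, which is what run_saxon() writes out.
        post(transform(ET.tostring(domain)))
        handled.append(name)
    return handled


# Typical use, once the wrapped file exists and Java plus Saxon are installed:
#   root = ET.parse("my_rackspace_file.xml").getroot()
#   process_domains(root, post=my_api_post)   # my_api_post is hypothetical
```

Keeping `transform` and `post` as parameters means the loop can be tried out with stand-in functions before Saxon or the remote API is wired in.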

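Thanks very much for your thoughts and suggestions on this! It's great to be diving in deep, knowing that it will all make sense in the end!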

PF

By the way, I am using Saxon XSLT 2.0, because I need the regular expression features...

0 answers:

No answers yet