Update the root in XML after processing its children

Date: 2016-08-26 15:26:18

Tags: python xml python-3.x

Related to this SO question, I managed to solve the problem with the following snippet:

import xml.etree.ElementTree as ET


def read_xml():
    with open('test.xml') as xml_file:
        return xml_file.read()


xml_file = read_xml()

root = ET.fromstring(xml_file)
pmt_infs = root.find('.//CstmrCdtTrfInitn').findall('PmtInf')
print(pmt_infs)

nodes = []
for node in pmt_infs:
    children = list(node)
    nodes.append(children)

xml_stuff = [None] * len(nodes)
to_remove = []

for first, *column in zip(*nodes):
    for index, item in enumerate(column, 1):
        if 'CdtTrfTxInf' in item.tag:
            xml_stuff[index] = item
            continue

        if first.tag == item.tag and first.text == item.text and index not in to_remove:
            to_remove.append(index)

for index in to_remove:
    pmt_infs[0].append(xml_stuff[index])
for index in to_remove[::-1]:
    pmt_infs.pop(index)

print(pmt_infs)

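Now, the snippet above does exactly what I asked for in my previous question:

  I'd like to move the entire <CdtTrfTxInf></CdtTrfTxInf> into the first <PmtInf></PmtInf> and remove the entire <PmtInf></PmtInf> that I took the <CdtTrfTxInf></CdtTrfTxInf> from.

The above is done, but I've run into a small problem. Initially, I obtained root from the file. Now I want to update it with the new data. The problem is that I don't know how to put the first part of the XML into a new file and then append pmt_infs to it: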

<?xml version="1.0" encoding="utf-8" ?>
<Document>
    <CstmrCdtTrfInitn>
        <GrpHdr>
            <other_tags>a</other_tags> <!--here there might be other nested tags inside <other_tags></other_tags>-->
            <other_tags>b</other_tags> <!--here there might be other nested tags inside <other_tags></other_tags>-->
            <other_tags>c</other_tags> <!--here there might be other nested tags inside <other_tags></other_tags>-->
        </GrpHdr>
        <!-- here should be the <PmtInf> that's been processed above -->
    </CstmrCdtTrfInitn>
</Document> 

Can somebody give me some hints?

LE (later edit): As requested, I'm adding the desired result here:

<?xml version="1.0" encoding="utf-8" ?>
<Document>
    <CstmrCdtTrfInitn>
        <GrpHdr>
            <other_tags>a</other_tags> 
            <other_tags>b</other_tags>
            <other_tags>c</other_tags> 
        </GrpHdr>

        <PmtInf>
            <things>d</things> 
            <things>e</things> 

            <CdtTrfTxInf>
                <!-- other nested tags here -->
            </CdtTrfTxInf>
        </PmtInf>

        <PmtInf>
            <things>f</things> 
            <things>g</things> 

            <CdtTrfTxInf>
                <!-- other nested tags here -->
            </CdtTrfTxInf>
            <CdtTrfTxInf>
                <!-- other nested tags here -->
            </CdtTrfTxInf>
        </PmtInf>
    </CstmrCdtTrfInitn>
</Document>

The output looks like this because:

  • Looking at the <PmtInf></PmtInf> sections (there are three of them), we can see that:
    1. If we compare the <things> in the first <PmtInf> with the ones in the second <PmtInf>, they are not the same (d != f, e != g), so we move on to the next <PmtInf>. If we compare the <things> of the first <PmtInf> with those of the third <PmtInf>, they are not the same either, so we keep the first <PmtInf> as it is.
    2. We move on to the second <PmtInf> and compare its <things> with the ones in the third <PmtInf> (they are the same). That being said, we take the <CdtTrfTxInf> section from the third <PmtInf>, append it to the end of the second <PmtInf>, and remove the third <PmtInf> entirely.

Imagine this as a list of lists (which, in fact, is what they are):

[[a1, b1, c1], [a2, b2, c2], [a3, b3, c3]]

where:
a = the first <things> tag from a <PmtInf>
b = the second <things> tag from a <PmtInf>
c = the <CdtTrfTxInf> tag from a <PmtInf>

In my example:

a1 != a2, b1 != b2 => we can move on to the next sublist (if they were the same, the list would look like this: [[a1, b1, c1, c2], [a3, b3, c3]])

a1 != a3, b1 != b3 => we can go to the second sublist and compare it with all the sublists after it

a2 == a3, b2 == b3 => they are the same, so we now have: [[a1, b1, c1], [a2, b2, c2, c3]]

In practice, my result is only:

<PmtInf>
    <things>d</things>
    <things>e</things>

    <CdtTrfTxInf>
        <!-- other nested tags here -->
    </CdtTrfTxInf>
</PmtInf>

<PmtInf>
    <things>f</things>
    <things>g</things>

    <CdtTrfTxInf>
        <!-- other nested tags here -->
    </CdtTrfTxInf>
    <CdtTrfTxInf>
        <!-- other nested tags here -->
    </CdtTrfTxInf>
</PmtInf>

But I need it to be:

<?xml version="1.0" encoding="utf-8" ?>
<Document>
    <CstmrCdtTrfInitn>
        <GrpHdr>
            <other_tags>a</other_tags>
            <other_tags>b</other_tags>
            <other_tags>c</other_tags>
        </GrpHdr>

        <PmtInf>
            <things>d</things>
            <things>e</things>

            <CdtTrfTxInf>
                <!-- other nested tags here -->
            </CdtTrfTxInf>
        </PmtInf>

        <PmtInf>
            <things>f</things>
            <things>g</things>

            <CdtTrfTxInf>
                <!-- other nested tags here -->
            </CdtTrfTxInf>
            <CdtTrfTxInf>
                <!-- other nested tags here -->
            </CdtTrfTxInf>
        </PmtInf>
    </CstmrCdtTrfInitn>
</Document>
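
The list-of-lists comparison described in the question can be sketched on plain Python lists. This is only an illustration, assuming each sublist has the shape [a, b, c, ...] and that sublists with equal (a, b) should be merged into the first one seen; the function name merge_sublists and the sample values are made up for the example.

```python
def merge_sublists(rows):
    """Merge rows whose first two items (a, b) match, keeping first-seen order."""
    merged = {}  # (a, b) -> [a, b, c, c, ...]; dicts keep insertion order (Python 3.7+)
    for a, b, *cs in rows:
        # The first row with a given (a, b) creates the entry; later rows only add their c items.
        merged.setdefault((a, b), [a, b]).extend(cs)
    return list(merged.values())


rows = [["d", "e", "c1"], ["f", "g", "c2"], ["f", "g", "c3"]]
print(merge_sublists(rows))  # [['d', 'e', 'c1'], ['f', 'g', 'c2', 'c3']]
```

Using a dictionary keyed on (a, b) avoids comparing every sublist with every later one, and it also merges matches that involve the first sublist.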

1 answer:

Answer 0 (score: 1)

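Consider XSLT, the transformation language designed for manipulating XML documents. Specifically, your regrouping calls for the Muenchian Method, an XSLT 1.0 technique that indexes the XML document by a key and groups child data accordingly (in XSLT 2.0, the simpler <xsl:for-each-group> can be used). The key used here is the concatenation of the <things> nodes of each <PmtInf>.

Python's third-party module lxml can run XSLT 1.0 scripts using the libxslt processor. Of course, Python can also call external processors such as Saxon or Xalan; Saxon can also run XSLT 2.0 and the newer 3.0 scripts. With this solution, no for loops or if logic are needed. In addition, <xsl:key> is more efficient because it builds a hash table over the document content.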
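
To make the idea of a key index more concrete, here is a rough Python analogy using only the standard library. It is a sketch, not what libxslt does internally: the dictionary stands in for <xsl:key>, and the helper name things_key and the shortened SAMPLE document are assumptions made for the example.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Shortened stand-in for the real document (assumption for illustration).
SAMPLE = """<CstmrCdtTrfInitn>
  <PmtInf><things>d</things><things>e</things><CdtTrfTxInf/></PmtInf>
  <PmtInf><things>f</things><things>g</things><CdtTrfTxInf/></PmtInf>
  <PmtInf><things>f</things><things>g</things><CdtTrfTxInf/></PmtInf>
</CstmrCdtTrfInitn>"""


def things_key(pmt_inf):
    # Same idea as concat(things[1], things[2]): join the first two <things> texts.
    return "".join(t.text or "" for t in pmt_inf.findall("things")[:2])


parent = ET.fromstring(SAMPLE)
index = defaultdict(list)  # plays the role of the key's hash table
for pmt_inf in parent.findall("PmtInf"):
    index[things_key(pmt_inf)].append(pmt_inf)

# A node heads its group when it is the first node stored under its key.
# In XSLT 1.0 this test is usually written count(. | key(...)[1]) = 1.
group_heads = [p for p in parent.findall("PmtInf") if index[things_key(p)][0] is p]

print({k: len(v) for k, v in index.items()})  # {'de': 1, 'fg': 2}
print(len(group_heads))                       # 2
```

Each group head is emitted once, and all nodes stored under its key supply the <CdtTrfTxInf> children.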

Input XML

<?xml version="1.0" encoding="utf-8" ?>
<Document>
    <CstmrCdtTrfInitn>
        <GrpHdr>
            <other_tags>a</other_tags>
            <other_tags>b</other_tags>
            <other_tags>c</other_tags>
        </GrpHdr>

        <PmtInf>
            <things>d</things>
            <things>e</things>
            <CdtTrfTxInf>
                <!-- other nested tags here -->
            </CdtTrfTxInf>
        </PmtInf>

        <PmtInf>
            <things>f</things> 
            <things>g</things> 
            <CdtTrfTxInf>
                <!-- other nested tags here -->
            </CdtTrfTxInf>
        </PmtInf>

        <PmtInf>
            <things>f</things> 
            <things>g</things> 
            <CdtTrfTxInf>
                <!-- other nested tags here -->
            </CdtTrfTxInf>
        </PmtInf>
    </CstmrCdtTrfInitn>
</Document>    

XSLT script (save as a separate .xsl or .xslt file; adjust the key's @use and the later references to it to match the actual tag names)

<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output version="1.0" encoding="UTF-8" indent="yes" />
  <xsl:strip-space elements="*"/>

  <xsl:key name="pkey" match="PmtInf" use="concat(things[1], things[2])" />

  <xsl:template match="/Document">
    <xsl:copy>
      <xsl:apply-templates select="CstmrCdtTrfInitn"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="CstmrCdtTrfInitn"> 
   <xsl:copy>
    <xsl:copy-of select="GrpHdr"/>

    <xsl:for-each select="PmtInf[count(. | key('pkey', concat(things[1], things[2]))[1]) = 1]">
      <xsl:copy>
        <xsl:copy-of select="things"/>
        <xsl:for-each select="key('pkey', concat(things[1], things[2]))">      
           <xsl:copy-of select="CdtTrfTxInf"/>       
        </xsl:for-each>
      </xsl:copy>
    </xsl:for-each>

   </xsl:copy>
  </xsl:template>    
</xsl:transform>

Python script

import lxml.etree as ET

# LOAD XML AND XSL SOURCES
dom = ET.parse('Input.xml')
xslt = ET.parse('XSLTScript.xsl')

# TRANSFORM SOURCE DOCUMENT
transform = ET.XSLT(xslt)
newdom = transform(dom)

# OUTPUT TO FILE
with open('Output.xml', 'wb') as xmlfile:
    xmlfile.write(ET.tostring(newdom, pretty_print=True,
                              xml_declaration=True, encoding='UTF-8'))

Output XML

<?xml version='1.0' encoding='UTF-8'?>
<Document>
  <CstmrCdtTrfInitn>
    <GrpHdr>
      <other_tags>a</other_tags>
      <other_tags>b</other_tags>
      <other_tags>c</other_tags>
    </GrpHdr>
    <PmtInf>
      <things>d</things>
      <things>e</things>
      <CdtTrfTxInf>
        <!-- other nested tags here -->
      </CdtTrfTxInf>
    </PmtInf>
    <PmtInf>
      <things>f</things>
      <things>g</things>
      <CdtTrfTxInf>
        <!-- other nested tags here -->
      </CdtTrfTxInf>
      <CdtTrfTxInf>
        <!-- other nested tags here -->
      </CdtTrfTxInf>
    </PmtInf>
  </CstmrCdtTrfInitn>
</Document>
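
For readers who would rather stay with the standard library, the same regrouping can be approximated directly on the ElementTree root, which also addresses the original question of how to write the updated root back out. This is a minimal sketch under stated assumptions: the function name merge_pmt_infs and the shortened SAMPLE document are hypothetical, and groups are keyed on the texts of all <things> children.

```python
import xml.etree.ElementTree as ET

# Shortened stand-in for the real document (assumption for illustration).
SAMPLE = """<Document>
  <CstmrCdtTrfInitn>
    <GrpHdr><other_tags>a</other_tags></GrpHdr>
    <PmtInf><things>d</things><things>e</things><CdtTrfTxInf>1</CdtTrfTxInf></PmtInf>
    <PmtInf><things>f</things><things>g</things><CdtTrfTxInf>2</CdtTrfTxInf></PmtInf>
    <PmtInf><things>f</things><things>g</things><CdtTrfTxInf>3</CdtTrfTxInf></PmtInf>
  </CstmrCdtTrfInitn>
</Document>"""


def merge_pmt_infs(root):
    """Merge <PmtInf> elements with equal <things> values, editing root in place."""
    parent = root.find(".//CstmrCdtTrfInitn")
    first_seen = {}  # tuple of <things> texts -> first <PmtInf> with that key
    # findall returns a list, so removing children inside the loop is safe.
    for pmt_inf in parent.findall("PmtInf"):
        key = tuple(t.text for t in pmt_inf.findall("things"))
        keeper = first_seen.setdefault(key, pmt_inf)
        if keeper is not pmt_inf:
            keeper.extend(pmt_inf.findall("CdtTrfTxInf"))  # move the payments over
            parent.remove(pmt_inf)                          # drop the duplicate from the tree
    return root


root = merge_pmt_infs(ET.fromstring(SAMPLE))
# root still holds <GrpHdr>, so serializing it yields the whole document.
xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)
```

Because the duplicates are removed from their parent element rather than from a separate Python list, root itself reflects the change. To save it, ET.ElementTree(root).write('Output.xml', encoding='utf-8', xml_declaration=True) writes the full document, including the unchanged <GrpHdr> part.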