Python: how to remove nodes from XML under certain conditions

Date: 2019-11-30 05:30:37

Tags: python xml

I have an XML file:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Reviews>
    <Review rid="1004293">
        <sentences>
            <sentence id="1004293:0">
                <text>Judging from previous posts this used to be a good place, but not any longer.</text>
                <Opinions>
            </sentence>
            <sentence id="1004293:1">
                <text>We, there were four of us, arrived at noon - the place was empty - and the staff acted like we were imposing on them and they were very rude.</text>
                <Opinions>
            </sentence>
            <sentence id="1004293:2">
                <text>They never brought us complimentary noodles, ignored repeated requests for sugar, and threw our dishes on the table.</text>
                <Opinions>
                    <Opinion target="NULL" category="SERVICE#GENERAL" polarity="negative" from="0" to="0"/>
                </Opinions>
            </sentence>
        </sentences>
    </Review>

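How can I delete the sentences that have no opinions and keep only the sentences whose text has an opinion? I would like to get something like this: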

<sentences>
        <sentence id="1004293:2">
            <text>They never brought us complimentary noodles, ignored repeated requests for sugar, and threw our dishes on the table.</text>
            <Opinions>
                <Opinion target="NULL" category="SERVICE#GENERAL" polarity="negative" from="0" to="0"/>
            </Opinions>
        </sentence>
    </sentences>

3 answers:

Answer 0 (score: 2)

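I would convert the XML to a dictionary with a module such as the one in "How to convert an xml string to a dictionary?", filter out the unwanted nodes, and then convert the result back to XML. ...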
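
The module this answer has in mind is not named here, so the following is only a minimal sketch of the same pipeline (XML to dictionary, filter, dictionary back to XML) using the standard library. The helpers `element_to_dict`, `dict_to_element`, `has_opinion` and `drop_empty_sentences` are hypothetical names made up for illustration, and the sample document is a shortened stand-in for the file in the question.

```python
import xml.etree.ElementTree as ET

def element_to_dict(elem):
    """Hypothetical helper: turn an Element into a nested dict."""
    return {
        'tag': elem.tag,
        'attrib': dict(elem.attrib),
        'text': (elem.text or '').strip(),
        'children': [element_to_dict(child) for child in elem],
    }

def dict_to_element(node):
    """Hypothetical helper: rebuild an Element from that dict."""
    elem = ET.Element(node['tag'], node['attrib'])
    elem.text = node['text'] or None
    for child in node['children']:
        elem.append(dict_to_element(child))
    return elem

def has_opinion(sentence):
    """True if the sentence dict holds an <Opinions> with at least one child."""
    return any(
        child['tag'] == 'Opinions' and child['children']
        for child in sentence['children']
    )

def drop_empty_sentences(node):
    """Recursively filter out <sentence> dicts that carry no opinion."""
    node['children'] = [
        drop_empty_sentences(child)
        for child in node['children']
        if child['tag'] != 'sentence' or has_opinion(child)
    ]
    return node

# Shortened sample (assumption): same structure as the question's file
xml_text = '''<Reviews><Review rid="1004293"><sentences>
<sentence id="1004293:0"><text>No opinion here.</text><Opinions/></sentence>
<sentence id="1004293:2"><text>Rude staff.</text><Opinions>
<Opinion target="NULL" category="SERVICE#GENERAL" polarity="negative" from="0" to="0"/>
</Opinions></sentence>
</sentences></Review></Reviews>'''

tree_dict = drop_empty_sentences(element_to_dict(ET.fromstring(xml_text)))
filtered_root = dict_to_element(tree_dict)
print(ET.tostring(filtered_root, encoding='unicode'))
```

Note that this sketch discards whitespace between elements, so the result is not indented like the input.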

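Answer 1 (score: 1)

Consider using XSLT, a special-purpose language designed for transforming XML documents. Specifically, run the identity transform, which copies every node as-is, and add an empty template that matches the sentences to be dropped.

XSLT (save as an .xsl file, which is itself a special kind of .xml file)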

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="xml" omit-xml-declaration="yes" indent="yes"/>
    <xsl:strip-space elements="*"/>

    <!-- IDENTITY TRANSFORM -->
    <xsl:template match="node()|@*">
     <xsl:copy>
       <xsl:apply-templates select="node()|@*"/>
     </xsl:copy>
    </xsl:template>

    <!-- EMPTY TEMPLATE TO DELETE NODE(S) -->
    <xsl:template match="sentence[text and not(Opinions/*)]"/>

</xsl:stylesheet>


Python (using the third-party module lxml)

import lxml.etree as et 

doc = et.parse('/path/to/Input.xml') 
xsl = et.parse('/path/to/Script.xsl') 

# CONFIGURE TRANSFORMER 
transform = et.XSLT(xsl) 

# TRANSFORM SOURCE DOC 
result = transform(doc) 

# OUTPUT TO CONSOLE 
print(result) 

# SAVE TO FILE 
with open('Output.xml', 'wb') as f: 
   f.write(result)
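
For readers new to XSLT, the logic of the stylesheet above (copy everything, except any `sentence` that has a `text` child and an `Opinions` element without children) can be mirrored in plain Python. This is only a rough sketch with the standard library, not a replacement for the XSLT run: `is_deleted` and `identity_copy` are hypothetical names, the sample is shortened, and whitespace handling differs from `xsl:strip-space`.

```python
import xml.etree.ElementTree as ET

def is_deleted(elem):
    """Mirror of the empty template: sentence[text and not(Opinions/*)]."""
    return (
        elem.tag == 'sentence'
        and elem.find('text') is not None
        and elem.find('Opinions/*') is None
    )

def identity_copy(elem):
    """Mirror of the identity transform: copy tag, attributes, text and children."""
    clone = ET.Element(elem.tag, dict(elem.attrib))
    clone.text = elem.text
    clone.tail = elem.tail
    for child in elem:
        # Children matched by the "empty template" are simply not copied
        if not is_deleted(child):
            clone.append(identity_copy(child))
    return clone

# Shortened sample (assumption): same structure as the question's file
sample = '''<Reviews><Review rid="1004293"><sentences>
<sentence id="1004293:1"><text>No opinion here.</text><Opinions/></sentence>
<sentence id="1004293:2"><text>Rude staff.</text><Opinions>
<Opinion target="NULL" category="SERVICE#GENERAL" polarity="negative" from="0" to="0"/>
</Opinions></sentence>
</sentences></Review></Reviews>'''

copied_root = identity_copy(ET.fromstring(sample))
print(ET.tostring(copied_root, encoding='unicode'))
```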

Answer 2 (score: 1)

Use the built-in XML library (ElementTree).

Note: the XML you posted is not valid, so I had to fix it.
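
To see why a fix is needed, here is a small sketch that feeds one sentence from the question to the parser as posted, then again with the empty `<Opinions>` element self-closed. The helper name `is_well_formed` is made up for illustration.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text):
    """Return True if xml_text parses, False if the parser rejects it."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        print('not well-formed:', err)
        return False
    return True

# As posted: <Opinions> is opened but never closed
as_posted = '''<sentence id="1004293:0">
    <text>Judging from previous posts this used to be a good place, but not any longer.</text>
    <Opinions>
</sentence>'''

# Repaired: the empty element is self-closed
repaired = as_posted.replace('<Opinions>', '<Opinions />')

print(is_well_formed(as_posted))
print(is_well_formed(repaired))
```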

import xml.etree.ElementTree as ET


xml = '''<?xml version="1.0" encoding="UTF-8"?>
<Reviews>
   <Review rid="1004293">
      <sentences>
         <sentence id="1004293:0">
            <text>Judging from previous posts this used to be a good place, but not any longer.</text>
            <Opinions />
         </sentence>
         <sentence id="1004293:1">
            <text>We, there were four of us, arrived at noon - the place was empty - and the staff acted like we were imposing on them and they were very rude.</text>
            <Opinions />
         </sentence>
         <sentence id="1004293:2">
            <text>They never brought us complimentary noodles, ignored repeated requests for sugar, and threw our dishes on the table.</text>
            <Opinions>
               <Opinion target="NULL" category="SERVICE#GENERAL" polarity="negative" from="0" to="0" />
            </Opinions>
         </sentence>
      </sentences>
   </Review>
</Reviews>
'''

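root = ET.fromstring(xml)

# findall() returns a list, so removing children inside the loop is safe.
# Looping over every <sentences> container also covers files with several reviews.
for sentences_root in root.findall('.//sentences'):
    sentences_with_no_opinions = [
        s for s in sentences_root.findall('sentence')
        # compare with None: the truth value of an Element is unreliable
        if s.find('Opinions/Opinion') is None
    ]
    for s in sentences_with_no_opinions:
        sentences_root.remove(s)

# encoding='unicode' returns str instead of bytes
print(ET.tostring(root, encoding='unicode'))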

Output

<?xml version="1.0" encoding="UTF-8"?>
<Reviews>
   <Review rid="1004293">
      <sentences>
         <sentence id="1004293:2">
            <text>They never brought us complimentary noodles, ignored repeated requests for sugar, and threw our dishes on the table.</text>
            <Opinions>
               <Opinion category="SERVICE#GENERAL" from="0" polarity="negative" target="NULL" to="0" />
            </Opinions>
         </sentence>
      </sentences>
   </Review>
</Reviews>
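
The output above starts with an XML declaration, which `ET.tostring` does not emit by default as far as I know. One way to get it, sketched here under the assumption that the result should go to a file, is `ElementTree.write` with `xml_declaration=True`. The temporary directory and the tiny stand-in tree are illustrative only.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

# Stand-in for the filtered tree (assumption: any Element works here)
root = ET.fromstring('<Reviews><Review rid="1004293"><sentences /></Review></Reviews>')

# Hypothetical output location; use a real path in practice
out_path = os.path.join(tempfile.mkdtemp(), 'Output.xml')

# Wrap the root in an ElementTree to write it with a declaration
ET.ElementTree(root).write(out_path, encoding='UTF-8', xml_declaration=True)

with open(out_path, encoding='utf-8') as f:
    written = f.read()
print(written)
```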