Python ElementTree: find elements by string

Time: 2015-02-25 11:22:55

Tags: python xml-parsing

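I have a large XML file (about 2 GB); what is shown below is only a part of it. In this file I want to find all elements that start with <ManagedElement sourceType = "SIU"> or <ManagedElement sourceType = "CELLO">, and then extract some data from them. An example of the content of these ManagedElement elements is shown below. Here is my code: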

#!/usr/bin/python
from xml.etree import ElementTree

with open('/var/opt/arne/OSSRC_Model_2014_11_24_3_19_36_PM.xml', 'rt') as f:
    tree = ElementTree.parse(f)
with open('/home/gf', 'a') as f1:
    for item in tree.findall('.//ManagedElement'):
        rank = item.find('.//ipAddress')
        rank1 = rank.get('string')
        name = item.find('.//ManagedElementId')
        name1 = name.get('string')
        site = item.find('.//associatedSite')
        site1 = site.get('string')
        f1.write(name1 + ',' + rank1 + ',' + site1 + '\n')

The problem is that there are other ManagedElement elements as well, and I only want the ones whose sourceType string is SIU or CELLO.

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE Model SYSTEM "/opt/arne/etc/arne12_2.dtd">
<Model version = "1" importVersion = "12.2">
<!--Validate: /opt/arne/bin/import.sh -f 4_siu_create.xml \ -val:rall -->
    <Create>
        <SubNetwork userLabel = "ZLNOUR_SIU" networkType = "IPRAN">
            <ManagedElement sourceType = "SIU">
                <ManagedElementId string = "siu009722"/>
                <primaryType type = "STN"/>
                <managedElementType types = ""/>
                <associatedSite string = "Site=site00972"/>
                <nodeVersion string = "T11A"/>
                <platformVersion string = ""/>
                <swVersion string = ""/>
                <vendorName string = ""/>
                <userDefinedState string = ""/>
                <managedServiceAvailability int = "1"/>
                <isManaged boolean = "true"/>
                <connectionStatus string = "OFF"/>
                <Connectivity>
                    <DEFAULT>
                        <emUrl url = "http://10.131.203.117:80/"/>
                        <ipAddress string = "10.131.203.117"/>
                        <oldIpAddress string = "int dummy=0"/>
                        <hostname string = ""/>
                        <nodeSecurityState state = "ON"/>
                        <boardId string = ""/>
                        <Protocol number = "0">
                            <protocolType string = "SNMP"/>
                            <port int = "161"/>
                            <protocolVersion string = "v2c"/>
                            <securityName string = ""/>
                            <authenticationMethod string = ""/>
                            <encryptionMethod string = ""/>
                            <communityString string = "public"/>
                            <context string = ""/>
                            <namingUrl string = ""/>
                            <namingPort int = ""/>
                            <notificationIRPAgentVersion string = ""/>
                            <alarmIRPAgentVersion string = ""/>
                            <notificationIRPNamingContext context = ""/>
                            <alarmIRPNamingContext context = ""/>
                        </Protocol>
                        <Protocol number = "1">
                            <protocolType string = "SSH"/>
                            <port int = "22"/>
                            <protocolVersion string = ""/>
                            <securityName string = ""/>
                            <authenticationMethod string = ""/>
                            <encryptionMethod string = ""/>
                            <communityString string = ""/>
                            <context string = ""/>
                            <namingUrl string = ""/>
                            <namingPort int = ""/>
                            <notificationIRPAgentVersion string = ""/>
                            <alarmIRPAgentVersion string = ""/>
                            <notificationIRPNamingContext context = ""/>
                            <alarmIRPNamingContext context = ""/>
                        </Protocol>
                        <Browser>
                            <browser string = ""/>
                            <browserURL string = ""/>
                            <bookname string = ""/>
                        </Browser>
                    </DEFAULT>
                </Connectivity>
                <Tss>
                    <Entry>
                        <System string = "siu009722"/>
                        <Type string = "NORMAL"/>
                        <User string = "admin"/>
                        <Password string = "siu009722"/>
                    </Entry>
                    <Entry>
                        <System string = "siu009722"/>
                        <Type string = "SECURE"/>
                        <User string = "admin"/>
                        <Password string = "siu009722"/>
                    </Entry>
                </Tss>
                <Relationship>
                    <AssociableNode TO_FDN = "FtpServer=SMRSSLAVE-rtwaned1o,FtpService=swstore-rtwaned1o" AssociationType = "ManagedElement_to_ftpSwStore"/>
                    <AssociableNode TO_FDN = "FtpServer=SMRSSLAVE-rtwaned1o,FtpService=cmdown-rtwaned1o" AssociationType = "ManagedElement_to_neTransientCmDown"/>
                    <AssociableNode TO_FDN = "FtpServer=SMRSSLAVE-rtwaned1o,FtpService=cmup-rtwaned1o" AssociationType = "ManagedElement_to_neTransientCmUp"/>
                    <AssociableNode TO_FDN = "FtpServer=SMRSSLAVE-rtwaned1o,FtpService=pmup-rtwaned1o" AssociationType = "ManagedElement_to_neTransientPm"/>
                    <AssociableNode TO_FDN = "ManagementNode=ONRM" AssociationType = "MgmtAssociation"/>
                    <AssociableNode TO_FDN = "SubNetwork=ZLNOUR3,MeContext=rbs009721,ManagedElement=1,NodeBFunction=1" FROM_FDN = "SubNetwork=ZLNOUR_SIU,ManagedElement=siu009722,StnFunction=STN_ManagedFunction" AssociationType = "StnFunction_to_NodeBFunction"/>
                </Relationship>
            </ManagedElement>
        <ManagedElement sourceType="CELLO">
        <ManagedElementId string="3GMUP"/>
        <primaryType type="RBS"/>
        <managedElementType types=""/>
        <associatedSite string="Site=Podgorica"/>
        <nodeVersion string=""/>
        <platformVersion string=""/>
        <swVersion string=""/>
        <vendorName string="Ericsson"/>
        <userDefinedState string=""/>
        <managedServiceAvailability int="1"/>
        <isManaged boolean="true"/>
        <neMIMVersion string="T.1.201"/>
        <connectionStatus string="ON"/>
        <ManagedFunction>
           <functionType string="NodeB"/>
           <supportSystemControl boolean="false"/>
        </ManagedFunction>
        <Connectivity>
           <DEFAULT>
              <emUrl url="http://10.132.11.226:80/em/index.html"/>
              <ipAddress string="10.132.11.226"/>
              <oldIpAddress string=""/>
              <hostname string=""/>
              <nodeSecurityState state="ON"/>
              <boardId string=""/>
              <Protocol number="0">
                 <protocolType string="TELNET"/>
                 <port int="23"/>
                 <protocolVersion string=""/>
                 <securityName string=""/>
                 <authenticationMethod string=""/>
                 <encryptionMethod string=""/>
                 <communityString string=""/>
                 <context string=""/>
                 <namingUrl string=""/>
                 <namingPort int=""/>
                 <notificationIRPAgentVersion string=""/>
                 <alarmIRPAgentVersion string=""/>
                 <notificationIRPNamingContext context=""/>
                 <alarmIRPNamingContext context=""/>
              </Protocol>
              <Protocol number="1">
                 <protocolType string="CORBA"/>
                 <port int="0"/>
                 <protocolVersion string=""/>
                 <securityName string=""/>
                 <authenticationMethod string=""/>
                 <encryptionMethod string=""/>
                 <communityString string=""/>
                 <context string=""/>
                 <namingUrl string="http://10.132.11.226:80/cello/ior_files/nameroot.ior"/>
                 <namingPort int="0"/>
                 <notificationIRPAgentVersion string="3.2"/>
                 <alarmIRPAgentVersion string="3.2"/>
                 <notificationIRPNamingContext context="NOTIFICATION_IRP_VERSION_1_1"/>
                 <alarmIRPNamingContext context="ALARM_IRP_VERSION_1_1"/>
              </Protocol>
              <Protocol number="2">
                 <protocolType string="SFTP"/>
                 <port int="22"/>
                 <protocolVersion string=""/>
                 <securityName string=""/>
                 <authenticationMethod string=""/>
                 <encryptionMethod string=""/>
                 <communityString string=""/>
                 <context string=""/>
                 <namingUrl string=""/>
                 <namingPort int=""/>
                 <notificationIRPAgentVersion string=""/>
                 <alarmIRPAgentVersion string=""/>
                 <notificationIRPNamingContext context=""/>
                 <alarmIRPNamingContext context=""/>
              </Protocol>
              <Protocol number="3">
                 <protocolType string="FTP"/>
                 <port int="21"/>
                 <protocolVersion string=""/>
                 <securityName string=""/>
                 <authenticationMethod string=""/>
                 <encryptionMethod string=""/>
                 <communityString string=""/>
                 <context string=""/>
                 <namingUrl string=""/>
                 <namingPort int=""/>
                 <notificationIRPAgentVersion string=""/>
                 <alarmIRPAgentVersion string=""/>
                 <notificationIRPNamingContext context=""/>
                 <alarmIRPNamingContext context=""/>
              </Protocol>
              <Protocol number="4">
                 <protocolType string="SSH"/>
                 <port int="22"/>
                 <protocolVersion string=""/>
                 <securityName string=""/>
                 <authenticationMethod string=""/>
                 <encryptionMethod string=""/>
                 <communityString string=""/>
                 <context string=""/>
                 <namingUrl string=""/>
                 <namingPort int=""/>
                 <notificationIRPAgentVersion string=""/>
                 <alarmIRPAgentVersion string=""/>
                 <notificationIRPNamingContext context=""/>
                 <alarmIRPNamingContext context=""/>
              </Protocol>
              <Browser>
                 <browser string=""/>
                 <browserURL string=""/>
                 <bookname string=""/>
              </Browser>
           </DEFAULT>
        </Connectivity>
        <Tss>
           <Entry>
              <System string="3GMUP"/>
              <Type string="SECURE"/>
              <User string="rbs"/>
              <Password string="rbs"/>
           </Entry>
           <Entry>
              <System string="3GMUP"/>
              <Type string="NORMAL"/>
              <User string="rbs"/>
              <Password string="rbs"/>
           </Entry>
        </Tss>
        <Relationship>
           <AssociableNode TO_FDN="FtpServer=SMRSMASTERWRAN-oss1,FtpService=smo-backup-oss1" AssociationType="ManagedElement_to_ftpBackupStore"/>
           <AssociableNode TO_FDN="FtpServer=SMRSMASTERWRAN-oss1,FtpService=smo-keystore-oss1" AssociationType="ManagedElement_to_ftpLicenseKeyStore"/>
           <AssociableNode TO_FDN="FtpServer=SMRSMASTERWRAN-oss1,FtpService=smo-swstore-oss1" AssociationType="ManagedElement_to_ftpSwStore"/>
           <AssociableNode TO_FDN="Group=CD34082729" AssociationType="Group_to_MeContext"/>
           <AssociableNode TO_FDN="ManagementNode=ONRM" AssociationType="MgmtAssociation"/>
           <AssociableNode TO_FDN="SubNetwork=IPRAN,ManagedElement=MUP_BD,StnFunction=STN_ManagedFunction" FROM_FDN="SubNetwork=RNC111,MeContext=3GMUP,ManagedElement=1,NodeBFunction=1" AssociationType="StnFunction_to_NodeBFunction"/>
           <AssociableNode TO_FDN="SubNetwork=RNC111,Group=NodeB" AssociationType="Group_to_MeContext"/>
        </Relationship>
     </ManagedElement>
        </SubNetwork>
    </Create>
</Model>

1 Answer:

Answer 0 (score: 1)

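The correct XPath to select only the ManagedElement elements whose sourceType attribute equals "SIU" or "CELLO" is:

.//ManagedElement[@sourceType="CELLO" or @sourceType="SIU"]

Unfortunately, the XPath or operator appears to be not supported by xml.etree.ElementTree, so simply passing that XPath to the findall() method will not work. Your options are either to use the lxml library, which has better XPath support (especially if you plan to use a wider range of complex XPath expressions in your code), or to create a lambda expression that manually filters the ManagedElement elements by their sourceType attribute value. For example, instead of just:

tree.findall('.//ManagedElement')

...you can filter that result with a lambda, keeping only the elements whose sourceType is "SIU" or "CELLO".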
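
A minimal sketch of that manual filtering, using only xml.etree.ElementTree. The SAMPLE document is a hypothetical, trimmed-down stand-in for the 2 GB file, and the names WANTED, wanted_elements and to_row are made up for this illustration; none of them come from the original post.

```python
from xml.etree import ElementTree

# Hypothetical, trimmed-down sample that mirrors the structure in the question.
SAMPLE = """\
<Model>
  <Create>
    <SubNetwork userLabel="ZLNOUR_SIU">
      <ManagedElement sourceType="SIU">
        <ManagedElementId string="siu009722"/>
        <associatedSite string="Site=site00972"/>
        <Connectivity><DEFAULT><ipAddress string="10.131.203.117"/></DEFAULT></Connectivity>
      </ManagedElement>
      <ManagedElement sourceType="CELLO">
        <ManagedElementId string="3GMUP"/>
        <associatedSite string="Site=Podgorica"/>
        <Connectivity><DEFAULT><ipAddress string="10.132.11.226"/></DEFAULT></Connectivity>
      </ManagedElement>
      <ManagedElement sourceType="OTHER">
        <ManagedElementId string="ignored"/>
        <associatedSite string="Site=nowhere"/>
        <Connectivity><DEFAULT><ipAddress string="192.0.2.1"/></DEFAULT></Connectivity>
      </ManagedElement>
    </SubNetwork>
  </Create>
</Model>
"""

# The sourceType values we want to keep (assumed from the question).
WANTED = ("SIU", "CELLO")


def wanted_elements(root, wanted=WANTED):
    """Return the ManagedElement nodes whose sourceType is one of `wanted`."""
    # findall() returns every ManagedElement; the lambda does the "or" part
    # that the ElementTree XPath subset cannot express.
    return list(filter(lambda elem: elem.get("sourceType") in wanted,
                       root.findall(".//ManagedElement")))


def to_row(elem):
    """Build 'name,ip,site' from the 'string' attributes, as in the question."""
    fields = []
    for path in (".//ManagedElementId", ".//ipAddress", ".//associatedSite"):
        child = elem.find(path)
        # Fall back to an empty field if a child element is missing.
        fields.append(child.get("string", "") if child is not None else "")
    return ",".join(fields)


root = ElementTree.fromstring(SAMPLE)
rows = [to_row(elem) for elem in wanted_elements(root)]
print("\n".join(rows))
```

With a real file, root would come from ElementTree.parse(f).getroot() instead of fromstring(). Since ElementTree does support single [@attrib='value'] predicates, another option is to call findall() once per value, for example with .//ManagedElement[@sourceType='SIU'], and combine the two results.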