Parsing XML with lxml

Date: 2018-05-16 18:21:26

Tags: python-3.x lxml

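I am parsing an XML file with lxml in Python 3 and converting the elements into a dict of lists of dicts. The code does not work correctly and behaves strangely. I stepped through it in the debugger and still cannot figure out what the problem is: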

Here is the code I wrote:

    tree = lxml.etree.parse(self.meetingXmlFile)
    root = tree.getroot()

    roomList = []
    for child in root.iter():
        # print("Tag is ::%s  and text is ::%s" % (child.tag ,  child.text))
        if child.tag  == "TowerName":
            roomList.clear()
            indexTower = child.text
            # print(indexTower)
        elif child.tag  == "BigMeetingRooms" :
            roomSize = "bigMeetingRoom"
        elif child.tag == "SmallMeetingRooms":
            roomSize = "smallMeetingRoom"
        elif child.tag  == "MeetingRoomName" :
            roomName = child.text
        elif child.tag == "MeetingRoomMailId" :
            roomMailId = child.text
            roomDict={roomName:roomMailId}
            roomList.append(roomDict)
            if roomSize == "bigMeetingRoom" :
                # print(indexTower, "  ", roomName, "  ", roomMailId)
                self.bigMeetingRoom[indexTower] = roomList
                print(indexTower, "  ", self.bigMeetingRoom[indexTower])
                print(self.bigMeetingRoom)

1 answer:

Answer 0 (score: 1)

Instead of using iter() and testing tag names, consider using xpath()...

Python 3.6

from lxml import etree

tree = etree.parse("input.xml")

# Maps each tower name to a list of {room name: mail id} dicts
roomList = {}
for tower in tree.xpath("/root/Towers/Tower"):
    large_rooms = []
    # Relative XPath: only the big meeting rooms inside this tower
    for lr in tower.xpath("MeetingRooms/BigMeetingRooms/MeetingRoom"):
        large_rooms.append({lr.xpath("MeetingRoomName")[0].text:
                            lr.xpath("MeetingRoomMailId")[0].text})
    roomList[tower.xpath("TowerName")[0].text] = large_rooms

print(roomList)
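
The code above only collects the big meeting rooms. If the small ones are needed as well, the same relative XPath can probably be parameterised by the container tag. The following is a minimal sketch, not a definitive implementation: the helper name `rooms_by_tower`, its `size_tag` parameter and the inline `SAMPLE` document are assumptions made for illustration (the sample is a trimmed copy of the input.xml layout so the snippet runs on its own), and `findtext()` is used in place of `xpath(...)[0].text`.

```python
from lxml import etree

# Trimmed-down sample with the same element layout as input.xml
# (an assumption, included only so the sketch is self-contained).
SAMPLE = b"""<root><Towers><Tower>
  <TowerName>Tower 6</TowerName>
  <MeetingRooms>
    <BigMeetingRooms><MeetingRoom>
      <MeetingRoomName>Colesseum</MeetingRoomName>
      <MeetingRoomMailId>Colesseum_mail</MeetingRoomMailId>
    </MeetingRoom></BigMeetingRooms>
    <SmallMeetingRooms><MeetingRoom>
      <MeetingRoomName>Forum</MeetingRoomName>
      <MeetingRoomMailId>Forum_mail</MeetingRoomMailId>
    </MeetingRoom></SmallMeetingRooms>
  </MeetingRooms>
</Tower></Towers></root>"""


def rooms_by_tower(root, size_tag):
    """Map each tower name to a list of {room name: mail id} dicts.

    size_tag is the container element to read from, e.g.
    "BigMeetingRooms" or "SmallMeetingRooms" (hypothetical helper).
    """
    result = {}
    # Path is relative to the <root> element passed in.
    for tower in root.xpath("Towers/Tower"):
        # A new list is built for every tower, so no list is shared.
        result[tower.findtext("TowerName")] = [
            {room.findtext("MeetingRoomName"): room.findtext("MeetingRoomMailId")}
            for room in tower.xpath(f"MeetingRooms/{size_tag}/MeetingRoom")
        ]
    return result


root = etree.fromstring(SAMPLE)
big_rooms = rooms_by_tower(root, "BigMeetingRooms")
small_rooms = rooms_by_tower(root, "SmallMeetingRooms")
print(big_rooms)
print(small_rooms)
```

With a file on disk, `etree.parse("input.xml").getroot()` could be passed in place of `root`.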

XML input ("input.xml")

<root>
    <Towers>
        <Tower>
            <TowerName>Tower 6</TowerName>
            <MeetingRooms>
                <BigMeetingRooms>
                    <MeetingRoom>
                        <MeetingRoomName>Colesseum</MeetingRoomName>
                        <MeetingRoomMailId>Colesseum_mail</MeetingRoomMailId>
                    </MeetingRoom>
                    <MeetingRoom>
                        <MeetingRoomName>Saphire</MeetingRoomName>
                        <MeetingRoomMailId>Saphire_mail</MeetingRoomMailId>
                    </MeetingRoom>
                    <MeetingRoom>
                        <MeetingRoomName>Dafodills</MeetingRoomName>
                        <MeetingRoomMailId>Dafodills_mail</MeetingRoomMailId>
                    </MeetingRoom>
                </BigMeetingRooms>
                <SmallMeetingRooms>
                    <MeetingRoom>
                        <MeetingRoomName>Senate House</MeetingRoomName>
                        <MeetingRoomMailId>SenateHouse_mail</MeetingRoomMailId>
                    </MeetingRoom>
                    <MeetingRoom>
                        <MeetingRoomName>Forum</MeetingRoomName>
                        <MeetingRoomMailId>Forum_mail</MeetingRoomMailId>
                    </MeetingRoom>
                    <MeetingRoom>
                        <MeetingRoomName>Pearl</MeetingRoomName>
                        <MeetingRoomMailId>Pearl_mail</MeetingRoomMailId>
                    </MeetingRoom>
                </SmallMeetingRooms>
            </MeetingRooms>
        </Tower>
        <Tower>
            <TowerName>Tower 7</TowerName>
            <MeetingRooms>
                <BigMeetingRooms>
                    <MeetingRoom>
                        <MeetingRoomName>Colesseum7</MeetingRoomName>
                        <MeetingRoomMailId>Colesseum_mail7</MeetingRoomMailId>
                    </MeetingRoom>
                    <MeetingRoom>
                        <MeetingRoomName>Saphire7</MeetingRoomName>
                        <MeetingRoomMailId>Saphire_mail7</MeetingRoomMailId>
                    </MeetingRoom>
                    <MeetingRoom>
                        <MeetingRoomName>Dafodills7</MeetingRoomName>
                        <MeetingRoomMailId>Dafodills_mail7</MeetingRoomMailId>
                    </MeetingRoom>
                </BigMeetingRooms>
                <SmallMeetingRooms>
                    <MeetingRoom>
                        <MeetingRoomName>Senate House7</MeetingRoomName>
                        <MeetingRoomMailId>SenateHouse_mail7</MeetingRoomMailId>
                    </MeetingRoom>
                    <MeetingRoom>
                        <MeetingRoomName>Forum7</MeetingRoomName>
                        <MeetingRoomMailId>Forum_mail7</MeetingRoomMailId>
                    </MeetingRoom>
                    <MeetingRoom>
                        <MeetingRoomName>Pearl7</MeetingRoomName>
                        <MeetingRoomMailId>Pearl_mail7</MeetingRoomMailId>
                    </MeetingRoom>
                </SmallMeetingRooms>
            </MeetingRooms>
        </Tower>
    </Towers>
</root>

Printed output

{'Tower 6': [{'Colesseum': 'Colesseum_mail'}, {'Saphire': 'Saphire_mail'}, {'Dafodills': 'Dafodills_mail'}], 'Tower 7': [{'Colesseum7': 'Colesseum_mail7'}, {'Saphire7': 'Saphire_mail7'}, {'Dafodills7': 'Dafodills_mail7'}]}
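
As for why the loop in the question behaves oddly, one likely cause is that a single `roomList` object is created once, emptied with `clear()` at each `TowerName`, and then stored under every tower key. Assignment stores a reference rather than a copy, so every key ends up pointing at the same list. The sketch below reproduces that effect without any XML; the names `big_meeting_room`, `room_list`, `towers` and `fixed` are hypothetical stand-ins, not the asker's actual attributes.

```python
# Hypothetical stand-ins for the values the question's loop would parse.
towers = [("Tower 6", ["Colesseum"]), ("Tower 7", ["Colesseum7"])]

big_meeting_room = {}
room_list = []  # created once, as in the question

for tower, rooms in towers:
    room_list.clear()  # empties the one shared list in place
    for name in rooms:
        room_list.append({name: name + "_mail"})
        big_meeting_room[tower] = room_list  # stores a reference, not a copy

# Both keys refer to the very same list object, so the first tower's
# rooms were wiped when the second tower was processed.
print(big_meeting_room["Tower 6"] is big_meeting_room["Tower 7"])
print(big_meeting_room)

# Building a fresh list per tower avoids the aliasing.
fixed = {}
for tower, rooms in towers:
    fixed[tower] = [{name: name + "_mail"} for name in rooms]
print(fixed)
```

The question's loop also appends to `roomList` before checking `roomSize`, so small meeting rooms that follow a big one land in the same list. The per-tower xpath() approach sidesteps both issues.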