Printing a value from a parent tag in XML using minidom in Python

Time: 2016-04-14 09:31:07

Tags: python xml minidom

I have a very large XML file, and I need to get the ID value whenever a certain tag occurs more than twice. Specifically, if a ContactItem contains more than 2 FIELDS elements, I want to print the text of the ID attribute of the enclosing Calendar tag. The XML file looks like this:

<Users>
    <Calendar ID="text1">
        <Folders>...</Folders>
        <FolderRights/>
        <Event/>
        <EventReminder/>
        <EventContact/>
        <EventRecurrence/>
        <EventException/>
        <ContactItem>
            <COLUMNS>...</COLUMNS>
            <FIELDS>...</FIELDS>
            <FIELDS>...</FIELDS>
            <FIELDS>...</FIELDS>
            <FIELDS>...</FIELDS>
        </ContactItem>
        <ContactLocation>...</ContactLocation>
        <Tags/>
        <TagLinks/>
        <ItemAttr/>
        <ItemAttrData/>
    </Calendar>
    <Calendar ID="text2">
        <Folders>...</Folders>
        <FolderRights/>
        <Event/>
        <EventReminder/>
        <EventContact/>
        <EventRecurrence/>
        <EventException/>
        <ContactItem/>
        <ContactLocation/>
        <Tags/>
        <TagLinks/>
        <ItemAttr/>
        <ItemAttrData/>
    </Calendar>
</Users>

I wrote some code for this, but I don't get the ID value. How can I do this? Thanks a lot.

2 answers:

Answer 0 (score: 1)

Assuming you have already located the right tag elements, this is how to access the ID attribute:

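from xml.dom import minidom

dom = minidom.parse("your.xml")  # placeholder path to the XML file

for contatti in dom.getElementsByTagName('Users'):
    calendars = contatti.getElementsByTagName('Calendar')
    for calendar in calendars:
        # attributes is a NamedNodeMap; get() returns the Attr node, or None if it is missing
        attribute = calendar.attributes.get("ID")
        print(attribute.name)
        print(attribute.value)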
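
Building on that, here is a minimal sketch of the whole check done with minidom only. The inline sample document, the function name calendar_ids and the threshold parameter are made up for illustration; the real file would be loaded with minidom.parse instead of parseString.

```python
from xml.dom import minidom

# Trimmed-down stand-in for the XML in the question (illustrative only).
SAMPLE_XML = """<Users>
  <Calendar ID="text1">
    <ContactItem>
      <FIELDS>a</FIELDS><FIELDS>b</FIELDS><FIELDS>c</FIELDS>
    </ContactItem>
  </Calendar>
  <Calendar ID="text2">
    <ContactItem/>
  </Calendar>
</Users>"""


def calendar_ids(dom, threshold=2):
    """Return the ID of every Calendar whose ContactItem elements hold more than `threshold` FIELDS."""
    ids = []
    for calendar in dom.getElementsByTagName("Calendar"):
        count = 0
        # getElementsByTagName searches all descendants, not only direct children
        for item in calendar.getElementsByTagName("ContactItem"):
            count += len(item.getElementsByTagName("FIELDS"))
        if count > threshold:
            ids.append(calendar.getAttribute("ID"))
    return ids


minidom_result = calendar_ids(minidom.parseString(SAMPLE_XML))
print(minidom_result)
```

Note that minidom builds the whole tree in memory, so for a very large file this may be slow or memory-hungry.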

Answer 1 (score: 1)

This is pretty simple with lxml: find the Calendar parent tags that have more than 2 ContactItem//FIELDS tags, using count:

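from lxml.html import fromstring

# the_xml is a string holding the XML document
tree = fromstring(the_xml)

# the HTML parser lowercases tag and attribute names, hence the lowercase XPath
print(tree.xpath("//calendar[count(./contactitem//fields) > 2]/@id"))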

Sample run:

In [8]: from lxml.html import fromstring

In [9]: tree = fromstring(h)

In [10]: tree.xpath("//calendar[count(./contactitem//fields) > 2]/@id"
   ....: )
Out[10]: ['text1']

Or using lxml.etree, which keeps the original case of the names:

from lxml.etree import fromstring

tree = fromstring(h)

print(tree.xpath("//Calendar[count(./ContactItem//FIELDS) > 2]/@ID"))

To read from a file, use parse:

from lxml.html import parse
tree = parse("your.xml")

You should generally read from the file and let lxml handle the encoding.

xml.etree does not support count, so you would use findall instead:

from xml.etree import ElementTree as et

tree = et.parse("your.xml")
cals = tree.findall(".//Calendar")
print([c.get("ID") for c in cals if len(c.findall("./ContactItem/FIELDS")) > 2])
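
Since the question mentions a very large file, a streaming variant of the same findall check may be worth considering. This is only a sketch using xml.etree's iterparse; the function name calendar_ids_streaming, the threshold parameter and the inline sample bytes are made up for illustration, and in practice you would pass the file path instead of a BytesIO object.

```python
import io
from xml.etree import ElementTree as et

# Trimmed-down stand-in for the XML in the question (illustrative only).
SAMPLE_BYTES = b"""<Users>
  <Calendar ID="text1">
    <ContactItem>
      <FIELDS>a</FIELDS><FIELDS>b</FIELDS><FIELDS>c</FIELDS>
    </ContactItem>
  </Calendar>
  <Calendar ID="text2">
    <ContactItem/>
  </Calendar>
</Users>"""


def calendar_ids_streaming(source, threshold=2):
    """Yield the ID of each Calendar with more than `threshold` FIELDS under ContactItem."""
    # An "end" event fires once an element and all of its children have been parsed.
    for _event, elem in et.iterparse(source, events=("end",)):
        if elem.tag != "Calendar":
            continue
        if len(elem.findall("./ContactItem/FIELDS")) > threshold:
            yield elem.get("ID")
        # Drop the finished Calendar subtree so memory use stays roughly flat.
        elem.clear()


streaming_result = list(calendar_ids_streaming(io.BytesIO(SAMPLE_BYTES)))
print(streaming_result)
```

The ID is read before elem.clear() is called, because clear() also removes the element's attributes.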