I have a very large XML file, and whenever a certain tag occurs more than twice I need to know the ID value. If the ContactItem inside a Calendar has more than two FIELDS elements, I want to print the text in that Calendar tag's ID attribute. I wrote code for this, working on an XML file that looks like this:
<Users>
  <Calendar ID="text1">
    <Folders>...</Folders>
    <FolderRights/>
    <Event/>
    <EventReminder/>
    <EventContact/>
    <EventRecurrence/>
    <EventException/>
    <ContactItem>
      <COLUMNS>...</COLUMNS>
      <FIELDS>...</FIELDS>
      <FIELDS>...</FIELDS>
      <FIELDS>...</FIELDS>
      <FIELDS>...</FIELDS>
    </ContactItem>
    <ContactLocation>...</ContactLocation>
    <Tags/>
    <TagLinks/>
    <ItemAttr/>
    <ItemAttrData/>
  </Calendar>
  <Calendar ID="text2">
    <Folders>...</Folders>
    <FolderRights/>
    <Event/>
    <EventReminder/>
    <EventContact/>
    <EventRecurrence/>
    <EventException/>
    <ContactItem/>
    <ContactLocation/>
    <Tags/>
    <TagLinks/>
    <ItemAttr/>
    <ItemAttrData/>
  </Calendar>
</Users>
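But I do not get the ID value. How can I do this? Thank you very much.
Answer 0 (score: 1)
Assuming you have obtained the correct tag elements, this is how to access the ID attribute: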
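from xml.dom import minidom

dom = minidom.parse("your.xml")  # assumed file name; point this at your own file
for contatti in dom.getElementsByTagName('Users'):
    calendars = contatti.getElementsByTagName('Calendar')
    for calendar in calendars:
        # attributes is a NamedNodeMap; get() returns the Attr node for "ID"
        attribute = calendar.attributes.get("ID")
        print(attribute.name)
        print(attribute.value)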
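To get from there to the IDs the question asks for, one possibility is to count the FIELDS elements under each Calendar before reading its ID. This is only a minimal sketch with xml.dom.minidom, assuming the structure shown in the question; `SAMPLE` and `calendar_ids_over_limit` are names made up for this illustration, and `SAMPLE` is a shortened stand-in for the real file.

```python
from xml.dom import minidom

# Shortened stand-in for the XML in the question (assumption: same structure).
SAMPLE = """<Users>
  <Calendar ID="text1">
    <ContactItem>
      <COLUMNS>...</COLUMNS>
      <FIELDS>...</FIELDS>
      <FIELDS>...</FIELDS>
      <FIELDS>...</FIELDS>
    </ContactItem>
  </Calendar>
  <Calendar ID="text2">
    <ContactItem/>
  </Calendar>
</Users>"""


def calendar_ids_over_limit(dom, limit=2):
    """Return the ID of every Calendar holding more than `limit` FIELDS in its ContactItem elements."""
    ids = []
    for calendar in dom.getElementsByTagName("Calendar"):
        count = 0
        for item in calendar.getElementsByTagName("ContactItem"):
            # getElementsByTagName searches all descendants of the node
            count += len(item.getElementsByTagName("FIELDS"))
        if count > limit:
            # getAttribute returns the attribute value as a plain string
            ids.append(calendar.getAttribute("ID"))
    return ids


dom = minidom.parseString(SAMPLE)
print(calendar_ids_over_limit(dom))  # ['text1']
```

For a real file, `minidom.parse` with the file path would replace `minidom.parseString(SAMPLE)`.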
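Answer 1 (score: 1)
This is very simple with lxml: use count to find the Calendar parent tags that have more than 2 FIELDS tags under ContactItem. Note that the lxml.html parser lowercases tag and attribute names, which is why the XPath here is written in lowercase: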
from lxml.html import fromstring
tree = fromstring(the_xml)
print(tree.xpath("//calendar[count(./contactitem//fields) > 2]/@id"))
Example run:
In [8]: from lxml.html import fromstring
In [9]: tree = fromstring(h)
In [10]: tree.xpath("//calendar[count(./contactitem//fields) > 2]/@id"
....: )
Out[10]: ['text1']
Or using lxml.etree, which keeps the original case of the names:
from lxml.etree import fromstring
tree = fromstring(h)
print(tree.xpath("//Calendar[count(./ContactItem//FIELDS) > 2]/@ID"))
To read from a file, use parse:
from lxml.html import parse
tree = parse("your.xml")
You should generally read from the file and let lxml handle the encoding.
xml.etree does not support count, so use findall instead:
from xml.etree import ElementTree as et
tree = et.parse("Your.xml")
cals = tree.findall(".//Calendar")
print([c.get("ID") for c in cals if len(c.findall("./ContactItem/FIELDS")) > 2])
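Since the question mentions a very large file, it may also be worth processing one Calendar at a time instead of loading the whole tree. The following is a rough sketch using `xml.etree.ElementTree.iterparse` from the standard library, not something the answers above tested; `stream_calendar_ids` and `SAMPLE` are hypothetical names, and the sample is a shortened stand-in for the real data.

```python
import io
import xml.etree.ElementTree as ET

# Shortened stand-in for the XML in the question (assumption: same structure).
SAMPLE = b"""<Users>
  <Calendar ID="text1">
    <ContactItem>
      <COLUMNS>...</COLUMNS>
      <FIELDS>...</FIELDS>
      <FIELDS>...</FIELDS>
      <FIELDS>...</FIELDS>
    </ContactItem>
  </Calendar>
  <Calendar ID="text2">
    <ContactItem/>
  </Calendar>
</Users>"""


def stream_calendar_ids(source, limit=2):
    """Yield the ID of each Calendar with more than `limit` FIELDS, one Calendar at a time."""
    # An "end" event fires once an element and all of its children are parsed
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "Calendar":
            if len(elem.findall("./ContactItem/FIELDS")) > limit:
                yield elem.get("ID")
            # Drop the processed children and attributes to keep memory use low
            elem.clear()


# source can be a file name or a binary file object
print(list(stream_calendar_ids(io.BytesIO(SAMPLE))))  # ['text1']
```

The ID is read before `elem.clear()` is called, because clearing an element also removes its attributes.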