Can following-sibling return more than one element?

Time: 2015-01-05 03:13:20

Tags: python xpath

I have a structure like this:

 <li class="Title">This</li>
 <li><a href="">AAA</a></li>
 <li><a href="">BBB</a></li>
 <li><a href="">CCC</a></li>
 <li class="Title">That</li>
 <li><a href="">DDD</a></li>
 <li><a href="">EEE</a></li>

Here is my XPath:

 sites = sel.xpath("//li[@class='Title']")
 for i,site in enumerate(sites):
      print i
      state = site.xpath("./text()")
      city = site.xpath("./following-sibling::li/a/text()")

The result is:

 0
 This 
 AAA
 1
 That 
 DDD

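But I want to select all of the siblings, not just the first one.

How can I select all the sibling li elements that follow each <li class="Title">?

Like this: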

This 
AAA
This 
BBB
This 
CCC

That 
DDD
That 
EEE

2 Answers:

Answer 0: (score: 1)

Try this:

import lxml.etree as etree

string = '''
<root>
  <li class="Title">This</li>
  <li><a href="">AAA</a></li>
  <li><a href="">BBB</a></li>
  <li><a href="">CCC</a></li>
  <li class="Title">That</li>
  <li><a href="">DDD</a></li>
  <li><a href="">EEE</a></li>
</root>
'''

st = ", "

tree = etree.fromstring(string)

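# The union returns the title <li> elements and the <a> links in document order
for i, node in enumerate(tree.xpath('//li[@class="Title"] | //li/a')):
    # index, text content, and the name of the node's first attribute
    seq = (str(i), node.text, node.attrib.keys()[0])
    print(st.join(seq))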

Output:

0, This, class
1, AAA, href
2, BBB, href
3, CCC, href
4, That, class
5, DDD, href
6, EEE, href

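Note:

You now have enough information to branch on the type of li and get what you want. Be aware, though, that despite what the indentation in your original post suggests, no li is a child of another li.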
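The branching that the note describes can be sketched roughly as follows. This is only an illustration under a few assumptions: it uses the standard-library xml.etree.ElementTree instead of lxml, it targets Python 3, and the names SAMPLE and group_by_title are made up for this example. It walks the li elements in document order, remembers the most recent Title, and pairs that title with every link that follows it.

```python
import xml.etree.ElementTree as ET

# Same sample as above, wrapped in a single root element (assumption).
SAMPLE = '''
<root>
  <li class="Title">This</li>
  <li><a href="">AAA</a></li>
  <li><a href="">BBB</a></li>
  <li><a href="">CCC</a></li>
  <li class="Title">That</li>
  <li><a href="">DDD</a></li>
  <li><a href="">EEE</a></li>
</root>
'''


def group_by_title(root):
    """Return (title, link text) pairs; hypothetical helper for illustration."""
    pairs = []
    current_title = None
    for li in root.iter('li'):  # document order
        if li.get('class') == 'Title':
            # a new title starts a new group
            current_title = li.text
            continue
        link = li.find('a')
        if current_title is not None and link is not None:
            pairs.append((current_title, link.text))
    return pairs


for title, city in group_by_title(ET.fromstring(SAMPLE)):
    print(title)
    print(city)
```

For the sample this prints This/AAA, This/BBB, This/CCC, That/DDD, That/EEE, one value per line.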

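Answer 1: (score: 1)

As an alternative (checking only the siblings that come after the <li class="Title"> element), you can loop over the siblings and break out when another <li class="Title"> element is reached. Like this: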

import lxml.etree

# Wrap the sample in a <div> so that it has a single root element
s = '''<div><li class="Title">This</li>
     <li><a href="">AAA</a></li>
     <li><a href="">BBB</a></li>
     <li><a href="">CCC</a></li>
 <li class="Title">That</li>
     <li><a href="">DDD</a></li>
     <li><a href="">EEE</a></li></div>'''

tree = lxml.etree.fromstring(s)
# find every <li class="Title"> element
for node in tree.xpath('.//li[@class="Title"]'):
    print('\n')
    # walk the <li> siblings that follow this title, in document order
    for sub in node.xpath('following-sibling::li'):
        # stop at the next <li class="Title">, since it starts a new group
        # (other stopping conditions can be implemented here as well)
        if sub.get('class') == 'Title':
            break
        print(node.text)
        print(''.join(sub.xpath('./a/text()')))

Result:

This
AAA
This
BBB
This
CCC

That
DDD
That
EEE
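
If a single XPath per title is preferred over a Python-level break, one possible sketch is to keep only those following siblings whose number of preceding Title siblings equals the position of the current title. The sketch assumes Python 3, that all li elements share one parent, and that the position is passed in through an lxml XPath variable ($k, supplied as a keyword argument). The names html and pairs_by_xpath are hypothetical and exist only for this example.

```python
from lxml import etree

# Sample from the question, wrapped in a <div> so it has one root (assumption).
html = '''<div>
  <li class="Title">This</li>
  <li><a href="">AAA</a></li>
  <li><a href="">BBB</a></li>
  <li><a href="">CCC</a></li>
  <li class="Title">That</li>
  <li><a href="">DDD</a></li>
  <li><a href="">EEE</a></li>
</div>'''


def pairs_by_xpath(root):
    """Return (title, link text) pairs using only XPath to delimit each group."""
    pairs = []
    titles = root.xpath('//li[@class="Title"]')
    for k, title in enumerate(titles, start=1):
        # A non-title sibling belongs to the k-th title when exactly k
        # title elements precede it among its siblings.
        items = title.xpath(
            'following-sibling::li[not(@class="Title")]'
            '[count(preceding-sibling::li[@class="Title"]) = $k]/a/text()',
            k=k,
        )
        pairs.extend((title.text, item) for item in items)
    return pairs


for title, item in pairs_by_xpath(etree.fromstring(html)):
    print(title)
    print(item)
```

The loop-and-break version above is easier to read. The counting predicate is mainly useful when the selection has to be expressed as one XPath expression per title.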