Selecting the sibling nodes between two nodes

Time: 2012-03-28 09:32:50

Tags: python html xpath

I have to collect all the category names and, under each one, all the divs whose class starts with 'config-entry'.

<h2>category 1</h2>
<div class='clear10'></div>
<div class='config-entry selected-block'>...</div>
<div class='config-entry '>...</div>
<div class='config-entry '>...</div>
<div class='config-entry '>...</div>
<h2>category 2</h2>
<div class='clear10'></div>
<div class='config-entry selected-block'>...</div>
<div class='config-entry '>...</div>
<div class='config-entry '>...</div>
<div class='config-entry '>...</div>
<div class='config-entry '>...</div>
<h2>category 3</h2>
<div class='clear10'></div>
<div class='config-entry selected-block'>...</div>
<div class='config-entry '>...</div>
<h2>category 4</h2>
<div class='clear10'></div>
<div class='config-entry selected-block'>...</div>
<div class='config-entry '>...</div>
<div class='config-entry '>...</div>
<div class='config-entry '>...</div>

I am using the XPath //h2[1]/following-sibling::h2[1]/preceding-sibling::div[starts-with(@class,'config-entry')], like this:

categories = root.xpath("//h2")
for i in xrange(len(categories)):
   print "----%s----" % categories[i].text
   contents = root.xpath("//h2[1]/following-sibling::h2[1]/preceding-sibling::div[starts-with(@class,'config-entry')]")
   print len(contents)

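This code works only for category 1: it selects all the divs between categories 1 and 2, but it goes wrong for the later categories. I played with h2[1], changing the index to 0, 2 and 3, but got nothing conclusive. Any clues?

1 answer:

Answer 0: (score: 2)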

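I would suggest using the union of the h2 tags and the div tags, which returns them in document order; then, as you process them, each div "belongs" to the last h2 you saw.

E.g.

'//h2|//div[contains(@class,"config-entry")]'

Working example: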

from lxml import etree

doc = etree.HTML("""
<html>
<h2>category 1</h2>
<div class='clear10'></div>
<div class='config-entry selected-block'>...</div>
<div class='config-entry '>...</div>
<div class='config-entry '>...</div>
<div class='config-entry '>...</div>
<h2>category 2</h2>
<div class='clear10'></div>
<div class='config-entry selected-block'>...</div>
<div class='config-entry '>...</div>
<div class='config-entry '>...</div>
<div class='config-entry '>...</div>
<div class='config-entry '>...</div>
<h2>category 3</h2>
<div class='clear10'></div>
<div class='config-entry selected-block'>...</div>
<div class='config-entry '>...</div>
<h2>category 4</h2>
<div class='clear10'></div>
<div class='config-entry selected-block'>...</div>
<div class='config-entry '>...</div>
<div class='config-entry '>...</div>
<div class='config-entry '>...</div>
</html>""")

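category = None
# The union yields the h2 and div nodes together, in document order
for ele in doc.xpath('//h2|//div[contains(@class,"config-entry")]'):
  if ele.tag == 'h2':
    # Remember the most recent heading
    category = str(ele.text)
  elif category:
    # Every div belongs to the last h2 seen
    print("%s: %s, %r" % (category, ele.tag, ele.attrib))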
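
If you need the divs grouped per category rather than printed, the same loop can fill a dictionary. The following is only a sketch under a few assumptions: it targets Python 3 with lxml installed, the shortened markup and the names `group_by_category` and `divs_for` are made up for illustration, and it uses `starts-with` to match the wording of the question. `divs_for` is a hypothetical pure-XPath variant closer to the original attempt: it keeps only the divs that have exactly `i` h2 elements before them, which assumes that all the h2 and div elements are siblings.

```python
from lxml import etree

# Shortened, made-up version of the markup in the question
HTML = """
<html>
<h2>category 1</h2>
<div class='clear10'></div>
<div class='config-entry selected-block'>a</div>
<div class='config-entry '>b</div>
<h2>category 2</h2>
<div class='clear10'></div>
<div class='config-entry selected-block'>c</div>
</html>"""


def group_by_category(doc):
    """Map each h2's text to the list of config-entry divs that follow it."""
    groups = {}
    current = None
    # The union is returned in document order, so each div seen
    # belongs to the most recent h2.
    for ele in doc.xpath('//h2|//div[starts-with(@class,"config-entry")]'):
        if ele.tag == 'h2':
            current = ele.text
            groups[current] = []
        elif current is not None:
            groups[current].append(ele)
    return groups


def divs_for(doc, i):
    """Pure-XPath variant: config-entry divs under the i-th h2 (1-based).

    Assumes every h2 and div shares the same parent element.
    """
    query = ("//h2[%d]/following-sibling::div"
             "[starts-with(@class,'config-entry')]"
             "[count(preceding-sibling::h2)=%d]" % (i, i))
    return doc.xpath(query)


doc = etree.HTML(HTML)
groups = group_by_category(doc)
for name, divs in groups.items():
    print("----%s---- %d" % (name, len(divs)))
```

The XPath in the question returns the same nodes on every pass because `//h2[1]` is an absolute path that never depends on the loop variable; `divs_for` puts the index into the expression instead.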