I have a structure like this:
<li class="Title">This</li>
<li><a href="">AAA</a></li>
<li><a href="">BBB</a></li>
<li><a href="">CCC</a></li>
<li class="Title">That</li>
<li><a href="">DDD</a></li>
<li><a href="">EEE</a></li>
This is my XPath:
sites = sel.xpath("//li[@class='Title']")
for i, site in enumerate(sites):
    print(i)
    state = site.xpath("./text()")
    city = site.xpath("./following-sibling::li/a/text()")
The result is:
0
This
AAA
1
That
DDD
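But I want to select all of the siblings, not just the first one.
How can I select every li that follows each
<li class="Title">
so that the output looks like this: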
This
AAA
This
BBB
This
CCC
That
DDD
That
EEE
Answer 0 (score: 1)
Try this:
import lxml.etree as etree
string = '''
<root>
<li class="Title">This</li>
<li><a href="">AAA</a></li>
<li><a href="">BBB</a></li>
<li><a href="">CCC</a></li>
<li class="Title">That</li>
<li><a href="">DDD</a></li>
<li><a href="">EEE</a></li>
</root>
'''
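st = ", "
tree = etree.fromstring(string)
# The union selects the title <li> elements and the <a> links, in document order
for i, node in enumerate(tree.xpath('//li[@class="Title"] | //li/a')):
    # Each selected node has exactly one attribute: class on titles, href on links
    seq = (str(i), node.text, list(node.attrib.keys())[0])
    print(st.join(seq))
This prints: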
0, This, class
1, AAA, href
2, BBB, href
3, CCC, href
4, That, class
5, DDD, href
6, EEE, href
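Now you have enough to branch on the type of li and get what you want. Note, though, that none of the li elements is a child of another li, even though the indentation in your original post suggests that.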
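The branching described above could look roughly like the sketch below. It is only an illustration: it uses the standard-library xml.etree.ElementTree instead of lxml so it runs without extra dependencies, and the names SAMPLE and pair_titles are hypothetical, not from the original post.

```python
import xml.etree.ElementTree as ET

# Sample markup from the question, wrapped in a root element so it parses as XML.
SAMPLE = '''
<root>
<li class="Title">This</li>
<li><a href="">AAA</a></li>
<li><a href="">BBB</a></li>
<li><a href="">CCC</a></li>
<li class="Title">That</li>
<li><a href="">DDD</a></li>
<li><a href="">EEE</a></li>
</root>
'''


def pair_titles(markup):
    """Return (title, link text) pairs (hypothetical helper for illustration)."""
    pairs = []
    current_title = None
    # iter() walks the matching elements in document order.
    for li in ET.fromstring(markup).iter("li"):
        if li.get("class") == "Title":
            # A title li starts a new group.
            current_title = li.text
        else:
            # Any other li belongs to the most recent title.
            link = li.find("a")
            if link is not None:
                pairs.append((current_title, link.text))
    return pairs


for title, city in pair_titles(SAMPLE):
    print(title)
    print(city)
```

Because every li is visited once in document order, remembering the last title seen is enough to pair each link with its heading.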
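Answer 1 (score: 1)
As an alternative (to check only the siblings that come after a <li class="Title"> element), you can iterate over the siblings and break out when another <li class="Title"> element is reached. Something like this: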
import lxml.etree

# I wrap your sample in an empty div
s = '''<div><li class="Title">This</li>
<li><a href="">AAA</a></li>
<li><a href="">BBB</a></li>
<li><a href="">CCC</a></li>
<li class="Title">That</li>
<li><a href="">DDD</a></li>
<li><a href="">EEE</a></li></div>'''
tree = lxml.etree.fromstring(s)
# find every <li> whose class is "Title"
for node in tree.xpath('.//li[@class="Title"]'):
    print('\n')
    # walk the <li> siblings that follow this <li class="Title">
    for sub in node.xpath('./following-sibling::li'):
        # break out of the loop when another <li class="Title"> is found
        # you can implement other logic to break out as well
        if sub.get('class') == 'Title':
            break
        print(node.text)
        print(''.join(sub.xpath('./a/text()')))
Result:
This
AAA
This
BBB
This
CCC
That
DDD
That
EEE