I am using find_all().
This gives me a list containing many strings, such as the following:
<div class="x"><a class="x" href="x"><i class="x"></i></a> <a class="y" href="x">to make</a><span> something</span></div>
<div class="x"><a class="x" href="x"><i class="x"></i></a> <a class="y" href="x">to make</a><span> something</span></div>
<div class="x"><a class="x" href="x"><i class="x"></i></a> <a class="y" href="x">to make</a><span> something</span></div>
What I need is the text from <a class="y">.
How can I do this? Maybe with a loop?
Answer (score: 2)
Here is how to do it with BeautifulSoup:
>>> html = '''\
<div class="x"><a class="x" href="x"><i class="x"></i></a> <a class="y" href="x">to make</a><span> something</span></div>
<div class="x"><a class="x" href="x"><i class="x"></i></a> <a class="y" href="x">to make</a><span> something</span></div>
<div class="x"><a class="x" href="x"><i class="x"></i></a> <a class="y" href="x">to make</a><span> something</span></div>'''
>>> from bs4 import BeautifulSoup
>>> soup = BeautifulSoup(html, 'html.parser')
>>> list_of_y = soup.find_all("a", class_="y")
This returns a list of tags that you can print:
>>> print(list_of_y)
[<a class="y" href="x">to make</a>, <a class="y" href="x">to make</a>, <a class="y" href="x">to make</a>]
or iterate over:
>>> for y in list_of_y:
...     print(y.text)
...
to make
to make
to make
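If you already have the list of outer div tags from your own find_all() call, a loop does work: call .find() on each tag to pick out its <a class="y">. A minimal sketch, assuming bs4 is installed, the outer tags were found with find_all("div", class_="x"), and the built-in 'html.parser' is used; the names divs and texts are only illustrative.

```python
from bs4 import BeautifulSoup

# Three copies of the row from the question.
html = (
    '<div class="x"><a class="x" href="x"><i class="x"></i></a> '
    '<a class="y" href="x">to make</a><span> something</span></div>\n'
) * 3

soup = BeautifulSoup(html, "html.parser")

# The list the question starts from: one Tag per outer <div class="x">.
divs = soup.find_all("div", class_="x")

texts = []
for div in divs:
    # Search only inside this div for the <a> whose class list contains "y".
    link = div.find("a", class_="y")
    if link is not None:  # skip divs that have no such link
        texts.append(link.get_text(strip=True))

print(texts)
```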
Or with lxml and XPath:
>>> from lxml import etree
>>> h = etree.HTML(html)
>>> list_of_y = h.xpath('//a[@class="y"]/text()')
>>> print(list_of_y)
['to make', 'to make', 'to make']
>>> for y in list_of_y:
...     print(y)
...
to make
to make
to make
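Because this particular markup happens to be well-formed, the standard library's xml.etree.ElementTree can also run a similar query, since it supports a limited XPath subset that includes attribute predicates. A sketch under that assumption: real-world HTML is usually not valid XML, the fragment has to be wrapped in a single root element (the <root> tag here is hypothetical), and [@class='y'] matches the attribute value exactly, so an element with class="y z" would not match.

```python
import xml.etree.ElementTree as ET

row = (
    '<div class="x"><a class="x" href="x"><i class="x"></i></a> '
    '<a class="y" href="x">to make</a><span> something</span></div>'
)

# Wrap the repeated rows in one root element so the fragment parses as XML.
root = ET.fromstring("<root>" + row * 3 + "</root>")

# ElementTree's XPath subset supports [@attr='value'] predicates.
texts = [a.text for a in root.findall(".//a[@class='y']")]
print(texts)
```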
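Or use a CSS selector (this needs the cssselect package installed):
>>> from lxml import etree
>>> from lxml.cssselect import CSSSelector
>>> h = etree.HTML(html)
>>> sel = CSSSelector('a.y')
>>> list_of_y = sel(h)
>>> for y in list_of_y:
...     print(y.text)
...
to make
to make
to make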
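For something this small, the standard library's html.parser can also do the job without a third-party package. A minimal sketch: ClassLinkTextParser is a hypothetical helper name, and it assumes <a> tags are not nested inside each other (which HTML does not allow anyway).

```python
from html.parser import HTMLParser


class ClassLinkTextParser(HTMLParser):
    """Collect the text of every <a> whose class attribute contains wanted_class."""

    def __init__(self, wanted_class):
        super().__init__()
        self.wanted_class = wanted_class
        self.in_target = False  # True while inside a matching <a>
        self.texts = []
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            # attrs is a list of (name, value) pairs; value may be None.
            classes = (dict(attrs).get("class") or "").split()
            if self.wanted_class in classes:
                self.in_target = True
                self._buf = []

    def handle_data(self, data):
        if self.in_target:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self.in_target:
            self.texts.append("".join(self._buf))
            self.in_target = False


row = (
    '<div class="x"><a class="x" href="x"><i class="x"></i></a> '
    '<a class="y" href="x">to make</a><span> something</span></div>'
)

parser = ClassLinkTextParser("y")
parser.feed(row * 3)
parser.close()
print(parser.texts)
```

Unlike the XPath predicate, this splits the class attribute on whitespace, so it behaves like the CSS a.y selector for elements that carry several classes.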