Question

I used find_all() to extract some data, which gave me a list containing many strings like the ones below.

<div class="x"><a class="x" href="x"><i class="x"></i></a> <a class="y" href="x">to make</a><span> something</span></div>
<div class="x"><a class="x" href="x"><i class="x"></i></a> <a class="y" href="x">to make</a><span> something</span></div>
<div class="x"><a class="x" href="x"><i class="x"></i></a> <a class="y" href="x">to make</a><span> something</span></div>

What I need is the text inside <a class="y">.

How can I do that? Maybe with a loop?

Answer 1

Here is how to do it with Beautiful Soup:

>>> from bs4 import BeautifulSoup
>>> html = '''\
<div class="x"><a class="x" href="x"><i class="x"></i></a> <a class="y" href="x">to make</a><span> something</span></div>
<div class="x"><a class="x" href="x"><i class="x"></i></a> <a class="y" href="x">to make</a><span> something</span></div>
<div class="x"><a class="x" href="x"><i class="x"></i></a> <a class="y" href="x">to make</a><span> something</span></div>'''
>>> soup = BeautifulSoup(html, "html.parser")  # name the parser explicitly to avoid a warning
>>> list_of_y = soup.find_all("a", class_="y")

This returns a list of items that you can print:

>>> print(list_of_y)
[<a class="y" href="x">to make</a>, <a class="y" href="x">to make</a>, <a class="y" href="x">to make</a>]

Or iterate over:

>>> for y in list_of_y:
...   print(y.text)
to make
to make
to make
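If you already hold the list of <div> results from an earlier find_all() call, as the question describes, a loop over that list works too. The following is only a minimal sketch under that assumption; the names divs and texts are made up for illustration, and the markup is the sample from the question.

```python
from bs4 import BeautifulSoup

html = """\
<div class="x"><a class="x" href="x"><i class="x"></i></a> <a class="y" href="x">to make</a><span> something</span></div>
<div class="x"><a class="x" href="x"><i class="x"></i></a> <a class="y" href="x">to make</a><span> something</span></div>
<div class="x"><a class="x" href="x"><i class="x"></i></a> <a class="y" href="x">to make</a><span> something</span></div>"""

soup = BeautifulSoup(html, "html.parser")

# Assumed starting point: the list of <div class="x"> tags from find_all().
divs = soup.find_all("div", class_="x")

texts = []
for div in divs:
    # Search inside each div for the first <a class="y">.
    link = div.find("a", class_="y")
    if link is not None:  # skip any div that has no matching link
        texts.append(link.get_text(strip=True))

print(texts)  # ['to make', 'to make', 'to make']
```

The None check matters on real pages, where some of the matched blocks may not contain the link you are after.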

However, I have a slight preference for lxml, which would look like this:

>>> from lxml import etree
>>> h = etree.HTML(html)
>>> list_of_y = h.xpath('//a[@class="y"]/text()')
>>> print(list_of_y)
['to make', 'to make', 'to make']
>>> for y in list_of_y:
...   print(y)
... 
to make
to make
to make
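The same per-item idea can be sketched with lxml when you want to keep each <div> as the unit of work, for example to pull several fields out of each block later. This is an illustrative sketch, not the only way to do it; the names divs and texts are hypothetical.

```python
from lxml import etree

html = """\
<div class="x"><a class="x" href="x"><i class="x"></i></a> <a class="y" href="x">to make</a><span> something</span></div>
<div class="x"><a class="x" href="x"><i class="x"></i></a> <a class="y" href="x">to make</a><span> something</span></div>
<div class="x"><a class="x" href="x"><i class="x"></i></a> <a class="y" href="x">to make</a><span> something</span></div>"""

h = etree.HTML(html)

# Select each container first, then query relative to it with "./".
divs = h.xpath('//div[@class="x"]')

texts = []
for div in divs:
    # A relative XPath returns a list of text nodes for this div only.
    texts.extend(div.xpath('./a[@class="y"]/text()'))

print(texts)  # ['to make', 'to make', 'to make']
```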

Or using CSS selectors:

>>> from lxml import etree
>>> from lxml.cssselect import CSSSelector  # requires the cssselect package
>>> h = etree.HTML(html)
>>> sel = CSSSelector('a.y')
>>> list_of_y = sel(h)
>>> for y in list_of_y:
...     print(y.text)
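
CSS selectors are also available in Beautiful Soup itself through select(), so the selector approach does not require switching libraries. A minimal sketch, assuming a bs4 version recent enough to ship with its soupsieve dependency; css_texts is a name chosen for illustration.

```python
from bs4 import BeautifulSoup

html = """\
<div class="x"><a class="x" href="x"><i class="x"></i></a> <a class="y" href="x">to make</a><span> something</span></div>
<div class="x"><a class="x" href="x"><i class="x"></i></a> <a class="y" href="x">to make</a><span> something</span></div>
<div class="x"><a class="x" href="x"><i class="x"></i></a> <a class="y" href="x">to make</a><span> something</span></div>"""

soup = BeautifulSoup(html, "html.parser")

# "a.y" matches every <a> element whose class list contains "y".
css_texts = [a.get_text() for a in soup.select("a.y")]

print(css_texts)  # ['to make', 'to make', 'to make']
```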

How do I extract part of each item in a list using BeautifulSoup?
