BeautifulSoup: extracting 'img alt' text

Time: 2018-05-19 07:04:40

Tags: python beautifulsoup

I am trying to parse the img alt text.

Here is the HTML code:

[<p class="number"> <img alt="1" src="/img/common_new/ball_1.png"/> <img alt="10" src="/img/common_new/ball_10.png"/> <img alt="13" src="/img/common_new/ball_13.png"/> <img alt="26" src="/img/common_new/ball_26.png"/> <img alt="32" src="/img/common_new/ball_32.png"/> <img alt="36" src="/img/common_new/ball_36.png"/> <span class="plus">+</span> <span class="number_bonus"><img alt="9" src="/img/common_new/ball_9.png"/> </span> </p>]

What I want to get is the img alt values. How do I do that with BeautifulSoup?

2 Answers:

Answer 0 (score: 1)

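You first need to install bs4 and requests, plus lxml, which the code uses as its parser. Open cmd and run:

pip install bs4
pip install requests
pip install lxml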

Then here is your code.

from bs4 import BeautifulSoup
import requests

r = requests.get('your website')  # replace with the URL of the page
soup = BeautifulSoup(r.content, 'lxml')

altlinks = []
# attribute values come back as strings, so the wanted alt values are strings too
imgalt_list = ['1', '10', '13', '32', '36']

for x in soup.find_all('img', alt=True):  # find every img tag that has an alt attribute
    if x['alt'] in imgalt_list:  # the alt value matches one of your numbers
        altlinks.append(x.get('src'))  # add its src to the list
print(altlinks)

Feel free to ask about any part you don't understand.
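
To try the same filter without a network request, here is a minimal sketch that runs against a shortened copy of the markup from the question. The parser is switched to the built-in 'html.parser' so lxml is not required, and the name wanted and its two values are only illustrative. Attribute values are returned as strings, so the wanted alt values are written as strings.

```python
from bs4 import BeautifulSoup

# Shortened copy of the markup from the question (assumption: same structure).
html = """<p class="number">
<img alt="1" src="/img/common_new/ball_1.png"/>
<img alt="10" src="/img/common_new/ball_10.png"/>
<img alt="13" src="/img/common_new/ball_13.png"/>
<span class="number_bonus"><img alt="9" src="/img/common_new/ball_9.png"/></span>
</p>"""

soup = BeautifulSoup(html, "html.parser")

# Hypothetical selection of alt values to keep, as strings.
wanted = {"1", "13"}

# Keep the src of every img whose alt is one of the wanted values.
altlinks = [img["src"] for img in soup.find_all("img", alt=True) if img["alt"] in wanted]
print(altlinks)
```

If you want the alt text itself rather than the src, collect img["alt"] instead of img["src"].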

Answer 1 (score: 0)

Use BeautifulSoup's find_all method.

>>> import bs4
>>> html = '''<p class="number">
<img alt="1" src="/img/common_new/ball_1.png"/>
<img alt="10" src="/img/common_new/ball_10.png"/>
<img alt="13" src="/img/common_new/ball_13.png"/>
<img alt="26" src="/img/common_new/ball_26.png"/>
<img alt="32" src="/img/common_new/ball_32.png"/>
<img alt="36" src="/img/common_new/ball_36.png"/>
<span class="plus">+</span>
<span class="number_bonus"><img alt="9" src="/img/common_new/ball_9.png"/> 
</span>
</p>'''

>>> soup = bs4.BeautifulSoup(html, 'lxml')
>>> img_alt = []
>>> for img_tag in soup.find_all('img'):
...     img_alt.append(int(img_tag.get('alt')))  # convert to integer
...
>>> print(img_alt)
[1, 10, 13, 26, 32, 36, 9]
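
If the number inside span.number_bonus should be kept apart from the other six, something along these lines might work. It is only a sketch that assumes the markup always looks like the snippet in the question; the names main_numbers and bonus_numbers are made up for illustration, and 'html.parser' is used so that lxml is not needed.

```python
from bs4 import BeautifulSoup

html = """<p class="number">
<img alt="1" src="/img/common_new/ball_1.png"/>
<img alt="10" src="/img/common_new/ball_10.png"/>
<img alt="13" src="/img/common_new/ball_13.png"/>
<img alt="26" src="/img/common_new/ball_26.png"/>
<img alt="32" src="/img/common_new/ball_32.png"/>
<img alt="36" src="/img/common_new/ball_36.png"/>
<span class="plus">+</span>
<span class="number_bonus"><img alt="9" src="/img/common_new/ball_9.png"/>
</span>
</p>"""

soup = BeautifulSoup(html, "html.parser")
number_p = soup.select_one("p.number")

# recursive=False looks only at direct children of <p>,
# so the img nested inside span.number_bonus is skipped.
main_numbers = [int(img["alt"]) for img in number_p.find_all("img", recursive=False)]

# The CSS selector picks only the img tags inside the bonus span.
bonus_numbers = [int(img["alt"]) for img in number_p.select("span.number_bonus img")]

print(main_numbers)
print(bonus_numbers)
```

Here find_all with recursive=False and select with a CSS selector split the tags by where they sit in the tree, so no filtering on the alt values themselves is needed.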