I am trying to parse the img alt text.
Here is the HTML code:
<p class="number">
<img alt="1" src="/img/common_new/ball_1.png"/>
<img alt="10" src="/img/common_new/ball_10.png"/>
<img alt="13" src="/img/common_new/ball_13.png"/>
<img alt="26" src="/img/common_new/ball_26.png"/>
<img alt="32" src="/img/common_new/ball_32.png"/>
<img alt="36" src="/img/common_new/ball_36.png"/>
<span class="plus">+</span>
<span class="number_bonus"><img alt="9" src="/img/common_new/ball_9.png"/>
</span>
</p>
What I want to get is the img alt values.
How do I do this using BeautifulSoup?
Answer 0 (score: 1)
You need to install bs4 and requests first. Open cmd and run:
pip install bs4
pip install requests
Then here is your code.
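from bs4 import BeautifulSoup
import requests

r = requests.get('your website')  # placeholder: put the page URL here
source = r.content
# 'lxml' needs the lxml package (pip install lxml); 'html.parser' is the built-in alternative
soup = BeautifulSoup(source, 'lxml')
altlinks = []
# alt attribute values are strings, so the wanted numbers must be strings too
imgalt_list = ['1', '10', '13', '32', '36']
for x in soup.find_all('img', alt=True):  # find every img tag that has an alt attribute
    if x['alt'] in imgalt_list:  # keep it if the alt text is one of the wanted numbers
        altlinks.append(x.get('src'))  # add the image's src to the list
print(altlinks)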
Feel free to ask about any part you don't understand.
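One detail worth checking in the membership test: BeautifulSoup returns attribute values as strings, so a comparison against integers matches nothing. Below is a minimal offline sketch of that point. It assumes the HTML from the question is held in a string instead of being downloaded, and it uses the built-in 'html.parser' so no extra parser is needed. The names `wanted`, `matches`, and `int_matches` are made up for illustration.

```python
from bs4 import BeautifulSoup

# HTML copied from the question, so no network request is needed
html = '''<p class="number">
<img alt="1" src="/img/common_new/ball_1.png"/>
<img alt="10" src="/img/common_new/ball_10.png"/>
<img alt="13" src="/img/common_new/ball_13.png"/>
<img alt="26" src="/img/common_new/ball_26.png"/>
<img alt="32" src="/img/common_new/ball_32.png"/>
<img alt="36" src="/img/common_new/ball_36.png"/>
<span class="plus">+</span>
<span class="number_bonus"><img alt="9" src="/img/common_new/ball_9.png"/>
</span>
</p>'''

soup = BeautifulSoup(html, 'html.parser')
images = soup.find_all('img', alt=True)

# alt values are strings, so the wanted values are written as strings
wanted = {'1', '10', '13', '32', '36'}
matches = [img['src'] for img in images if img['alt'] in wanted]

# The same filter against integers finds nothing, because '1' != 1
int_matches = [img['src'] for img in images if img['alt'] in {1, 10, 13, 32, 36}]

print(matches)
print(int_matches)
```

Converting with int(img['alt']) before comparing would work as well, as long as every alt value is numeric.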
Answer 1 (score: 0)
Use BeautifulSoup's find_all method.
>>> import bs4
>>> html = '''<p class="number">
<img alt="1" src="/img/common_new/ball_1.png"/>
<img alt="10" src="/img/common_new/ball_10.png"/>
<img alt="13" src="/img/common_new/ball_13.png"/>
<img alt="26" src="/img/common_new/ball_26.png"/>
<img alt="32" src="/img/common_new/ball_32.png"/>
<img alt="36" src="/img/common_new/ball_36.png"/>
<span class="plus">+</span>
<span class="number_bonus"><img alt="9" src="/img/common_new/ball_9.png"/>
</span>
</p>'''
>>> soup = bs4.BeautifulSoup(html, 'lxml')
>>> img_alt = []
>>> for img_tag in soup.find_all('img'):
...     img_alt.append(int(img_tag.get('alt')))  # cast the alt text to an integer
...
>>> print(img_alt)
[1, 10, 13, 26, 32, 36, 9]
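If the number inside the number_bonus span should be kept apart from the other six, CSS selectors are one possible way to do it. This is only a sketch under assumptions: it relies on the exact markup from the question, uses the built-in 'html.parser', and the names `main_numbers` and `bonus_numbers` are invented here.

```python
from bs4 import BeautifulSoup

# HTML copied from the question
html = '''<p class="number">
<img alt="1" src="/img/common_new/ball_1.png"/>
<img alt="10" src="/img/common_new/ball_10.png"/>
<img alt="13" src="/img/common_new/ball_13.png"/>
<img alt="26" src="/img/common_new/ball_26.png"/>
<img alt="32" src="/img/common_new/ball_32.png"/>
<img alt="36" src="/img/common_new/ball_36.png"/>
<span class="plus">+</span>
<span class="number_bonus"><img alt="9" src="/img/common_new/ball_9.png"/>
</span>
</p>'''

soup = BeautifulSoup(html, 'html.parser')

# img tags that are direct children of <p class="number">
main_numbers = [int(img['alt']) for img in soup.select('p.number > img')]

# img tags nested inside <span class="number_bonus">
bonus_numbers = [int(img['alt']) for img in soup.select('span.number_bonus img')]

print(main_numbers)
print(bonus_numbers)
```

The child combinator (>) is what keeps the bonus image out of the first list, since that image sits one level deeper, inside the span.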