I'm using Python (Anaconda) to scrape data into an Excel worksheet. I'm having trouble with two sites.
Site 1
<div id="ember3815" class="ember-view">
<p class="org-top-card-module__company-descriptions Sans-15px-black-55%">
<span class="company-industries org-top-card-module__dot-separated-list">
Industry
</span>
<span class="org-top-card-module__location org-top-card-module__dot-separated-list">
City, State
</span>
<span title="62,346 followers" class="org-top-card-module__followers-count org-top-card-module__dot-separated-list">
62,346 followers
</span>
I'm trying to pull the span title. Things I've tried (I also tried each of them with find_all):
text = soup.find('span',{'class':"company-industries org-top-card-module__dot-separated-list"})
text = soup.find('p',{'class':"org-top-card-module__company-descriptions Sans-15px-black-55%"})
text = soup.body.find('span', attrs={'class': 'org-top-card-module__location org-top-card-module__dot-separated-list'})
text = soup.find('span',{'class': 'org-top-card-module__location org-top-card-module__dot-separated-list'})
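I'm sure there are other attempts I haven't listed, because I can't remember them all. I'm not a programmer; I'm just trying to figure out how to extract the data for analysis. Help?
Site 2
I need to extract the value 8,052 from the HTML below.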
<section class="zwlfE">
<div class="nZSzR">...</div>
<ul class="k9GMp ">
<li class="Y8-fY ">...</li>
<li class="Y8-fY ">
<a class="g47SY " title="8,052">8,052</span>" followers"
</a>
</li>
<li class="Y8-fY ">...</li>
</ul>
<div class="-vDIg">...</div>
</section>
I tried:
Everything I tried returned [].
Please help?
Answer 0 (score: 0)
To get the span title:
from bs4 import BeautifulSoup
html = """<div id="ember3815" class="ember-view">
<p class="org-top-card-module__company-descriptions Sans-15px-black-55%">
<span class="company-industries org-top-card-module__dot-separated-list">
Industry
</span>
<span class="org-top-card-module__location org-top-card-module__dot-separated-list">
City, State
</span>
<span title="62,346 followers" class="org-top-card-module__followers-count org-top-card-module__dot-separated-list">
62,346 followers
</span>"""
soup = BeautifulSoup(html, "html.parser")
print(soup.find("span", class_="org-top-card-module__followers-count org-top-card-module__dot-separated-list")["title"])
Output:
62,346 followers
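Passing the full class string to class_ only matches when the attribute is written exactly that way, in that order. As a hedged alternative, the sketch below matches on a single class name with a CSS selector and guards against a missing element. The HTML is a trimmed copy of the site 1 snippet from the question (closing tags added); the variable names are illustrative.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the site 1 markup from the question.
html = """<div id="ember3815" class="ember-view">
<p class="org-top-card-module__company-descriptions Sans-15px-black-55%">
<span title="62,346 followers" class="org-top-card-module__followers-count org-top-card-module__dot-separated-list">
62,346 followers
</span>
</p>
</div>"""

soup = BeautifulSoup(html, "html.parser")

# Match on one class name; any other classes on the span are ignored.
span = soup.select_one("span.org-top-card-module__followers-count")

# select_one returns None when nothing matches, so check before indexing.
title = span["title"] if span is not None else None
print(title)
```

If title comes back as None on the real page, the HTML that was downloaded most likely does not contain that span, so it is worth printing the fetched HTML and checking it.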
For site 2:
# soup must be built from the site 2 HTML, not the site 1 snippet above
print(soup.find("a", class_="g47SY")["title"])
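For completeness, here is a minimal self-contained sketch of the site 2 lookup. It assumes the markup looks like the snippet in the question, reduced here to the relevant element and with the stray closing span tag left out; the names html2, soup2 and followers are illustrative. It also converts the title to an integer, which may be convenient before writing the value to a spreadsheet.

```python
from bs4 import BeautifulSoup

# Site 2 markup from the question, reduced to the relevant element.
html2 = """<section class="zwlfE">
<ul class="k9GMp ">
<li class="Y8-fY ">
<a class="g47SY " title="8,052">8,052 followers</a>
</li>
</ul>
</section>"""

soup2 = BeautifulSoup(html2, "html.parser")

# class_ matches a single class name; the trailing space in the attribute is ignored.
link = soup2.find("a", class_="g47SY")

if link is None:
    # The element is absent from the HTML that was parsed.
    followers = None
else:
    # Drop the thousands separator so the value can be used as a number.
    followers = int(link["title"].replace(",", ""))

print(followers)
```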