I am trying to use Python BeautifulSoup to extract the company name that follows a br tag on an Indeed job listing.
HTML code:
<p>
<h2 class="jobTitle">
<a href="viewjob?jk=1544ab41b4dc02b6" rel="nofollow">
Data Scientist
</a>
</h2>
<br/>
Deloitte -
<span class="location">
Los Angeles, CA 90013
</span>
<br/>
<span class="date">
1 day ago
</span>
</p>
I tried the following code, but it prints nothing useful.
companies = soup.find_all('br')
for company in companies:
    print(company.text)
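Answer 0 (score: 0)
The company name is not part of the br tag. A br tag is an empty element with no content of its own, so its .text is an empty string. The company name is the raw text node that comes right after the br tag.
Example: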
prev = None
for child in soup.find('p').children:
    if prev is not None and prev.name == 'br':
        print(child)  # company name
        break
    prev = child
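
The same idea can be written without the loop by asking the br tag for its next sibling. This is only a minimal sketch: it assumes the HTML from the question is parsed with Python's built-in html.parser, and the helper name company_after_first_br is hypothetical, not something from the original answer.

```python
from bs4 import BeautifulSoup, NavigableString

# HTML copied from the question.
HTML = """
<p>
<h2 class="jobTitle">
<a href="viewjob?jk=1544ab41b4dc02b6" rel="nofollow">
Data Scientist
</a>
</h2>
<br/>
Deloitte -
<span class="location">
Los Angeles, CA 90013
</span>
<br/>
<span class="date">
1 day ago
</span>
</p>
"""


def company_after_first_br(html):
    """Return the text node that follows the first <br/>, cleaned up.

    Hypothetical helper for illustration only.
    """
    soup = BeautifulSoup(html, "html.parser")
    br = soup.find("br")
    if br is None:
        return None
    node = br.next_sibling
    # Only a bare text node can hold the company name; skip tags.
    if not isinstance(node, NavigableString):
        return None
    # "\nDeloitte -\n" -> "Deloitte": trim whitespace and the trailing dash.
    return node.strip().rstrip("-").strip() or None


print(company_after_first_br(HTML))  # Deloitte
```

The trailing " -" is part of the text node in the question's markup, which is why the sketch strips it explicitly.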
Answer 1 (score: 0)
You can chain next_sibling:
from bs4 import BeautifulSoup as bs
import requests

r = requests.get('https://www.indeed.com/m/jobs?q=data+scientist&l=Los+Angeles%2C+CA')
soup = bs(r.content, 'lxml')
for job in soup.select('.jobTitle'):
    print(job.next_sibling.next_sibling)
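
One thing to watch with chained next_sibling: whitespace between tags is itself a text node, so the number of hops needed depends on how the page is formatted. A possible variation, sketched below, looks up the next br sibling by name and then takes the text right after it. It runs offline against the HTML from the question using html.parser; the function name company_names is hypothetical, and the live Indeed page may be structured differently.

```python
from bs4 import BeautifulSoup, NavigableString

# HTML copied from the question (not fetched from the live site).
SAMPLE = """
<p>
<h2 class="jobTitle">
<a href="viewjob?jk=1544ab41b4dc02b6" rel="nofollow">
Data Scientist
</a>
</h2>
<br/>
Deloitte -
<span class="location">
Los Angeles, CA 90013
</span>
<br/>
<span class="date">
1 day ago
</span>
</p>
"""


def company_names(html):
    """Collect the text that follows the first <br/> after each job title.

    Hypothetical helper for illustration only.
    """
    soup = BeautifulSoup(html, "html.parser")
    names = []
    for job in soup.select(".jobTitle"):
        # Find the br by tag name, so whitespace-only text nodes
        # between the title and the br do not matter.
        br = job.find_next_sibling("br")
        text = br.next_sibling if br is not None else None
        if isinstance(text, NavigableString):
            names.append(text.strip().rstrip("-").strip())
    return names


print(company_names(SAMPLE))  # ['Deloitte']
```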