HTML:
<div class="job-result-logo-title">
<div class="job-result-logo">
<a href="/Recruiters/SQS-Ireland-5673.aspx"><img alt="SQS Ireland" src="/Logos/SQS-Ireland-small-5673.gif"></a>
</div>
<div class="job-result-title">
<h2 itemprop="title"><a href="/Jobs/QA-Analyst-8148774.aspx">QA Analyst</a>
</h2>
<h3 itemprop="name">
<a itemprop="hiringOrganization" itemscope="" itemtype="https://schema.org/Organization" href="/Recruiters/SQS-Ireland-5673.aspx">SQS Ireland</a>
</h3>
</div>
</div>
<div class="job-result-overview" style="display: ">
<ul class="job-overview">
<li itemprop="baseSalary" class="salary">Negotiable</li>
<li itemprop="datePosted" class="updated-time">Updated 17/03/2018</li>
<li itemprop="jobLocation" class="location">
<a href="/Jobs/Dublin-City-Centre/">Dublin City Centre</a>
<span> /</span> <a href="/Jobs/Dublin-South/">Dublin South</a>
<span> /</span> <a href="/Jobs/Dublin-North/">Dublin North</a>
</li>
</ul>
</div>
My code:
def find_data(source):
    for a in source.find_all('div', class_='job-result-title'):
        job_info = a.find('h2').find('a')
        company_name = a.find('h3').find('a').get_text()
        url = job_info['href']
        full_url = base_url + url
        role = job_info.get_text()
        for ul in source.find_all('ul', class_='job-overview'):
            date = ul.find('li', class_='updated-time').get_text().replace('Updated', '').strip()
        append_data("data.csv", company_name, role, full_url, date)
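I have tried many variations of this code and searched here for similar answers, but with no luck. I always get the same date from this line, and I don't understand why it doesn't iterate over all the matching tags, each of which contains its own date: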
<li itemprop="datePosted" class="updated-time">Updated 17/03/2018</li>
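Answer 0 (score: 0)
You are not saving the values found in the for loops. That is why, when you write to the CSV file, each variable holds only the last value assigned to it.
You need to store all the values in lists and then write them to the CSV.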
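To illustrate the difference, here is a minimal sketch using made-up values. The names `dates`, `last_only`, and `collected` are hypothetical and do not come from the question. A plain assignment inside a loop is overwritten on every pass, whereas appending to a list keeps each value.

```python
# Hypothetical sample values standing in for the scraped dates.
dates = ["17/03/2018", "16/03/2018", "15/03/2018"]

# Plain assignment: each pass overwrites the previous value,
# so only the final item survives the loop.
last_only = None
for d in dates:
    last_only = d

# Appending: every value found in the loop is kept.
collected = []
for d in dates:
    collected.append(d)

print(last_only)
print(collected)
```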
Code:
import requests
from bs4 import BeautifulSoup

r = requests.get('https://www.irishjobs.ie/ShowResults.aspx?Keywords=test&Location=102&Category=3&Recruiter=All&SortBy=MostRecent&PerPage=100')
source = BeautifulSoup(r.text, 'lxml')

company_name, role, full_url, date = [], [], [], []
base_url = 'https://www.irishjobs.ie'

for a in source.find_all('div', class_='job-result-title'):
    job_info = a.find('h2').find('a')
    company_name.append(a.find('h3').find('a').get_text())
    url = job_info['href']
    full_url.append(base_url + url)
    role.append(job_info.get_text())

for ul in source.find_all('ul', class_='job-overview'):
    date.append(ul.find('li', class_='updated-time').get_text().replace('Updated', '').strip())

for a, b, c, d in zip(company_name, role, full_url, date):
    print(a, b, c, d)
Partial output:
Globoforce Senior QA Automation Engineer https://www.irishjobs.ie/Jobs/Senior-QA-Automation-Engineer-8149253.aspx 17/03/2018
Globoforce Technical Team Lead (Java) https://www.irishjobs.ie/Jobs/Technical-Team-Lead-Java-8149252.aspx 17/03/2018
Globoforce Performance Test Engineer https://www.irishjobs.ie/Jobs/Performance-Test-Engineer-8149251.aspx 17/03/2018
Globoforce Senior Front End Developer https://www.irishjobs.ie/Jobs/Senior-Front-End-Developer-8149249.aspx 17/03/2018
Synchronoss Technologies Lead iOS Swift Developer Enterprise Agile https://www.irishjobs.ie/Jobs/Lead-iOS-Swift-Developer-Enterprise-8149248.aspx 17/03/2018
Computer Futures .NET Engineer Front End Developer https://www.irishjobs.ie/Jobs/NET-Engineer-Front-End-Developer-8149244.aspx 17/03/2018
Computer Futures .NET Developer C# ASP.NET Core https://www.irishjobs.ie/Jobs/NET-Developer-CSharp-ASP-NET-8149241.aspx 17/03/2018
Computer Futures Senior C# Developer TDD DDD https://www.irishjobs.ie/Jobs/Senior-CSharp-Developer-TDD-DDD-8149240.aspx 17/03/2018
You just need to write the values to the CSV instead of calling print(a, b, c, d).
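One possible way to do that last step is sketched below with the standard `csv` module. The helper name `write_rows` and the header labels are assumptions made for illustration. They are not part of the original answer, and the question's own `append_data` helper is not shown, so this does not reproduce it.

```python
import csv


def write_rows(path, company_name, role, full_url, date):
    """Write four parallel lists to a CSV file, one job per row."""
    # newline="" lets the csv module control line endings itself.
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        # Header labels are an assumption; rename or drop as needed.
        writer.writerow(["company", "role", "url", "date"])
        # zip stops at the shortest list, so rows stay aligned only
        # if all four lists have the same length.
        writer.writerows(zip(company_name, role, full_url, date))
```

With the lists built in the code above, the call would be `write_rows("data.csv", company_name, role, full_url, date)`. Because `zip` silently truncates to the shortest input, it may be worth checking that the four lists have equal lengths before writing.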