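I have started playing with pandas and web scraping. The code seems to run fine: when I run it, all of the result rows are printed in the terminal, but when I export to CSV, the file contains only half of them. It may have something to do with the way I loop over the URLs, but I am not sure why the results still display correctly in the terminal.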
import pandas as pd
import requests
import bs4
from bs4 import BeautifulSoup
urls = ['https://www.indeed.co.uk/jobs?q=Scrum+master&l=London', 'https://www.indeed.co.uk/jobs?q=Scrum+master&l=London&start=10']
for url in urls:
    page = requests.get(url)
    soup = BeautifulSoup(page.text, 'html.parser')
    job_results = soup.find(id='resultsCol')
    jobs = job_results.find_all(class_='jobsearch-SerpJobCard')
    titles = [job.find(class_='jobtitle').get_text() for job in jobs]
    descriptions = [job.find('div', attrs={'class': 'summary'}).get_text() for job in jobs]
    jobs_filtered = pd.DataFrame(
        {
            'title': titles,
            'description': descriptions,
        })
    print(jobs_filtered)
    jobs_filtered.to_csv('jobs_filtered11.csv')
Answer 0 (score: 2)
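Each pass through the loop calls to_csv on the same file, which overwrites it, so only the last page ends up in the CSV. Use append mode to get the desired output: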
jobs_filtered.to_csv('jobs_filtered11.csv', mode='a', header=False)  # pass header=True on the first write if you need a header row
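As an alternative to appending, you could collect one DataFrame per page and write the file once after the loop. This is only a minimal sketch: the `pages` data below is a hypothetical stand-in for the titles and descriptions scraped from each URL, and the output path is a temporary file chosen for illustration rather than the file name from the question.

```python
import os
import tempfile

import pandas as pd

# Hypothetical stand-ins for the rows scraped from each results page.
pages = [
    {"title": ["Scrum Master A", "Scrum Master B"], "description": ["desc 1", "desc 2"]},
    {"title": ["Scrum Master C"], "description": ["desc 3"]},
]

frames = []
for page in pages:
    # In the real script this frame would be built from the parsed HTML.
    frames.append(pd.DataFrame(page))

# Combine all pages into one frame, then write the file a single time.
all_jobs = pd.concat(frames, ignore_index=True)

# Assumed output location, used here so the sketch does not touch real files.
out_path = os.path.join(tempfile.mkdtemp(), "jobs_filtered.csv")
all_jobs.to_csv(out_path, index=False)

# Read the file back to confirm that rows from every page were written.
written = pd.read_csv(out_path)
print(len(written))
```

Writing once also avoids a side effect of append mode: with `mode='a'`, running the script again adds the same rows to the existing file a second time.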