Question

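I've started playing with pandas and web scraping. The code seems to run fine: when I run it, all result rows are displayed in the terminal, but when I export to CSV, only half of the result rows appear. It may have something to do with looping over the URLs, but I'm not sure why the results still display correctly in the terminal.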

import pandas as pd
import requests
import bs4
from bs4 import BeautifulSoup

urls = ['https://www.indeed.co.uk/jobs?q=Scrum+master&l=London', 'https://www.indeed.co.uk/jobs?q=Scrum+master&l=London&start=10']

for url in urls:
    page = requests.get(url)
    soup = BeautifulSoup(page.text, 'html.parser')
    job_results = soup.find(id='resultsCol')
    jobs = job_results.find_all(class_='jobsearch-SerpJobCard')

    titles = [job.find(class_='jobtitle').get_text() for job in jobs]
    descriptions = [job.find('div', attrs={'class': 'summary'}).get_text() for job in jobs]

    jobs_filtered = pd.DataFrame(
        {
            'title' : titles,
            'description' : descriptions,
        })

    print(jobs_filtered)
    jobs_filtered.to_csv('jobs_filtered11.csv')

Answer 1

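Use append mode to get the desired output. Each pass through the loop calls to_csv with the default write mode, which overwrites the file, so only the rows from the last URL remain in the CSV even though every page is printed to the terminal.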

jobs_filtered.to_csv('jobs_filtered11.csv', mode='a', header=False)  # use header=True on the first write if a header row is needed
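
An alternative to append mode is to collect one DataFrame per page and write the file once after the loop, which also avoids stale rows piling up when the script is re-run. The following is a minimal sketch of that idea, not the asker's actual scraper: the helper name combine_pages and the sample titles and summaries are hypothetical stand-ins for the lists that the scraping loop would build for each URL.

```python
import pandas as pd


def combine_pages(pages):
    """Build one DataFrame from a list of per-page (titles, descriptions) pairs.

    Hypothetical helper for illustration; in the real script each pair would
    come from one iteration of the loop over urls.
    """
    frames = [
        pd.DataFrame({'title': titles, 'description': descriptions})
        for titles, descriptions in pages
    ]
    # ignore_index=True renumbers the rows 0..n-1 across all pages
    return pd.concat(frames, ignore_index=True)


# Made-up stand-in data for two scraped pages (not real search results)
pages = [
    (['Scrum Master A', 'Scrum Master B'], ['summary A', 'summary B']),
    (['Scrum Master C'], ['summary C']),
]

all_jobs = combine_pages(pages)

# Written a single time, after every page has been collected
all_jobs.to_csv('jobs_filtered11.csv', index=False)
```

With this layout the CSV always reflects exactly one run, whereas mode='a' keeps adding rows to whatever the file already contains.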

Pandas DataFrame not exporting all rows to CSV (but all rows are displayed in the terminal)
