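I'm trying to scrape developer jobs from indeed.nl into Excel using Python and bs4. Everything works, but when I open the file in Excel there are extra blank rows between the jobs.
Can anybody see what I'm doing wrong?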
from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup
my_url = 'https://www.indeed.nl/jobs?q=developer&l='
# opening up connection, grabbing the page
uClient = uReq(my_url)
page_html = uClient.read()
uClient.close()
page_soup = soup(page_html, "html.parser")
#grabs each job
containers = page_soup.findAll("div",{"class":"row"})
filename = "indeedjobs.csv"
f = open(filename, "w")
headers = "Company; Job; City\n"
f.write(headers)
for container in containers:
    jobtitle = container.a["title"]
    city_container = container.findAll("span",{"class":"location"})
    City_name = city_container[0].text
    company_container = container.findAll("span",{"class":"company"})
    company_name = company_container[0].text
    print("Company: " + company_name)
    print("Job: " + jobtitle)
    print("City: " + City_name)
    f.write(company_name + ";" + jobtitle + ";" + City_name + "\n")
f.close()
Answer 0 (score: 2)
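The text of the <span class="company"> element starts with a newline and some whitespace. That newline ends up in the output file and splits each record across several lines, which Excel shows as extra rows. Remove it with .strip().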
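A minimal sketch of the effect, assuming the scraped text looks like the sample string below. The sample values are made up for illustration, since the actual page content is not shown in the question:

```python
# Hypothetical text as returned by company_container[0].text:
# a leading newline and indentation surround the actual name.
raw_company = "\n    Example Company BV\n"

# Without strip(), the embedded newlines split the record across lines.
broken_row = raw_company + ";" + "Developer" + ";" + "Amsterdam" + "\n"

# strip() removes leading and trailing whitespace, including newlines.
company_name = raw_company.strip()
fixed_row = company_name + ";" + "Developer" + ";" + "Amsterdam" + "\n"

print(repr(broken_row))
print(repr(fixed_row))
```

In the loop from the question this would probably come down to company_name = company_container[0].text.strip(), and likely the same for City_name if the location text carries surrounding whitespace too.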
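You might also consider the csv module for writing properly formatted CSV files. The module takes care of correctly escaping special characters for you.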
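As a rough sketch of how that could look, the snippet below writes a few rows with csv.writer using the same ";" delimiter as the question. The write_jobs helper and the sample rows are hypothetical stand-ins for the scraped values:

```python
import csv

# Hypothetical rows standing in for the scraped (company, job, city) values.
rows = [
    ("\n    Example Company BV", "Developer; Backend", "Amsterdam"),
    ("Another Firm", 'Senior "Python" Developer', "Utrecht"),
]

def write_jobs(rows, path):
    """Write job rows to a semicolon-delimited CSV file (illustrative helper)."""
    # newline="" lets the csv module control line endings itself,
    # which also avoids blank rows on Windows.
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f, delimiter=";")
        writer.writerow(["Company", "Job", "City"])
        for company, job, city in rows:
            # Strip stray whitespace; the writer quotes fields that
            # contain the delimiter or quote characters.
            writer.writerow([company.strip(), job.strip(), city.strip()])

write_jobs(rows, "indeedjobs.csv")
```

A job title that itself contains a semicolon or a quote character is then quoted by the writer instead of silently shifting the columns.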