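I have just started using Python, and I'm a bit lost. Basically, I want to extract data from the following website: "https://www.berufsstart.de/unternehmen/bundesland/baden-wuerttemberg-top-100.php" and then parse all 100 companies, together with their number of employees and the city they are located in, into a CSV file. I have never used BeautifulSoup before, and every tutorial I have found only covers simple code. I would share my code, but it is little more than an attempt to understand the concept. I'm not looking for a 100% finished solution, more an explanation of how to get started on this project.
Thanks in advance, everyone!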
Answer 0 (score: 0)
from bs4 import BeautifulSoup
import requests
import csv

# Download the page and parse the HTML.
r = requests.get(
    "https://www.berufsstart.de/unternehmen/bundesland/baden-wuerttemberg-top-100.php")
soup = BeautifulSoup(r.text, 'html.parser')

numbers = []
names = []
cities = []

# Employee numbers: the second comma-separated text fragment of each cell.
for num in soup.find_all("div", class_="col-sm-2"):
    num = num.get_text(strip=True, separator=",")
    if num:  # skip empty cells
        numbers.append(num.split(',')[1])

# Company names.
for name in soup.find_all("strong", class_="h2"):
    names.append(name.text)

# Cities: the second whitespace-separated token of each info block.
for city in soup.find_all("div", class_="col-sm-5 infobereich"):
    cities.append(city.get_text(strip=True, separator=" ").split(" ")[1])

# Write one row per company.
with open("kas.csv", 'w', newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["Name", "City", "Number"])
    for a, b, c in zip(names, cities, numbers):
        writer.writerow([a, b, c])

print("Done")
Output: view-online
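One thing that may be worth guarding against in the code above: the three lists are filled by separate loops, and `zip()` stops at the shortest one, so a single entry that fails to match would silently drop or shift rows in the CSV. Below is a minimal sketch of a length check before writing. The helper name `write_rows` and the sample values are hypothetical and not taken from the website; it writes to an in-memory buffer so it can be run without downloading anything.

```python
import csv
import io


def write_rows(names, cities, numbers, out):
    """Write one CSV row per company; refuse to write misaligned columns."""
    # zip() stops at the shortest list, so a missed element would silently
    # drop or shift rows. Checking the lengths first makes that visible.
    if not (len(names) == len(cities) == len(numbers)):
        raise ValueError(
            f"column lengths differ: {len(names)} names, "
            f"{len(cities)} cities, {len(numbers)} numbers"
        )
    writer = csv.writer(out)
    writer.writerow(["Name", "City", "Number"])
    writer.writerows(zip(names, cities, numbers))


# Hypothetical sample values, not scraped from the page.
buffer = io.StringIO()
write_rows(
    ["Company A", "Company B"],
    ["Stuttgart", "Karlsruhe"],
    ["1000", "250"],
    buffer,
)
print(buffer.getvalue())
```

To use it with the scraper, pass the `names`, `cities` and `numbers` lists and the open file object `f` instead of the buffer. If the check raises, the selectors probably do not match every company in the same way, and the page structure is worth inspecting before trusting the output.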