Using beautifulsoup4 to extract data from a website and parse it to CSV

Time: 2020-03-10 20:15:30

Tags: csv parsing beautifulsoup

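I have just started using Python and I am a bit lost. Basically, I want to extract data from this website: "https://www.berufsstart.de/unternehmen/bundesland/baden-wuerttemberg-top-100.php", and then write all 100 companies, together with their number of employees and the city they are located in, to a CSV file. I have never used BeautifulSoup before, and every tutorial I found only covers simple code. I would share my code, but there isn't much yet; I am mostly still trying to understand the concept. I don't expect a 100% finished solution, more an explanation of how to start this project.

Thanks in advance, everyone!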

1 answer:

Answer 0 (score: 0)

from bs4 import BeautifulSoup
import requests
import csv

URL = "https://www.berufsstart.de/unternehmen/bundesland/baden-wuerttemberg-top-100.php"

# Download the page and stop early on an HTTP error status.
r = requests.get(URL)
r.raise_for_status()

soup = BeautifulSoup(r.text, 'html.parser')

numbers = []
names = []
cities = []
# Employee count: second comma-joined text fragment of each "col-sm-2" div.
for num in soup.find_all("div", class_="col-sm-2"):
    num = num.get_text(strip=True, separator=",")
    if num:
        numbers.append(num.split(',')[1])
# Company name: text of each <strong class="h2">.
for name in soup.find_all("strong", class_="h2"):
    names.append(name.text)
# City: second space-separated token of each "col-sm-5 infobereich" div.
for city in soup.find_all("div", class_="col-sm-5 infobereich"):
    cities.append(city.get_text(strip=True, separator=" ").split(" ")[1])

# Write one row per company; zip() stops at the shortest of the three lists.
with open("kas.csv", 'w', newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["Name", "City", "Number"])
    for a, b, c in zip(names, cities, numbers):
        writer.writerow([a, b, c])

print("Done")

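The answer above collects names, cities and numbers into three separate lists and pairs them up with zip(), so a single company with a missing field would shift every later row. A possible alternative, sketched here under assumptions, is to walk the page one company block at a time. Only the class names (h2, col-sm-2, col-sm-5 infobereich) come from the answer; the div.row wrapper, the sample markup and the helper names second_fragment, extract_rows and rows_to_csv are hypothetical and would need to be checked against the real page. The sketch also takes the second text fragment for the city, which may differ from the answer's second space-separated token.

```python
import csv
import io

from bs4 import BeautifulSoup

# Hypothetical markup: only the class names come from the answer above.
# The "row" wrapper and the text layout are assumptions for illustration.
SAMPLE_HTML = """
<div class="row">
  <div class="col-sm-5"><strong class="h2">Example AG</strong></div>
  <div class="col-sm-5 infobereich"><span>Ort:</span> <span>Stuttgart</span></div>
  <div class="col-sm-2"><span>Mitarbeiter</span><span>1.000</span></div>
</div>
<div class="row">
  <div class="col-sm-5"><strong class="h2">Sample GmbH</strong></div>
  <div class="col-sm-2"><span>Mitarbeiter</span><span>250</span></div>
</div>
"""


def second_fragment(tag):
    """Return the second non-empty text fragment inside tag, or "" if absent."""
    if tag is None:
        return ""
    parts = list(tag.stripped_strings)
    return parts[1] if len(parts) > 1 else ""


def extract_rows(html):
    """Build one [name, city, number] list per company block."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for row in soup.select("div.row"):  # assumed per-company wrapper
        name = row.select_one("strong.h2")
        if name is None:
            continue  # not a company block
        city = second_fragment(row.select_one("div.col-sm-5.infobereich"))
        number = second_fragment(row.select_one("div.col-sm-2"))
        rows.append([name.get_text(strip=True), city, number])
    return rows


def rows_to_csv(rows):
    """Serialise the rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["Name", "City", "Number"])
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    print(rows_to_csv(extract_rows(SAMPLE_HTML)))
```

In this sample the second company has no city block, so its city column is left empty instead of borrowing a value from another company. To try this on the live page, the selector for the per-company wrapper would first have to be confirmed in the browser's developer tools, and the HTML passed to extract_rows would come from requests.get(...).text as in the answer.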