Web scraping a website for winners

Time: 2018-07-21 19:40:44

Tags: python web-scraping

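Hi, I'm trying to scrape this website with Python 3, and I noticed that the page source doesn't make it obvious how to extract the names of the primary winners. Could you show me how to scrape the list of winners for every MD primary race on this site?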

https://elections2018.news.baltimoresun.com/results/

1 answer:

Answer 0 (score: 0)

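Parsing is a bit involved because the results are spread across many subpages. This script collects them and prints the results (all of the data is stored in the variable data):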

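from bs4 import BeautifulSoup
import requests

url = "https://elections2018.news.baltimoresun.com/results/"
r = requests.get(url)

# Maps (race name, party) -> list of (candidate, votes, percent) tuples
data = {}
soup = BeautifulSoup(r.text, 'lxml')

# Each race on the main page is a div whose id starts with "race";
# the part after the first hyphen names that contest's subpage
for race in soup.select('div[id^=race]'):
    contest_id = race['id'].split('-')[1]
    page = requests.get(f"{url}contests/{contest_id}.html")
    s = BeautifulSoup(page.text, 'lxml')
    results = []
    data[(s.find('h3').text, s.find('div', {'class': 'party-header'}).text)] = results

    # The candidate, votes and percent cells line up row by row
    for candidate, votes, percent in zip(s.select('td.candidate'), s.select('td.votes'), s.select('td.percent')):
        results.append((candidate.text, votes.text, percent.text))

print('Winners:')
for (race, party), v in data.items():
    # The first row of each contest is treated as the winner
    print(race, party, v[0])

# print(data)  # uncomment to see every candidate, not just the winners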

Output:

Winners:
Governor / Lt. Governor Democrat ('Ben Jealous and Susan Turnbull', '227,764', '39.6%')
U.S. Senator Republican ('Tony Campbell', '50,915', '29.2%')
U.S. Senator Democrat ('Ben Cardin', '468,909', '80.4%')
State's Attorney Democrat ('Marilyn J. Mosby', '39,519', '49.4%')
County Executive Democrat ('John "Johnny O" Olszewski, Jr.', '27,270', '32.9%')
County Executive Republican ('Al Redmer, Jr.', '17,772', '55.7%')
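
The script above takes the first row of each contest (v[0]) as the winner, which only works if every subpage lists candidates in descending order of votes. If that ordering can't be relied on, the winner could instead be picked by comparing vote counts. The following is a minimal sketch under that assumption; parse_votes and pick_winner are hypothetical helper names, not part of the original answer, and the second sample row is made up purely for illustration:

```python
def parse_votes(text):
    """Turn a vote string such as '227,764' into the integer 227764."""
    return int(text.replace(',', '').strip())


def pick_winner(rows):
    """Return the (candidate, votes, percent) row with the most votes, or None if empty."""
    if not rows:
        return None
    return max(rows, key=lambda row: parse_votes(row[1]))


# Rows in the same shape as the lists stored in data. The first row is
# invented for illustration and deliberately placed ahead of the real leader.
rows = [
    ('Hypothetical Candidate', '1,234', '0.2%'),
    ('Ben Jealous and Susan Turnbull', '227,764', '39.6%'),
]

print(pick_winner(rows))
```

In the original loop this would replace v[0] with pick_winner(v), which also avoids an IndexError when a contest page yields no rows.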