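I am trying to scrape this website, which contains information about the candidates in an upcoming election.
I want to get each candidate's statement and profile picture, both of which sit inside the "votewa-candidate-page" tag, but whenever I try to scrape the data I only get None.
Here is some of my code: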
import requests
from bs4 import BeautifulSoup
url = 'https://voter.votewa.gov/GenericVoterGuide.aspx?e=865&c=17#/candidates/57369/45923'
page = requests.get(url)
soup = BeautifulSoup(page.text, 'html.parser')
statement = soup.find('votewa-candidate-page')
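Thanks for your help.
Answer 0 (score: 0)
If you analyse the site, you will see that it loads its data through an Ajax call. requests only downloads the initial HTML and does not run JavaScript, so the "votewa-candidate-page" element is not populated in the response you parse.
The following script requests the Ajax endpoint directly and prints the information you need: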
import requests
res = requests.get("https://voter.votewa.gov/elections/candidate.ashx?e=865&r=57369&b=45923&la=&c=17")
data = res.json()
photo = data[0]['statement']['Photo']
statement = data[0]['statement']['Statement']
print(statement)
I only print the statement, because the photo is a base64-encoded image.
Output:
"<p><strong>Elected Experience\n</strong><br />\nUnited States Representative, 2012-Present. Ways and Means Committee and Select Committee to Modernize Congress.\n</p>\n<p><strong>Other Professional Experience</strong><br />\nSuccessful career as a businesswoman and entrepreneur. Former Microsoft executive, led local high-tech startups. Former Director of Washington’s Department of Revenue, where I led efforts to simplify the tax system and help small businesses.\n</p>\n<p><strong>Education\n</strong><br />\nB.A., Biology, Reed College; M.B.A., University of Washington.\n</p>\n<p><strong>Community Service\n</strong><br />\nI’ve mentored students at UW Business School; been active in my church, serving as a board member. Volunteered with the PTA, Girl Scouts and YWCA, supporting transitional housing, job training and services to help families get back on their feet.\n</p>\n<p><strong>Statement\n</strong><br />\nDuring this pandemic, families across the 1st Congressional District are struggling and concerned about the future. Now more than ever, we need strong leadership that’s focused on helping those in need, protecting health and safety and restoring our economy. As your Congresswoman, I am determined that we come through this difficult stretch stronger than ever. My focus is on putting partisanship aside and delivering results.\n</p>\n<p>The first known COVID-19 case struck in Washington State before anywhere else. President Trump denied the threat and wasted precious time. By contrast, I moved quickly, securing funds to backfill state and local public health accounts. Washington State immediately received over $11 million, with continued ongoing support. I also advocated for employee retention tax credits to keep an estimated 60 million people employed with benefits. They were incorporated into the pandemic relief Heroes Act.\n</p>\n<p>As the economy reopens we face an uncertain recovery. We need smart, decisive action to restore our economy. 
My background as a successful businesswoman and entrepreneur means I understand how to bring businesses back and create jobs.\n</p>\n<p>My proposal to expand child tax credits, which could reduce child poverty 38 percent, has been incorporated into pandemic relief legislation. I also developed provisions, adopted last year, to make the process of applying for financial aid easier for students. I’ve pushed to expand farmers’ access to markets, improve access to broadband, and to increase the supply of affordable housing.\n</p>\n<p>My core values remain unchanged. As I have done from the day I took office, I'll protect Social Security, Medicare and a woman’s right to choose. I have endorsements from Democratic groups, labor, local leaders and many others.\n</p>\n<p>The fallout from this pandemic is challenging, but I'm committed to putting people back to work and preserving the middle-class. I ask for your support.</p>"
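Because the photo comes back as base64 text, it has to be decoded before it can be saved as an image. The following is a minimal sketch, not part of the original answer: the helper names decode_photo and guess_extension are hypothetical, and it assumes the Photo field is either a plain base64 string or a data URI with a "data:...;base64," prefix. Check the actual response before relying on either assumption.

```python
import base64


def decode_photo(encoded: str) -> bytes:
    """Decode a base64 image string into raw bytes.

    Assumption: the value is plain base64, optionally preceded by a
    data-URI header such as "data:image/jpeg;base64,".
    """
    if encoded.startswith("data:") and "," in encoded:
        # Keep only the part after the data-URI header.
        encoded = encoded.split(",", 1)[1]
    return base64.b64decode(encoded)


def guess_extension(raw: bytes) -> str:
    """Pick a file extension from the image's leading magic bytes."""
    if raw.startswith(b"\xff\xd8\xff"):
        return ".jpg"
    if raw.startswith(b"\x89PNG\r\n\x1a\n"):
        return ".png"
    return ".bin"  # unknown format, save the bytes as-is


# Usage with the variables from the script above (left commented out
# so this block has no network dependency):
# raw = decode_photo(photo)
# with open("candidate" + guess_extension(raw), "wb") as fh:
#     fh.write(raw)
```

Sniffing the first bytes avoids guessing the image type from the URL, which carries no file extension here.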
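Answer 1 (score: 0)
I assume you would rather not look up the JSON URL by hand before printing, so here is code that takes the URL from your example, builds the JSON URL from it automatically, and prints the statement.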
import requests
from bs4 import BeautifulSoup
url = 'https://voter.votewa.gov/GenericVoterGuide.aspx?e=865&c=17#/candidates/57369/45923'
converted = f'https://voter.votewa.gov/elections/candidate.ashx?e=' \
f'{url.split("?e=")[1].split("&")[0]}&r={url.split("/")[-2]}&b={url.split("/")[-1]}&la=&c={url.split("&c=")[1].split("#")[0]}'
page = requests.get(converted)
data = page.json()
statement = data[0]['statement']['Statement']
soup = BeautifulSoup(statement, 'html.parser')
print(*[p.text for p in soup.select('p')])
Prints:
Elected Experience
United States Representative, 2012-Present. Ways and Means Committee and Select Committee to Modernize Congress.
Other Professional Experience
Successful career as a businesswoman and entrepreneur. Former Microsoft executive, led local high-tech startups. Former Director of Washington’s Department of Revenue, where I led efforts to simplify the tax system and help small businesses.
Education
B.A., Biology, Reed College; M.B.A., University of Washington.
Community Service
And so on...
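The chained split calls used to build the JSON URL can also be written with urllib.parse, which separates the query string from the "#" fragment for you. This is a sketch under assumptions rather than a drop-in replacement: to_api_url is a hypothetical helper name, and it assumes every page URL has the same shape as the one in the question, with "e" and "c" in the query string and the last two fragment segments mapping to "r" and "b".

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def to_api_url(page_url: str) -> str:
    """Turn a voter-guide page URL into the candidate.ashx JSON URL."""
    parts = urlsplit(page_url)
    query = parse_qs(parts.query)  # e.g. {"e": ["865"], "c": ["17"]}

    # The fragment looks like "/candidates/57369/45923"; the last two
    # segments are assumed to be the "r" and "b" parameters.
    r_value, b_value = parts.fragment.rstrip("/").split("/")[-2:]

    params = {
        "e": query["e"][0],
        "r": r_value,
        "b": b_value,
        "la": "",  # sent empty, as in the URL seen in the Ajax call
        "c": query["c"][0],
    }
    return f"{parts.scheme}://{parts.netloc}/elections/candidate.ashx?{urlencode(params)}"


url = "https://voter.votewa.gov/GenericVoterGuide.aspx?e=865&c=17#/candidates/57369/45923"
print(to_api_url(url))
```

A missing "e" or "c" parameter raises a KeyError here instead of silently producing a malformed URL, which makes a page with a different layout easier to spot.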