I'm looking at a parent URL, this one:
https://en.wikipedia.org/wiki/List_of_current_members_of_the_United_States_Senate
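From there, I want Python to follow several links, all of which are at ('td')[3].a['href']. The first three on the parent URL are 'Richard Shelby', 'Doug Jones', and 'Lisa Murkowski'. Every child page contains text matching 'Assumed office', and I want to grab all of the dates that go with 'Assumed office'. So, for 'Richard Shelby', it would be: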
Assumed office
January 3, 1987
Assumed office
April 10, 2018
How can I do this?
To navigate to several different links, I think it would look something like this:
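import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin

link = "https://en.wikipedia.org/wiki/List_of_current_members_of_the_United_States_Senate"

with requests.Session() as session:
    html = session.get(link).text
    soup = BeautifulSoup(html, "lxml")
    for row in soup.find_all('tr'):
        cells = row.find_all('td')
        if len(cells) < 4 or cells[3].a is None:
            continue  # header row, or a row without a link in the fourth cell
        senator_link = urljoin(link, cells[3].a['href'])
        response = session.get(senator_link)
        page = BeautifulSoup(response.content, "lxml")
        res = page.findAll("span", {"class": "nowrap"})
        for r in res:
            # This is the line that fails with the error shown below
            print("Assumed Office: " + r.find("span", {'class': 'nowrap'}).text)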
The error I get from that code is this:
AttributeError: 'NoneType' object has no attribute 'text'
Answer 0 (score: 1)
You can find the table by its id, then loop over the rows to collect each name and its 'Assumed office' date:
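import requests
from bs4 import BeautifulSoup as soup

d = soup(requests.get('https://en.wikipedia.org/wiki/List_of_current_members_of_the_United_States_Senate').text, 'html.parser')
# Collect the <td> texts of every row in the table with id="senators", dropping cells that hold only a newline; the first row (the header) is discarded
_, *data = [list(filter(lambda x:x != '\n', [c.text for c in i.find_all('td')])) for i in d.find('table', {'id':'senators'}).find_all('tr')]
# The name is taken from index 1 when a row has 7 entries, otherwise from index 0; the date is the second-to-last entry
final_names = [[(i[1] if len(i) == 7 else i[0]).rstrip(), i[-2].rstrip()] for i in data]
print(final_names)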
Output:
[['Richard Shelby', 'January 3, 1987'], ['Doug Jones[d]', 'January 3, 2018'], ['Lisa Murkowski', 'December 20, 2002'], ['Dan Sullivan', 'January 3, 2015'], ['John McCain', 'January 3, 1987'], ['Jeff Flake', 'January 3, 2013'], ['John Boozman', 'January 3, 2011'], ['Tom Cotton', 'January 3, 2015'], ['Dianne Feinstein', 'November 10, 1992'], ['Kamala Harris', 'January 3, 2017'], ['Michael Bennet', 'January 22, 2009'], ['Cory Gardner', 'January 3, 2015'], ['Richard Blumenthal', 'January 3, 2011'], ['Chris Murphy', 'January 3, 2013'], ['Tom Carper', 'January 3, 2001'], ['Chris Coons', 'November 15, 2010'], ['Bill Nelson', 'January 3, 2001'], ['Marco Rubio', 'January 3, 2011'], ['Johnny Isakson', 'January 3, 2005'], ['David Perdue', 'January 3, 2015'], ['Brian Schatz', 'December 26, 2012'], ['Mazie Hirono', 'January 3, 2013'], ['Mike Crapo', 'January 3, 1999'], ['Jim Risch', 'January 3, 2009'], ['Dick Durbin', 'January 3, 1997'], ['Tammy Duckworth', 'January 3, 2017'], ['Joe Donnelly', 'January 3, 2013'], ['Todd Young', 'January 3, 2017'], ['Chuck Grassley', 'January 3, 1981'], ['Joni Ernst', 'January 3, 2015'], ['Pat Roberts', 'January 3, 1997'], ['Jerry Moran', 'January 3, 2011'], ['Mitch McConnell', 'January 3, 1985'], ['Rand Paul', 'January 3, 2011'], ['Bill Cassidy', 'January 3, 2015'], ['John Kennedy', 'January 3, 2017'], ['Susan Collins', 'January 3, 1997'], ['Angus King', 'January 3, 2013'], ['Ben Cardin', 'January 3, 2007'], ['Chris Van Hollen', 'January 3, 2017'], ['Elizabeth Warren', 'January 3, 2013'], ['Ed Markey', 'July 16, 2013'], ['Debbie Stabenow', 'January 3, 2001'], ['Gary Peters', 'January 3, 2015'], ['Amy Klobuchar', 'January 3, 2007'], ['Tina Smith[e]', 'January 3, 2018'], ['Roger Wicker', 'December 31, 2007'], ['Cindy Hyde-Smith[f]', 'April 9, 2018'], ['Claire McCaskill', 'January 3, 2007'], ['Roy Blunt', 'January 3, 2011'], ['Jon Tester', 'January 3, 2007'], ['Steve Daines', 'January 3, 2015'], ['Deb Fischer', 'January 3, 2013'], ['Ben Sasse', 
'January 3, 2015'], ['Dean Heller', 'May 9, 2011'], ['Catherine Cortez Masto', 'January 3, 2017'], ['Jeanne Shaheen', 'January 3, 2009'], ['Maggie Hassan', 'January 3, 2017'], ['Bob Menendez', 'January 18, 2006'], ['Cory Booker', 'October 31, 2013'], ['Tom Udall', 'January 3, 2009'], ['Martin Heinrich', 'January 3, 2013'], ['Chuck Schumer', 'January 3, 1999'], ['Kirsten Gillibrand', 'January 26, 2009'], ['Richard Burr', 'January 3, 2005'], ['Thom Tillis', 'January 3, 2015'], ['John Hoeven', 'January 3, 2011'], ['Heidi Heitkamp', 'January 3, 2013'], ['Sherrod Brown', 'January 3, 2007'], ['Rob Portman', 'January 3, 2011'], ['Jim Inhofe', 'November 17, 1994'], ['James Lankford', 'January 3, 2015'], ['Ron Wyden', 'February 6, 1996'], ['Jeff Merkley', 'January 3, 2009'], ['Bob Casey Jr.', 'January 3, 2007'], ['Pat Toomey', 'January 3, 2011'], ['Jack Reed', 'January 3, 1997'], ['Sheldon Whitehouse', 'January 3, 2007'], ['Lindsey Graham', 'January 3, 2003'], ['Tim Scott', 'January 2, 2013'], ['John Thune', 'January 3, 2005'], ['Mike Rounds', 'January 3, 2015'], ['Lamar Alexander', 'January 3, 2003'], ['Bob Corker', 'January 3, 2007'], ['John Cornyn', 'December 1, 2002'], ['Ted Cruz', 'January 3, 2013'], ['Orrin Hatch', 'January 3, 1977'], ['Mike Lee', 'January 3, 2011'], ['Patrick Leahy', 'January 3, 1975'], ['Bernie Sanders', 'January 3, 2007'], ['Mark Warner', 'January 3, 2009'], ['Tim Kaine', 'January 3, 2013'], ['Patty Murray', 'January 3, 1993'], ['Maria Cantwell', 'January 3, 2001'], ['Joe Manchin', 'November 15, 2010'], ['Shelley Moore Capito', 'January 3, 2015'], ['Ron Johnson', 'January 3, 2011'], ['Tammy Baldwin', 'January 3, 2013'], ['Mike Enzi', 'January 3, 1997'], ['John Barrasso', 'June 25, 2007']]
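If you do want to follow each senator's link and read every 'Assumed office' entry from the child page, as the question describes, something along these lines may work. It is only a sketch under two assumptions: that the link sits in the fourth `td` of each row (taken from the question, and exposed as the hypothetical `name_cell` parameter in case some rows differ), and that the date is the first piece of text after the 'Assumed office' label on the child page. The names `PARENT_URL`, `senator_links`, `assumed_office_dates`, and `scrape` are made up for this example. The `AttributeError` in the question comes from calling `r.find("span", {'class': 'nowrap'})` on an `r` that is already such a span: `find` only searches descendants, so it returns `None` unless another `nowrap` span is nested inside.

```python
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup

# Parent page from the question
PARENT_URL = "https://en.wikipedia.org/wiki/List_of_current_members_of_the_United_States_Senate"


def senator_links(html, base_url=PARENT_URL, name_cell=3):
    """Return absolute URLs from the chosen <td> of each row in table#senators.

    name_cell=3 is an assumption taken from the question.
    """
    page = BeautifulSoup(html, "html.parser")
    table = page.find("table", {"id": "senators"})
    if table is None:
        return []
    links = []
    for row in table.find_all("tr"):
        cells = row.find_all("td")
        # Skip rows with too few <td> cells (e.g. the header) or without a link
        if len(cells) > name_cell and cells[name_cell].a is not None:
            links.append(urljoin(base_url, cells[name_cell].a["href"]))
    return links


def assumed_office_dates(html):
    """Return the text node that follows each 'Assumed office' label."""
    page = BeautifulSoup(html, "html.parser")
    # All non-empty text nodes in document order, with whitespace stripped
    strings = [s.replace("\xa0", " ") for s in page.stripped_strings]
    # Pair each text node with its successor and keep the successors of the label
    return [nxt for cur, nxt in zip(strings, strings[1:]) if cur == "Assumed office"]


def scrape(parent_url=PARENT_URL):
    """Fetch the parent page, follow each senator link, and print the dates found."""
    with requests.Session() as session:
        listing = session.get(parent_url).text
        for url in senator_links(listing, parent_url):
            print(url, assumed_office_dates(session.get(url).text))
```

Calling `scrape()` makes one request per senator, so it is much slower than reading the dates straight from the parent table as above. The two parsing functions take plain HTML strings, so you can try them on a saved page before hitting the network.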