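The code below scrapes the Wikipedia page of currently serving United States senators, which are listed in a table. Right now the code only gives me the name, party, etc. of the first senator from Alabama. How can I rewrite it to iterate over the whole table?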
from bs4 import BeautifulSoup
from urllib.request import urlopen
senatorwiki = 'https://en.wikipedia.org/wiki/List_of_current_United_States_Senators'
html = urlopen(senatorwiki)
soup = BeautifulSoup(html.read(), "lxml")
senatortable = soup.find('table',{'class':"sortable"})
td = senatortable.find('td')
state = td.find_next()
ns = state.find_next_sibling()
picture = ns.find_next_sibling()
name = picture.find_next_sibling()
party = name.find_next_sibling()
privsec = party.find_next_sibling()
print(state.text,ns.text,name.text,party.text,privsec.text)
Answer (score: 1)
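Find all the tr rows of the table, then loop over all the td cells in each row. Note that I'm using requests, not only because it's great, but also because Python 2.7's urllib has no request module (urllib.request exists only in Python 3).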
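To see the shape of what the row/cell loop produces without hitting the network, here is a minimal sketch run on a small inline table. SAMPLE_HTML, its placeholder names, and the table_rows helper are hypothetical; the real Wikipedia table has more columns. It uses the "html.parser" backend so lxml is not required.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the Wikipedia table (placeholder values only).
SAMPLE_HTML = """
<table class="wikitable sortable">
  <tr><th>State</th><th>Senator</th><th>Party</th></tr>
  <tr><td>Alabama</td><td>Senator A</td><td>Party X</td></tr>
  <tr><td>Alaska</td><td>Senator B</td><td>Party Y</td></tr>
</table>
"""


def table_rows(html):
    """Return the text of every <td> cell, one list per <tr> row."""
    soup = BeautifulSoup(html, "html.parser")
    # class="sortable" also matches class="wikitable sortable":
    # BeautifulSoup compares against each individual class name.
    table = soup.find("table", {"class": "sortable"})
    result = []
    for tr in table.find_all("tr"):   # outer loop: every row
        cells = tr.find_all("td")     # inner: every data cell in that row
        if cells:                     # header rows made only of <th> yield []
            result.append([td.get_text(strip=True) for td in cells])
    return result


if __name__ == "__main__":
    for row in table_rows(SAMPLE_HTML):
        print(row)
```

The header row is skipped because it contains only th cells, so each remaining list lines up with one senator.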
from bs4 import BeautifulSoup
import requests

senatorwiki = 'https://en.wikipedia.org/wiki/List_of_current_United_States_Senators'
html = requests.get(senatorwiki)
soup = BeautifulSoup(html.text, "lxml")
senatortable = soup.find('table', {'class': "sortable"})
rows = senatortable.find_all('tr')
for tr in rows:
    # all data cells (<td>) in this row; a header row made only of <th> gives []
    print(tr.find_all('td'))
    # pulling the individual fields out of each row's list of tds is up to you ;)
    # print(state.text, ns.text, name.text, party.text, privsec.text)
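Turning each row's list of tds into fields is where a naive loop can trip up: if a cell uses rowspan (for example a state name shared by two consecutive senators; this is an assumption about how the live page is laid out, not something verified here), the next row has one fewer cell and every column shifts left. Below is a minimal sketch that copies rowspan cells down into the rows they cover. The expand_rows helper and SAMPLE_HTML are hypothetical, and colspan is deliberately not handled.

```python
from bs4 import BeautifulSoup

# Hypothetical sample: "Alabama" spans two rows, so the third <tr> has only two cells.
SAMPLE_HTML = """
<table class="wikitable sortable">
  <tr><th>State</th><th>Senator</th><th>Party</th></tr>
  <tr><td rowspan="2">Alabama</td><td>Senator A</td><td>Party X</td></tr>
  <tr><td>Senator B</td><td>Party X</td></tr>
</table>
"""


def expand_rows(table):
    """Return each row's cell texts, filling in cells that span down via rowspan.

    Assumption: only rowspan is handled; colspan is ignored.
    """
    rows = []
    pending = {}  # column index -> [rows still to fill, cell text]
    for tr in table.find_all("tr"):
        cells = iter(tr.find_all(["td", "th"]))
        cell = next(cells, None)
        row = []
        col = 0
        while cell is not None or col in pending:
            if col in pending:
                # A cell from an earlier row occupies this column: copy its text.
                remaining, text = pending[col]
                row.append(text)
                if remaining == 1:
                    del pending[col]
                else:
                    pending[col] = [remaining - 1, text]
            else:
                text = cell.get_text(strip=True)
                span = int(cell.get("rowspan", 1))
                if span > 1:
                    pending[col] = [span - 1, text]  # remember it for the rows below
                row.append(text)
                cell = next(cells, None)
            col += 1
        rows.append(row)
    return rows


if __name__ == "__main__":
    soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
    for row in expand_rows(soup.find("table", {"class": "sortable"})):
        print(row)
```

Because th cells are included, the header row comes back first; every following list has the same length, so fields can be read by position (for example row[0] for the state).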