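I am trying to populate a list with the draft order from this Wikipedia page. The problem I am running into is that the only data extracted comes from rows with a different background color (the rows with a '*' next to the number).
My code is as follows: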
import requests
from bs4 import BeautifulSoup

wikiURL = "https://en.wikipedia.org/wiki/2012_NFL_Draft"

# list to store (team, player) tuples
teams_players = []

# request and parse wikiURL
r = requests.get(wikiURL)
soup = BeautifulSoup(r.content, "html.parser")

# find the draft table in the Wikipedia page
playerData = soup.find('table', {"class": "wikitable sortable"})
for row in playerData.find_all('tr'):
    cols = row.find_all('td')
    if len(cols) == 9:
        teams_players.append((cols[3].text.strip(), cols[4].text.strip()))

for team, player in teams_players:
    print('{:35} {}'.format(team, player))
Answer 0 (score: 1)
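This is because of the if len(cols) == 9: condition. Only td cells are counted, so any row that also contains th cells never reaches nine and is skipped. You need to collect both the td and th elements inside each tr element, and skip the header row.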
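To check how the cells are distributed before choosing a filter, it can help to count the td and th tags in each row. The following is a minimal sketch, not the answer's method: it uses the standard library's html.parser in place of BeautifulSoup so it runs without a network request, and SAMPLE_HTML and CellCounter are hypothetical names with made-up markup rather than the real Wikipedia table.

```python
from html.parser import HTMLParser

# Hypothetical sample markup (an assumption, not copied from Wikipedia):
# a header row, a row whose pick number is a <th>, and an all-<td> row.
SAMPLE_HTML = """
<table class="wikitable sortable">
<tr><th>Rnd.</th><th>Pick No.</th><th>NFL team</th><th>Player</th></tr>
<tr><td>1</td><th>1</th><td>Team A</td><td>Player A</td></tr>
<tr><td>1</td><td>2</td><td>Team B</td><td>Player B</td></tr>
</table>
"""


class CellCounter(HTMLParser):
    """Record how many <td> and <th> start tags each <tr> contains."""

    def __init__(self):
        super().__init__()
        self.rows = []  # one {"td": n, "th": m} dict per <tr>

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            # a new row starts: begin a fresh tally
            self.rows.append({"td": 0, "th": 0})
        elif tag in ("td", "th") and self.rows:
            # count the cell against the most recent row
            self.rows[-1][tag] += 1


counter = CellCounter()
counter.feed(SAMPLE_HTML)
for index, counts in enumerate(counter.rows):
    print(index, counts)
```

In this sample, a filter that requires four td cells would keep only the last row, while counting td and th together gives four cells in every row. That mirrors the symptom in the question, assuming the real table mixes the two cell types in a similar way.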
Fixed version:
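# skip the header row, then collect both td and th cells
for row in playerData.find_all('tr')[1:]:
    cols = row.find_all(['td', 'th'])
    # cols[6] is read below, so a row needs at least 7 cells
    if len(cols) < 7:
        continue
    teams_players.append((cols[5].text.strip(), cols[6].text.strip()))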
Prints:
QB Stanford
QB Baylor
...
RB Abilene Christian
QB NIU