Question

I'm trying to scrape the 2012 NFL Draft order from Wikipedia to fill a table.

The problem is that the only data extracted comes from the rows with a different background color (the rows with a '*' next to the number).

My code is as follows:

import requests
from bs4 import BeautifulSoup

wikiURL = "https://en.wikipedia.org/wiki/2012_NFL_Draft"

# list to store (team, player) tuples in
teams_players = []

# request and parse wikiURL
r = requests.get(wikiURL)
soup = BeautifulSoup(r.content, "html.parser")

# find the draft table in the Wikipedia page
playerData = soup.find('table', {"class": "wikitable sortable"})

for row in playerData.find_all('tr'):
    cols = row.find_all('td')

    if len(cols) == 9:
        teams_players.append((cols[3].text.strip(), cols[4].text.strip()))

for team, player in teams_players:
    print('{:35} {}'.format(team, player))
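The symptom can be reproduced offline. This sketch uses the standard library's html.parser instead of BeautifulSoup (a swapped-in tool, chosen only so it runs without extra packages), and the table markup is hypothetical and far simpler than the real Wikipedia page: one data row puts its first cell in a th, the other uses only td. Counting td alone gives different lengths per row, so a fixed check like len(cols) == 9 (here, == 4) keeps only some rows.

```python
from html.parser import HTMLParser

# Hypothetical markup, NOT copied from Wikipedia: the "highlighted" row
# (Player B*) uses only <td>, the ordinary row starts with a <th>.
HTML = """
<table>
  <tr><th>Rnd.</th><th>Pick</th><th>Team</th><th>Player</th></tr>
  <tr><th>1</th><td>1</td><td>Team A</td><td>Player A</td></tr>
  <tr><td>1</td><td>2</td><td>Team B</td><td>Player B*</td></tr>
</table>
"""


class RowCollector(HTMLParser):
    """Collect (tag, text) pairs for every <td>/<th> cell, grouped by <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []     # one list of (tag, text) tuples per <tr>
        self._cell = None  # tag name of the cell currently open, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("td", "th") and self.rows:
            self._cell, self._text = tag, []

    def handle_data(self, data):
        # Only keep text that sits inside a cell.
        if self._cell is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == self._cell:
            self.rows[-1].append((tag, "".join(self._text).strip()))
            self._cell = None


def cell_counts(rows):
    """Return (td-only count, td+th count) for each row."""
    return [(sum(1 for tag, _ in row if tag == "td"), len(row)) for row in rows]


parser = RowCollector()
parser.feed(HTML)
print(cell_counts(parser.rows))  # → [(0, 4), (3, 4), (4, 4)]
```

Only the last row has four td cells, while every row has four cells once th is counted too.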

Answer 1

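This is because of the if len(cols) == 9: condition. You need to:

- skip the first (header) row
- look for both td and th cells in each tr
- skip rows that have too few cells for the columns you read (here at least 7, because the code reads cols[6])

Fixed version:

for row in playerData.find_all('tr')[1:]:  # skip the header row
    cols = row.find_all(['td', 'th'])      # count th cells as well as td
    if len(cols) < 7:                      # cols[6] below needs at least 7 cells
        continue
    teams_players.append((cols[5].text.strip(), cols[6].text.strip()))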

Prints:

QB                                  Stanford
QB                                  Baylor
...
RB                                  Abilene Christian
QB                                  NIU
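If column positions differ between rows or between page revisions, looking columns up by their header text may be less brittle than hard-coding cols[5] and cols[6]. A minimal sketch in plain Python, assuming the rows have already been turned into lists of cell strings with the header row first; the header labels and sample rows are hypothetical, not taken from the Wikipedia page. The length check is derived from the highest index actually read, so short rows are skipped instead of raising IndexError.

```python
# Hypothetical rows: header first, then data rows, then a short row.
ROWS = [
    ["Rnd.", "Pick", "Team", "Player", "Pos.", "College"],
    ["1", "1", "Team A", "Player A", "QB", "College A"],
    ["1", "2", "Team B", "Player B", "QB", "College B"],
    ["Forfeited pick"],
]


def extract_columns(rows, wanted):
    """Pick the named columns from rows given as lists of cell strings.

    rows[0] is assumed to be the header row; `wanted` lists header labels.
    Raises ValueError if a label is not in the header. Rows too short to
    contain every wanted column are skipped.
    """
    header = rows[0]
    indices = [header.index(name) for name in wanted]
    needed = max(indices) + 1  # minimum row length that covers every index
    return [
        tuple(row[i] for i in indices)
        for row in rows[1:]
        if len(row) >= needed
    ]


print(extract_columns(ROWS, ["Pos.", "College"]))  # → [('QB', 'College A'), ('QB', 'College B')]
```

With BeautifulSoup, the rows could be built with [[c.get_text(strip=True) for c in tr.find_all(['td', 'th'])] for tr in playerData.find_all('tr')]. If the real table uses rowspan cells, some rows would be shorter and their positions would shift; this sketch does not handle that case.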

Only able to scrape part of a table with Python and BS4
