Question

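I am trying to append to a URL in Python so that I can scrape details from the resulting target URL. I have the code below, but it seems to scrape data from url1 rather than from URL.

I have scraped the team names from the NFL website without any problem. The problem is with the Spotrac URL, to which I append the team names scraped from the NFL site.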

import requests
from bs4 import BeautifulSoup
import pandas as pd

URL ='https://www.nfl.com/teams/'

page = requests.get(URL)
soup = BeautifulSoup(page.text, 'html.parser')

team_name = []

team_name_list = soup.find_all('h4',class_='d3-o-media-object__roofline nfl-c-custom-promo__headline')
for team in team_name_list:
  if team.find('p'):
      team_name.append(team.text)

for team in team_name: 
        
    team = team.replace(" ", "-").lower()

    url1 = 'https://www.spotrac.com/nfl/rankings/'
    URL = url1 +str(team)
    print(URL)
    data = {
        'ajax': 'true',
        'mobile': 'false'
    }
    
    bs_soup = BeautifulSoup(requests.post(URL, data=data).content, 'html.parser')
    spotrac_df = pd.DataFrame(columns = ['Name', 'Salary']) 
    
    for h3 in bs_soup.select('h3'):
        spotrac_df = spotrac_df.append(pd.DataFrame({'Name': str(h3.text), 'Salary' : str(h3.find_next(class_="rank-value").text)}, index=[0]), ignore_index=False)

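I am almost certain the problem is that the URL is not being appended correctly. The scrape pulls the salaries and so on from url1 rather than from URL.

My console output (using the Spyder IDE) for print(URL) looks like this: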

Answer 1

The URL is being appended correctly, but your team names contain stray whitespace. I also made a few other changes and noted them in the code.
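To see the whitespace problem in isolation, here is a minimal sketch that needs no network access. The raw_name value is a made-up stand-in for what team.text might return (the real scraped text may differ), and team_slug is a hypothetical helper name, not something from the original code.

```python
def team_slug(name):
    """Turn a scraped team name into a lowercase, hyphen-separated URL slug."""
    # str.split() with no arguments drops leading/trailing whitespace and
    # treats any run of whitespace (spaces, newlines, tabs) as one separator.
    return "-".join(name.split()).lower()


base_url = "https://www.spotrac.com/nfl/rankings/"

# Assumption: hypothetical scraped text with stray whitespace around the name.
raw_name = "\n  Arizona Cardinals \n"

# The approach from the question: every space becomes a hyphen, newlines stay.
naive_url = base_url + raw_name.replace(" ", "-").lower()

# The approach from this answer: split on whitespace, then join with hyphens.
clean_url = base_url + team_slug(raw_name)

print(repr(naive_url))
print(repr(clean_url))
```

Printing with repr() makes the leftover newlines and extra hyphens in the first URL visible, which a plain print() can hide.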

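Lastly (and I used to do both of these myself), creating an empty dataframe and then appending to it after each iteration is, in my opinion, not the best approach. I was told it is better to build the rows with lists/dictionaries and, once that is done, call pandas to construct the dataframe, so I changed that as well.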
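As a small illustration of that pattern on its own, the sketch below collects rows as dictionaries and builds the dataframe once at the end. The names and salaries are placeholders, not scraped data. This also sidesteps DataFrame.append, which was removed in pandas 2.0.

```python
import pandas as pd

# Placeholder (name, salary) pairs standing in for scraped values.
scraped = [("Player A", " $1,000,000 "), ("Player B", "$750,000")]

# Collect plain dictionaries inside the loop; appending to a list is cheap.
rows = []
for name, salary in scraped:
    rows.append({"Name": name, "Salary": salary.strip()})

# Build the dataframe a single time, after the loop has finished.
df = pd.DataFrame(rows, columns=["Name", "Salary"])
print(df)
```

Building the dataframe once avoids copying all existing rows on every iteration, which is what repeated appends to a dataframe end up doing.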

import requests
from bs4 import BeautifulSoup   
import pandas as pd

url ='https://www.nfl.com/teams/'

page = requests.get(url)
soup = BeautifulSoup(page.text, 'html.parser')

team_name = []

team_name_list = soup.find_all('h4',class_='d3-o-media-object__roofline nfl-c-custom-promo__headline')
for team in team_name_list:
  if team.find('p'):
      team_name.append(team.text.strip()) #<- remove leading/trailing white space

url1 = 'https://www.spotrac.com/nfl/rankings/' #<- since this is fixed, put it before the loop
spotrac_rows = []
for team in team_name: 
        
    team = '-'.join(team.split()).lower() #<- changed to split in case there are 2 spaces between city and team

    url = url1 + team #<- url1 is already defined before the loop; team is already a string
    print(url)
    data = {
        'ajax': 'true',
        'mobile': 'false'
    }
    
    bs_soup = BeautifulSoup(requests.post(url, data=data).content, 'html.parser')
    
    for h3 in bs_soup.select('h3'):
        spotrac_rows.append({'Name': str(h3.text), 'Salary' : str(h3.find_next(class_="rank-value").text.strip())})  #<- remove white space from the salary
        
spotrac_df = pd.DataFrame(spotrac_rows)

Output:

print(spotrac_df)
                       Name       Salary
0            Chandler Jones  $21,333,333
1          Patrick Peterson  $13,184,588
2            D.J. Humphries  $12,800,000
3           DeAndre Hopkins  $12,500,000
4          Larry Fitzgerald  $11,750,000
5              Jordan Hicks  $10,500,000
6               Justin Pugh  $10,500,000
7              Kenyan Drake   $8,483,000
8              Kyler Murray   $8,080,601
9             Robert Alford   $7,500,000
10              J.R. Sweezy   $6,500,000
11             Corey Peters   $4,437,500
12           Haason Reddick   $4,288,444
13          Jordan Phillips   $4,000,000
14           Isaiah Simmons   $3,757,101
15            Maxx Williams   $3,400,000
16            Zane Gonzalez   $3,259,000
17            Devon Kennard   $2,500,000
18              Budda Baker   $2,173,184
19       De'Vondre Campbell   $2,000,000
20                 Andy Lee   $2,000,000
21             Byron Murphy   $1,815,795
22           Christian Kirk   $1,607,691
23             Aaron Brewer   $1,168,750
24               Max Garcia   $1,143,125
25            Andy Isabella   $1,052,244
26               Mason Cole     $977,629
27               Zach Allen     $975,855
28              Chris Banjo     $887,500
29         Jonathan Bullard     $887,500
                    ...          ...
2530       Khari Blasingame     $675,000
2531         Kenneth Durden     $675,000
2532         Cody Hollister     $675,000
2533              Joey Ivie     $675,000
2534            Greg Joseph     $675,000
2535             Kareem Orr     $675,000
2536     David Quessenberry     $675,000
2537        Derick Roberson     $675,000
2538           Shaun Wilson     $675,000
2539          Cole McDonald     $635,421
2540          Chris Jackson     $629,570
2541             Kobe Smith     $614,333
2542           Aaron Brewer     $613,333
2543           Cale Garrett     $613,333
2544           Tommy Hudson     $613,333
2545     Kristian Wilkerson     $613,333
2546  Khaylan Kearse-Thomas     $612,500
2547         Nick Westbrook     $612,333
2548          Kyle Williams     $611,833
2549           Mason Kinsey     $611,666
2550          Tucker McCann     $611,666
2551       Cameron Scarlett     $611,666
2552             Teair Tart     $611,666
2553           Brandon Kemp     $611,333
2554              Wyatt Ray     $610,000
2555             Josh Smith     $610,000
2556         Logan Woodside     $610,000
2557          Rashard Davis     $610,000
2558          Avery Gennesy     $610,000
2559           Parker Hesse     $610,000

[2560 rows x 2 columns]
