Unable to scrape all the data using Beautiful Soup

Time: 2019-11-27 03:34:43

Tags: python-3.x beautifulsoup

URL = r"https://www.vault.com/best-companies-to-work-for/law/top-100-law-firms-rankings/year/"
My_list = ['2007','2008','2009','2010']

Year = []
CompanyName = []
Rank = []
Score = []

for I, Page in enumerate(My_list, start=1):
    url = r'https://www.vault.com/best-companies-to-work-for/law/top-100-law-firms-rankings/year/{}'.format(Page)
    print(url)

    Res = requests.get(url)
    soup = BeautifulSoup(Res.content , 'html.parser')
    data = soup.find('div' ,{'id':'main-content'})
for Data in data:
        Title = data.findAll('h3')
        for title in Title:
            CompanyName.append(title.text.strip())


        Rank = data.findAll('div' ,{'class':'rank RankNumber'})
        for rank in Rank:
            Rank.append(rank)


        Score = data.findAll('div' ,{'class':'rank RankNumber'})
        for score in Score:
            Score.append(score)

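I can't get all the data for title, Rank, and Score. I'm not sure whether I have identified the correct tags. I'm also unable to extract the values from the Rank list.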

1 Answer:

Answer 0: (score: 0)

This should get you started. First find all the div.RankItem elements, then look up the title, rank, and score inside each one.

from bs4 import BeautifulSoup
import requests

# Fetch a single year's ranking page.
resp = requests.get('https://www.vault.com/best-companies-to-work-for/law/top-100-law-firms-rankings/year/2010')
soup = BeautifulSoup(resp.content , 'html.parser')
# Each firm sits in its own div.RankItem; search inside that element only.
for i, item in enumerate(soup.find_all("div", {"class": "RankItem"})):
    title = item.find("h3", {"class": "MainLink"}).get_text().strip()
    rank = item.find("div", {"class": "RankNumber"}).get_text().strip()
    score = item.find("div", {"class": "score"}).get_text().strip()
    print(i+1, title, rank, score)
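
To cover every year from the question rather than only 2010, the same per-item lookup can be wrapped in a function and called once per page. The following is only a rough sketch: the names `BASE_URL`, `text_of`, `parse_rankings`, `scrape_years`, and the `fetch` parameter are made up for illustration, the 30-second timeout is an arbitrary choice, and the class names (`RankItem`, `MainLink`, `RankNumber`, `score`) are taken from the snippet above and assumed to still match the live page. It also keeps the accumulator (`rows`) separate from the `find_all` results; in the question, `Rank` and `Score` are first the output lists and are then rebound to the result of `findAll`, which is probably why no values end up in them.

```python
from bs4 import BeautifulSoup

# URL pattern from the question; the year is substituted for {}.
BASE_URL = "https://www.vault.com/best-companies-to-work-for/law/top-100-law-firms-rankings/year/{}"


def text_of(parent, name, css_class):
    """Return the stripped text of the first matching child, or None if absent."""
    node = parent.find(name, {"class": css_class})
    return node.get_text().strip() if node is not None else None


def parse_rankings(html, year):
    """Parse one page of HTML into a list of dicts, one per div.RankItem."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []  # accumulator; never reused as a name for find_all results
    for item in soup.find_all("div", {"class": "RankItem"}):
        rows.append({
            "year": year,
            "company": text_of(item, "h3", "MainLink"),
            "rank": text_of(item, "div", "RankNumber"),
            "score": text_of(item, "div", "score"),
        })
    return rows


def scrape_years(years, fetch=None):
    """Fetch and parse the page for each year; `fetch` maps a URL to HTML."""
    if fetch is None:
        # Imported lazily so the parsing helpers work without network access.
        import requests

        def fetch(url):
            resp = requests.get(url, timeout=30)
            resp.raise_for_status()  # fail loudly on HTTP errors
            return resp.content

    all_rows = []
    for year in years:
        all_rows.extend(parse_rankings(fetch(BASE_URL.format(year)), year))
    return all_rows
```

Calling `scrape_years(['2007', '2008', '2009', '2010'])` would then return one dict per firm per year. A missing element yields `None` for that field instead of raising `AttributeError`, which makes it easier to spot a year whose markup differs.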