BeautifulSoup - scraping multiple pages

Time: 2019-09-14 05:09:22

Tags: python web-scraping beautifulsoup

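I want to scrape the members' names from each page, then move on to the next page and do the same. My code only works for one page. I'm new to this, and any advice would be appreciated. Thanks.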

    import requests
    from bs4 import BeautifulSoup

    r = requests.get("https://www.bodia.com/spa-members/page/1")
    soup = BeautifulSoup(r.text,"html.parser")
    lights = soup.findAll("span",{"class":"light"})

    lights_list = []
    for l in lights[0:]:
        result = l.text.strip()
        lights_list.append(result)

    print (lights_list)

I tried this, and it only gave me the members from page 3.

    for i in range (1,4): #to scrape names of page 1 to 3
        r = requests.get("https://www.bodia.com/spa-members/page/"+ format(i))
    soup = BeautifulSoup(r.text,"html.parser")
    lights = soup.findAll("span",{"class":"light"})

    lights_list = []
    for l in lights[0:]:
        result = l.text.strip()
        lights_list.append(result)

    print (lights_list)

Then I tried this:

    i = 1
    while i<5:
        r = requests.get("https://www.bodia.com/spa-members/page/"+str(i))
        i+=1

        soup = BeautifulSoup(r.text,"html.parser")
        lights = soup.findAll("span",{"class":"light"})

        lights_list = []
        for l in lights[0:]:
            result = l.text.strip()
        lights_list.append(result)

        print (lights_list)

It gave me 4 members' names, but I don't know which pages they came from:

    ['Seng Putheary (Nana)']
    ['Marco Julia']
    ['Simon']
    ['Ms Anne Guerineau']

1 answer:

Answer 0 (score: 2)

Only two changes are needed to scrape everything.

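  1. Change r = requests.get("https://www.bodia.com/spa-members/page/"+ format(i)) to r = requests.get("https://www.bodia.com/spa-members/page/{}".format(i)). The built-in format(i) happens to return the same string as str(i), so the original line does build a valid URL; calling .format on the URL template is simply the more conventional way to write it.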

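  2. You are not looping over all of your code. Only the request is inside the for loop, so the parsing and printing run once, after the loop has finished, on the last response. That is why you only get one set of names. Indenting everything under the for loop fixes the problem.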
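To make both points concrete without touching the network, here is a minimal sketch. FAKE_PAGES and fetch are hypothetical stand-ins for the real site and for requests.get, and the names in them are made up for illustration; only the URL prefix comes from the question.

```python
# Hypothetical stand-in for the site: page number -> names on that page.
FAKE_PAGES = {1: ["Ann", "Bob"], 2: ["Cy", "Dee"], 3: ["Eve", "Fay"]}


def fetch(page):
    """Pretend to download a page and return the names found on it."""
    return FAKE_PAGES[page]


# Point 1: these three spellings build exactly the same URL string.
i = 2
url_concat_format = "https://www.bodia.com/spa-members/page/" + format(i)
url_concat_str = "https://www.bodia.com/spa-members/page/" + str(i)
url_str_format = "https://www.bodia.com/spa-members/page/{}".format(i)
print(url_concat_format == url_concat_str == url_str_format)


# Point 2a: only the fetch is inside the loop, so the processing
# below runs once, on whatever the last iteration left in `names`.
def only_fetch_in_loop():
    for page in range(1, 4):
        names = fetch(page)
    collected = []
    for name in names:
        collected.append(name)
    return collected


# Point 2b: the processing is indented under the loop, so it runs per page.
def whole_body_in_loop():
    collected = []
    for page in range(1, 4):
        names = fetch(page)
        for name in names:
            collected.append(name)
    return collected


print(only_fetch_in_loop())
print(whole_body_in_loop())
```

The first function returns only the page 3 names, which matches what the question reports. The second returns the names from all three pages. Unlike the answer's code, it keeps one list across pages rather than starting a new list for each page; either choice works, depending on whether you want per-page or combined output.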

    import requests
    from bs4 import BeautifulSoup

    for i in range (1,4): #to scrape names of page 1 to 3
        r = requests.get("https://www.bodia.com/spa-members/page/{}".format(i))
        soup = BeautifulSoup(r.text,"html.parser")
        lights = soup.findAll("span",{"class":"light"})
        lights_list = []
        for l in lights[0:]:
            result = l.text.strip()
            lights_list.append(result)

        print(lights_list)

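The code above prints one list of names per scraped page, roughly every 3 seconds. The code itself never sleeps, so that interval is presumably just how long each request takes.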
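If you also want to know which page each name came from (the open point in the question), one possible variation is to keep the results in a dict keyed by page number. This is only a sketch under some assumptions: the helper names extract_names and scrape_pages are made up here, and the 10 second timeout and 1 second pause between requests are arbitrary choices of mine, not something the site requires. It uses find_all, the current spelling of findAll.

```python
import time

import requests
from bs4 import BeautifulSoup

# URL pattern taken from the question; {} is replaced by the page number.
BASE_URL = "https://www.bodia.com/spa-members/page/{}"


def extract_names(html):
    """Return the stripped text of every <span class="light"> in the HTML."""
    soup = BeautifulSoup(html, "html.parser")
    return [span.text.strip() for span in soup.find_all("span", class_="light")]


def scrape_pages(first, last, delay=1.0):
    """Fetch pages first..last (inclusive) and map page number -> list of names."""
    names_by_page = {}
    for page in range(first, last + 1):
        response = requests.get(BASE_URL.format(page), timeout=10)
        # Stop on an HTTP error instead of silently parsing an error page.
        response.raise_for_status()
        names_by_page[page] = extract_names(response.text)
        # Assumed courtesy pause between requests; adjust or remove as needed.
        time.sleep(delay)
    return names_by_page
```

Calling scrape_pages(1, 3) and looping over the returned dict with .items() gives each page number together with its names. Keeping the parsing in extract_names also means it can be tried on a saved HTML string without hitting the site.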