Python web scraping: problem with classes

Time: 2017-08-02 04:53:21

Tags: python web-scraping beautifulsoup

I am trying to scrape the names of real estate agents from this website.

My code:

containers = page_soup.findAll("div",{"class":"team-details"})

for container in containers:
    agent_name = container.findAll("a", {"class":"team-name_link"})
    name = agent_name[0].text


    print("name: " + name)

However, when I run the script, I only get the first two names, followed by an error message:

name: Michael Stavrianos
name: Kristalla Stavrianos
Traceback (most recent call last):
  File "C:\Users\Toby\Desktop\Webscrape\LjHooker - mark1.py", line 16, in <module>
    name = agent_name[0].text
IndexError: list index out of range

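I found that the first two agent names are under the class "team-name_link", but the rest are under the class "team-name". I'm not sure how to scrape the names from both classes at the same time.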

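1 Answer:

Answer 0: (score: 2)

I think you are mistaken: all the names are inside the desired tag, but you need to look for the div with class "team-name" rather than the a with class "team-name_link". Only the first two agents have that link, so for the others findAll returns an empty list and agent_name[0] raises the IndexError.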

from bs4 import BeautifulSoup
import requests

# Download the team page and parse it
html = requests.get("https://woollahra.ljhooker.com.au/our-team").text
soup = BeautifulSoup(html, 'html.parser')
containers = soup.findAll("div",{"class":"team-details"})

for container in containers:
    # Every agent has a div.team-name, with or without a link inside it
    agent_name = container.find("div", {"class":"team-name"})
    name = agent_name.text
    print(name)

The above code outputs:

Michael Stavrianos
              Licensee



Kristalla Stavrianos
              Principal



Jade Marshall
              Property Management Associate


Emma Phelan
              Property Management Associate


Isabella Marechal - Ross
              Property Management Associate


Victoria Empson
              Property Investment Manager
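
The output above still mixes each name with the job title and a lot of whitespace. If only the name is wanted, one option is to split the text of the div into non-empty lines and take the first one. The following is a minimal sketch, not a tested scraper for the live site: SAMPLE_HTML is hypothetical markup modelled on the output above (the real page may be structured differently), and extract_agents is a helper name made up for this example.

```python
from bs4 import BeautifulSoup

# Hypothetical markup modelled on the output above; the real page may differ.
SAMPLE_HTML = """
<div class="team-details">
  <div class="team-name">
    <a class="team-name_link" href="#">Michael Stavrianos</a>
    Licensee
  </div>
</div>
<div class="team-details">
  <div class="team-name">
    Jade Marshall
    Property Management Associate
  </div>
</div>
"""


def extract_agents(html):
    """Return a list of (name, title) tuples, one per team-details block."""
    soup = BeautifulSoup(html, "html.parser")
    agents = []
    for container in soup.find_all("div", class_="team-details"):
        name_div = container.find("div", class_="team-name")
        if name_div is None:
            # Skip blocks without a name div instead of raising an error
            continue
        # Split the div's text into lines and drop the empty ones
        lines = [line.strip() for line in name_div.get_text().splitlines()]
        lines = [line for line in lines if line]
        if not lines:
            continue
        # Assumption: the first non-empty line is the name, the second the title
        name = lines[0]
        title = lines[1] if len(lines) > 1 else ""
        agents.append((name, title))
    return agents


for name, title in extract_agents(SAMPLE_HTML):
    print("name: " + name)
```

To try this against the real page, the html string fetched with requests in the answer could be passed to extract_agents instead of SAMPLE_HTML, provided the name and title really do sit on separate lines there.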