I'm trying to loop over several real estate agency websites and scrape each agent's name and mobile number.
My code:
locations = ['woollahra', 'chinatown', 'bondibeach', 'doublebay']
for location in locations:
    my_url = 'https://' + location + '.ljhooker.com.au/our-team'
uClient = uReq(my_url)
page_html = uClient.read()
uClient.close()
page_soup = soup(page_html, "html.parser")
containers = page_soup.findAll("div", {"class":"team-details"})
for container in containers:
    agent_name = container.findAll("div", {"class":"team-name"})
    name = agent_name[0].text
    phone = container.findAll("span", {"class":"phone"})
    mobile = phone[0].text
    print("name: " + name)
    print("mobile: " + mobile)
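However, when I run the script it skips the first three pages (woollahra, chinatown, bondibeach) and only scrapes the last site in the list (doublebay). I'm not sure why it does this or how to make it loop through all of the pages.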
Answer 0 (score: 1)
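All of the code needs to be inside the first loop; otherwise the loop only reassigns the variable my_url, and everything after it runs once, using the last value. So all you have to do is indent the rest of the code: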
# Assumed imports: the question does not show them, but these aliases match the names used.
from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup

locations = ['woollahra', 'chinatown', 'bondibeach', 'doublebay']
for location in locations:
    my_url = 'https://' + location + '.ljhooker.com.au/our-team'
    # Everything below is now part of the loop body, so it runs once per location.
    uClient = uReq(my_url)
    page_html = uClient.read()
    uClient.close()
    page_soup = soup(page_html, "html.parser")
    containers = page_soup.findAll("div", {"class":"team-details"})
    for container in containers:
        agent_name = container.findAll("div", {"class":"team-name"})
        name = agent_name[0].text
        phone = container.findAll("span", {"class":"phone"})
        mobile = phone[0].text
        print("name: " + name)
        print("mobile: " + mobile)
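To see why only doublebay was scraped, here is a small network-free sketch of the two loop shapes. The helper build_url and the two result lists are hypothetical names added for illustration; they stand in for the fetch-and-parse steps and are not part of the original script.

```python
locations = ['woollahra', 'chinatown', 'bondibeach', 'doublebay']

def build_url(location):
    # Hypothetical helper: builds the same URL string as the original script.
    return 'https://' + location + '.ljhooker.com.au/our-team'

# Buggy shape: only the assignment is inside the loop.
processed_buggy = []
for location in locations:
    my_url = build_url(location)
processed_buggy.append(my_url)  # runs once, after the loop, with the last URL

# Fixed shape: the work is indented, so it runs on every iteration.
processed_fixed = []
for location in locations:
    my_url = build_url(location)
    processed_fixed.append(my_url)

print(len(processed_buggy), "page handled by the buggy version")
print(len(processed_fixed), "pages handled by the fixed version")
```

In the buggy shape the loop finishes before any page is fetched, so my_url already holds the doublebay address by the time the rest of the script runs.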