Question

I'm trying to loop through several real estate agency websites and scrape each agent's name and mobile number.

My code:

locations = ['woollahra', 'chinatown', 'bondibeach','doublebay']
for location in locations:
    my_url = 'https://' + location + '.ljhooker.com.au/our-team'

uClient = uReq(my_url)
page_html = uClient.read()
uClient.close()

page_soup = soup(page_html, "html.parser")

containers = page_soup.findAll("div", {"class":"team-details"})

for container in containers:
    agent_name = container.findAll("div", {"class":"team-name"})
    name = agent_name[0].text

    phone = container.findAll("span", {"class":"phone"})
    mobile = phone[0].text

    print("name: " + name)
    print("mobile: " + mobile)

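However, when I run my script, it skips the first three pages (woollahra, chinatown, bondibeach) and only scrapes information from the last site in the list (doublebay). I'm not sure why it does this or how to make it loop through all of the pages.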

Answer 1

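You need to put all of the code inside the first loop. Otherwise the loop only changes the variable my_url, and the rest of the script runs once after the loop has finished, when my_url holds the last URL. So all you have to do is indent the rest of the code: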

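# Assumed imports: the question does not show them, but the aliases
# uReq and soup are conventionally defined like this.
from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup

locations = ['woollahra', 'chinatown', 'bondibeach', 'doublebay']
for location in locations:
    my_url = 'https://' + location + '.ljhooker.com.au/our-team'

    # Everything below is indented, so it runs once per location.
    uClient = uReq(my_url)
    page_html = uClient.read()
    uClient.close()

    page_soup = soup(page_html, "html.parser")

    # Each agent on the page sits in its own "team-details" div.
    containers = page_soup.findAll("div", {"class":"team-details"})

    for container in containers:
        agent_name = container.findAll("div", {"class":"team-name"})
        name = agent_name[0].text

        phone = container.findAll("span", {"class":"phone"})
        mobile = phone[0].text

        print("name: " + name)
        print("mobile: " + mobile)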
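
If it helps to see the effect of the indentation in isolation, here is a small sketch that makes no network requests. The helper build_url and the lists visited_unindented and visited_indented are illustrative names that are not in the original code; they only record which URLs would have been fetched in each version.

```python
locations = ['woollahra', 'chinatown', 'bondibeach', 'doublebay']


def build_url(location):
    # Hypothetical helper: builds the same URL string as in the question.
    return 'https://' + location + '.ljhooker.com.au/our-team'


# Shape of the original script: the "fetch" step is NOT indented,
# so it runs a single time, after the loop has already finished.
visited_unindented = []
for location in locations:
    my_url = build_url(location)
visited_unindented.append(my_url)  # my_url is the last URL by now

# Shape of the fixed script: the "fetch" step is inside the loop body,
# so it runs once for every location.
visited_indented = []
for location in locations:
    my_url = build_url(location)
    visited_indented.append(my_url)

print(len(visited_unindented), visited_unindented)
print(len(visited_indented), visited_indented)
```

The first list ends up with only the doublebay URL, which matches the behaviour described in the question, while the second contains all four.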

Web scraping in Python: problem looping through multiple web pages
