Question

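I am trying to extract data from the following website: https://www.centris.ca/en/multi-family-properties~for-sale~montreal-island?view=List.
Although I have been able to get the data I need from the first page, I am struggling to create a loop over all the pages, because the link does not seem to contain any "page" parameter that I could use. Instead, the link appears to stay the same no matter which page you go to.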

Any help would be greatly appreciated :)
Thanks!

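Here is the code I have so far. It seems to extract the data for one page, but I am trying to find a way to loop over the pages by clicking, so that the data is loaded automatically after clicking "next page":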

from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup

my_url = 'https://www.centris.ca/en/multi-family-properties~for-sale~montreal-island?view=List'

# opening up connection, grabbing the page
uClient = uReq(my_url)
page_html = uClient.read()
uClient.close()

# html parsing
page_soup = soup(page_html, "html.parser")

# grabs each product
descriptions = page_soup.findAll("div",{"class":"description"})

filename = "houses-v2.csv"
f = open(filename, "w")

headers = "pgr, price\n"

f.write(headers)

for description in descriptions:
    # description = descriptions[0]

    pgr = description.p.span.span.text.strip()

    price_description = description.findAll("p", {"class":"price"})
    price = price_description[0].text.strip()

    print(pgr)
    print(price)

    f.write(pgr.replace(",", "") + ',' + price.replace(",", "") + '\n')

f.close()

Answer 1

I don't know whether you have managed to get the scraping working, but I have been looking into this particular case.

You will have to use the selenium package, which can be installed with pip (pip install selenium) or conda (conda install selenium).

This package is used for web browser automation. More information here.

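You will be able to open a session, much as you would with a GET request, and then virtually click the next-page button. From there you should be able to get the data from the other pages. I would also recommend rewriting the BeautifulSoup part in Selenium, to keep the code consistent.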

You can find an example here! The interesting part is:

# this navigates to the next page
driver.find_element_by_xpath('//ul[@class="pagination"]/li').click()

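Here "driver" is the session, "find_element_by_xpath" plays a role similar to "find" and "find_all" in BeautifulSoup, and the "click" function activates the HTML element. The only thing left to do is to find the HTML element linked to the next-page button!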
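To make the steps above concrete, here is a minimal sketch of the click-and-scrape loop. It is not tested against centris.ca, and several things in it are assumptions: the helper names (scrape_all_pages, parse_descriptions, scrape_with_selenium) are made up for illustration, next_xpath is a placeholder for whatever XPath you find for the next-page button, and the fixed two-second sleep is a crude stand-in for a proper wait. Note also that Selenium 4.3 and later removed find_element_by_xpath, so the sketch uses driver.find_element(By.XPATH, ...) instead. For simplicity it keeps the BeautifulSoup parsing from the question and feeds it driver.page_source.

```python
from typing import Callable, Iterable, List, Tuple

Row = Tuple[str, str]


def scrape_all_pages(
    get_html: Callable[[], str],
    go_to_next: Callable[[], bool],
    parse: Callable[[str], Iterable[Row]],
    max_pages: int = 50,
) -> List[Row]:
    """Collect rows page by page until there is no further page."""
    rows: List[Row] = []
    previous = None
    for _ in range(max_pages):  # hard cap so a stuck button cannot loop forever
        page_rows = list(parse(get_html()))
        if page_rows == previous:  # the click changed nothing: treat as last page
            break
        rows.extend(page_rows)
        previous = page_rows
        if not go_to_next():
            break
    return rows


def parse_descriptions(html: str) -> List[Row]:
    """Same extraction as in the question, applied to one page of HTML."""
    from bs4 import BeautifulSoup  # lazy import; needs beautifulsoup4

    page_soup = BeautifulSoup(html, "html.parser")
    rows = []
    for description in page_soup.find_all("div", {"class": "description"}):
        pgr = description.p.span.span.text.strip()
        price = description.find("p", {"class": "price"}).text.strip()
        rows.append((pgr, price))
    return rows


def scrape_with_selenium(url: str, next_xpath: str) -> List[Row]:
    """Wire the loop to a real browser (assumes Chrome is installed)."""
    import time
    from selenium import webdriver
    from selenium.common.exceptions import WebDriverException
    from selenium.webdriver.common.by import By

    driver = webdriver.Chrome()
    try:
        driver.get(url)

        def go_to_next() -> bool:
            try:
                # next_xpath is a placeholder: inspect the page to find it
                driver.find_element(By.XPATH, next_xpath).click()
            except WebDriverException:  # button missing or not clickable
                return False
            time.sleep(2)  # crude wait for the next listings to load
            return True

        return scrape_all_pages(
            lambda: driver.page_source, go_to_next, parse_descriptions
        )
    finally:
        driver.quit()
```

You would call it as scrape_with_selenium(my_url, "<your XPath for the next button>") and write the returned (pgr, price) pairs to the CSV as before. Keeping the loop separate from the browser makes it possible to check the stopping logic with fake pages before pointing it at the real site.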

Keep me posted!

Python web scraping: multiple pages with only one static link
