Question

I want to scrape the following page:

html = 'https://www.quintoandar.com.br/alugar/imovel/sao-paulo-sp-brasil/1-vagas/de-20-a-75-m2/de-500-a-4400-reais/apartamento'

I want to get the rent price, the total cost and the location, which are stored as three lines of text under each photo.

I tried:

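import requests
from bs4 import BeautifulSoup

page = requests.get(html)
soup = BeautifulSoup(page.content, 'html.parser')

# Visit every div that has a class attribute and print the text of its spans.
for tag in soup.find_all('div'):
    if tag.has_attr('class'):
        # find_all returns a list of tags, so read .text from each span in turn.
        for span in tag.find_all('span'):
            print(span.text)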

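My aim is to go into the div tags that have a class attribute, find the span tags inside them, and get their text. That is what inspecting the HTML suggests should work.

However, I get nothing back. It seems there are no div tags at all.

Any clues?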

Answer 1

This site gets the information it displays through requests to a JSON API, so the listings are not in the HTML that requests downloads. In fact, the JSON is easier to parse than the HTML.

Use the developer tools in your browser (opened with F12) to inspect the network activity and find those requests.
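As a rough sketch of that approach, assuming you have copied the request URL from the Network tab: the URL below is a placeholder, and the key names ("listings", "rent", "total", "address") are made up for illustration, since the real response will have its own structure. The sketch downloads the JSON with requests and reads the fields from plain dictionaries instead of parsing HTML.

```python
# Hypothetical placeholder: replace with the request URL seen in the Network tab.
API_URL = "https://example.com/api/search"


def fetch_json(url, params=None):
    """Download a JSON document and decode it into Python objects."""
    # Imported here so that extract_listings can be tried offline without requests.
    import requests

    response = requests.get(url, params=params, timeout=10)
    response.raise_for_status()
    return response.json()


def extract_listings(payload):
    """Collect (rent, total, address) tuples from a decoded payload.

    The key names are assumptions for illustration only; use the ones that
    appear in the real response.
    """
    rows = []
    for item in payload.get("listings", []):
        rows.append((item.get("rent"), item.get("total"), item.get("address")))
    return rows


# Offline demonstration with made-up data in the assumed shape.
sample = {"listings": [{"rent": 1500, "total": 2100, "address": "Rua Exemplo"}]}
for rent, total, address in extract_listings(sample):
    print(rent, total, address)
```

With the real URL in place, something like extract_listings(fetch_json(API_URL)) would do the same against live data, once the key names are adjusted to match the response.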
