Getting an empty list when web scraping with BeautifulSoup

Time: 2020-10-02 00:37:21

Tags: python web-scraping beautifulsoup

I am new to web scraping and am trying to scrape a real estate website. The following code gives me an empty list.

import requests
from bs4 import BeautifulSoup

base_url = "https://juddwhite.harcourts.com.au/Property/Rentals?location=8046&data=%7b%22locationOption%22%3a%7b%22value%22%3a8046%2c%22name%22%3a%5b%22Glen+Waverley%22%2c%22Melbourne+-+Eastern+Melbourne%22%2c%22Victoria%22%5d%7d%7d&page="

for page in range(1, 4):
    r = requests.get(base_url + str(page))
    c = r.content
    soup = BeautifulSoup(c, "html.parser")
    items = soup.find_all("div", {"class": "search-item-content"})
    for item in items:
        for bond in item.find_all("div", {"class": "list-feature hc-text hc-grid-4 hc-grid-sm-8"}):
            print(bond)

Could you take a look? Thanks.

2 Answers:

Answer 0: (score: 1)

To get the bond value from each individual property page, you can use the following example:

import requests
from bs4 import BeautifulSoup


url = 'https://juddwhite.harcourts.com.au/Property/Rentals?location=8046&data=%7b%22locationOption%22%3a%7b%22value%22%3a8046%2c%22name%22%3a%5b%22Glen+Waverley%22%2c%22Melbourne+-+Eastern+Melbourne%22%2c%22Victoria%22%5d%7d%7d&page={page}'

for page in range(1, 3):
    print('Getting page {}..'.format(page))

    # Fetch one page of search results.
    soup = BeautifulSoup(requests.get(url.format(page=page)).content, 'html.parser')

    # Follow each listing link to the property's detail page.
    for a in soup.select('.search-item-container a'):
        soup2 = BeautifulSoup(requests.get('https://juddwhite.harcourts.com.au' + a['href']).content, 'html.parser')

        print(soup2.select_one('.hc-title').get_text(strip=True))
        # The bond amount is in the first <div> after the "Bond $: " label.
        print(soup2.find('span', string='Bond $: ').find_next('div').get_text(strip=True))
        print('-' * 80)

Prints:

Getting page 1..
Glen Waverley, 10 Snowden Drive VGW52967
1,955
--------------------------------------------------------------------------------
Glen Waverley, 1/2 Garrison Drive VGW52955
2,825
--------------------------------------------------------------------------------
Glen Waverley, 614/39 Kingsway VGW52953
1,739
--------------------------------------------------------------------------------
Glen Waverley, 40 Elmwood Crescent VGW52949
2,521
--------------------------------------------------------------------------------

...and so on.
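
The label lookup used above can be tried offline on a small snippet. This is only a sketch: the HTML below is made up for illustration and is not the site's real markup, and `value_after_label` is a hypothetical helper name. It also returns `None` instead of raising `AttributeError` when the label is missing, which could happen if a listing has no bond field.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration only; the real page may differ.
SAMPLE_HTML = """
<ul>
  <li><span>Bond $: </span><div>1,955</div></li>
  <li><span>Available: </span><div>Now</div></li>
</ul>
"""


def value_after_label(soup, label):
    """Return the text of the first <div> after the <span> whose text is `label`, or None."""
    span = soup.find("span", string=label)
    if span is None:
        return None  # label not present on this page
    div = span.find_next("div")
    return div.get_text(strip=True) if div is not None else None


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
print(value_after_label(soup, "Bond $: "))
print(value_after_label(soup, "Rent: "))
```

Note that `string=` compares against the full text of the tag, so the trailing space in `"Bond $: "` has to match.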

Answer 1: (score: 0)

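Kudos for trying something new :). Your code looks correct. I did a simple search in the page's "Inspect Element" view for the text "list-feature hc-text hc-grid-4 hc-grid-sm-8" used in your for loop, and it returned no results. You may want to check the exact attribute name you are looking for on the web page. Hope this helps.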
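
One reason a multi-class search can come back empty is that passing several classes as a single string to `find_all` only matches that exact attribute value, while a CSS selector matches the classes in any order. The sketch below shows the difference on made-up HTML; the sample markup is an assumption and is not taken from the site.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: same classes as searched for, but in a different
# order and with one extra class.
SAMPLE_HTML = '<div class="hc-text list-feature extra">Bond</div>'
soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# A multi-class string must equal the class attribute exactly.
exact = soup.find_all("div", {"class": "list-feature hc-text"})

# A CSS selector requires each class to be present, in any order.
by_css = soup.select("div.list-feature.hc-text")

print(len(exact), len(by_css))
```

If neither form finds anything on the real page, the element is probably not in the HTML that `requests` receives, so it is worth checking the raw response text as well as the browser's inspector.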
