I'm new to web scraping and am trying to scrape a real estate website. The code below gives me an empty list.
import requests
from bs4 import BeautifulSoup

base_url = "https://juddwhite.harcourts.com.au/Property/Rentals?location=8046&data=%7b%22locationOption%22%3a%7b%22value%22%3a8046%2c%22name%22%3a%5b%22Glen+Waverley%22%2c%22Melbourne+-+Eastern+Melbourne%22%2c%22Victoria%22%5d%7d%7d&page="
for page in range(1, 4, 1):
    r = requests.get(base_url + str(page))
    c = r.content
    soup = BeautifulSoup(c, "html.parser")
    all = soup.find_all("div", {"class": "search-item-content"})
    for item in all:
        for bond in item.find_all("div", {"class": "list-feature hc-text hc-grid-4 hc-grid-sm-8"}):
            print(bond)
Could you take a look? Thanks.
Answer 0 (score: 1)
To get the bond value from each individual property, you can use this example:
import requests
from bs4 import BeautifulSoup

url = 'https://juddwhite.harcourts.com.au/Property/Rentals?location=8046&data=%7b%22locationOption%22%3a%7b%22value%22%3a8046%2c%22name%22%3a%5b%22Glen+Waverley%22%2c%22Melbourne+-+Eastern+Melbourne%22%2c%22Victoria%22%5d%7d%7d&page={page}'
for page in range(1, 3):
    print('Getting page {}..'.format(page))
    soup = BeautifulSoup(requests.get(url.format(page=page)).content, 'html.parser')
    # Follow each listing link to its detail page, where the bond is read from.
    for a in soup.select('.search-item-container a'):
        soup2 = BeautifulSoup(requests.get('https://juddwhite.harcourts.com.au' + a['href']).content, 'html.parser')
        print(soup2.select_one('.hc-title').get_text(strip=True))
        # Find the "Bond $: " label, then take the text of the next <div> after it.
        print(soup2.find('span', text='Bond $: ').find_next('div').get_text(strip=True))
        print('-' * 80)
Prints:
Getting page 1..
Glen Waverley, 10 Snowden Drive VGW52967
1,955
--------------------------------------------------------------------------------
Glen Waverley, 1/2 Garrison Drive VGW52955
2,825
--------------------------------------------------------------------------------
Glen Waverley, 614/39 Kingsway VGW52953
1,739
--------------------------------------------------------------------------------
Glen Waverley, 40 Elmwood Crescent VGW52949
2,521
--------------------------------------------------------------------------------
...and so on.
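The label lookup used above (find the `span` holding the label, then take the next `div`) can be tried offline before hitting the site. The following is a minimal sketch, not the site's actual markup: `SAMPLE_HTML` is invented for illustration, and `get_labelled_value` is a hypothetical helper name. It also returns `None` when a label is absent, instead of raising `AttributeError` as a chained `.find(...).find_next(...)` call would.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, invented for illustration; the real detail page may differ.
SAMPLE_HTML = """
<div class="hc-title">Glen Waverley, 10 Snowden Drive</div>
<ul>
  <li><span>Bond $: </span><div>1,955</div></li>
  <li><span>Available: </span><div>Now</div></li>
</ul>
"""


def get_labelled_value(soup, label):
    """Return the text of the first <div> after the <span> whose text equals `label`.

    Returns None if the label, or a following <div>, is not found.
    """
    # `string=` is the current name of the older `text=` argument (bs4 >= 4.4.0).
    span = soup.find("span", string=label)
    if span is None:
        return None
    value = span.find_next("div")
    return value.get_text(strip=True) if value is not None else None


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
bond = get_labelled_value(soup, "Bond $: ")      # label present in the sample
missing = get_labelled_value(soup, "Pets: ")     # label absent from the sample
print(bond)
print(missing)
```

Note that the match on the label is exact, including the trailing space in `"Bond $: "`, so the label string has to be copied from the page source as it appears there.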
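Answer 1 (score: 0)
Kudos for trying something new :). Your code looks correct. Using the page's "Inspect Element" option, I did a simple search for the text "list-feature hc-text hc-grid-4 hc-grid-sm-8" from your for loop, and it returned no results. You may want to check the exact class name you are looking for on the web page. Hope this helps.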
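One way to carry out that check in code is to list the class names that are actually present in the HTML that `requests` received. The sketch below runs on an invented snippet (`SAMPLE_HTML` is hypothetical, not the site's markup) and also illustrates one possible cause of an empty result, which is not confirmed for this site: when a multi-class string is passed to `find_all`, BeautifulSoup compares it with the exact `class` attribute value, so a different class order does not match, whereas a CSS selector matches the classes in any order.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the same classes as in the question, but in a different order.
SAMPLE_HTML = """
<div class="search-item-content">
  <div class="hc-text list-feature hc-grid-4 hc-grid-sm-8">Bond $: 1,955</div>
</div>
"""
soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# A multi-class string is compared with the exact attribute value: order matters.
exact = soup.find_all("div", {"class": "list-feature hc-text hc-grid-4 hc-grid-sm-8"})

# A CSS selector matches elements that carry all listed classes, in any order.
by_css = soup.select("div.list-feature.hc-text")

# Collect every class name present in the parsed document, to see what can be targeted.
present = sorted({c for tag in soup.find_all(True) for c in tag.get("class", [])})

print(len(exact), len(by_css))
print(present)
```

If the class you expect is missing from `present` when run against `r.content`, the element is probably not in the HTML returned for that URL (for example, it may only exist on another page), and no selector on that response will find it.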