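I'm still a newcomer to the Python world. I'm trying to build a scraper that would be useful in my daily work, but I'm stuck at one particular point:
My goal is to scrape a real-estate website. I'm using BeautifulSoup, and I can get the parameters on the listings page without any problem. However, when I go into a listing's detail page, I can't scrape any data.
My code: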
from bs4 import BeautifulSoup
import requests
url = "https://timetochoose.co.ao/?search-listings=true"
headers = {'User-Agent': 'whatever'}
response = requests.get(url, headers=headers)
print(response)
data = response.text
print(data)
soup = BeautifulSoup(data, 'html.parser')
anuncios = soup.find_all("div", {"class": "grid-listing-info"})
for anuncios in anuncios:
    titles = anuncios.find("a",{"class": "listing-link"}).text
    location = anuncios.find("p",{"class": "location muted marB0"}).text
    link = anuncios.find("a",{"class": "listing-link"}).get("href")
    anuncios_response = requests.get(link)
    anuncios_data = anuncios_response.text
    anuncios_soup = BeautifulSoup(anuncios_data, 'html.parser')
    conteudo = anuncios_soup.find("div", {"id":"listing-content"}).text
    print("Título", titles, "\nLocalização", location, "\nLink", link, "\nConteudo", conteudo)
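For example, the "conteudo" variable ends up with nothing in it. I tried to get other data from the detail page, such as the price or the number of rooms, but it always fails and I only get "None".
I've been looking for an answer since yesterday afternoon, but I haven't found where it goes wrong. I can get the parameters on the first page without any problem, but as soon as I reach the listing detail page, it just fails.
I'd be grateful if someone could point out what I'm doing wrong. Thanks in advance for taking the time to read my question.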
Answer 0 (score: 1)
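To get the correct page, you need to set the User-Agent HTTP header on every request, including the requests for the detail pages.
For example: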
import requests
from bs4 import BeautifulSoup
main_url = 'https://timetochoose.co.ao/?search-listings=true'
headers = {'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:77.0) Gecko/20100101 Firefox/77.0'}
def print_info(url):
    soup = BeautifulSoup(requests.get(url, headers=headers).content, 'html.parser')
    print(soup.select_one('#listing-content').get_text(strip=True, separator='\n'))
soup = BeautifulSoup(requests.get(main_url, headers=headers).content, 'html.parser')
for a in soup.select('a.listing-featured-image'):
    print(a['href'])
    print_info(a['href'])
    print('-' * 80)
Prints:
https://timetochoose.co.ao/listings/loja-rua-rei-katiavala-luanda/
Avenida brasil , Rua katiavala
Maculusso
Loja com 90 metros quadrados
2 andares
1 wc
Frente a estrada
Arrendamento mensal 500.000 kz Negociável
--------------------------------------------------------------------------------
https://timetochoose.co.ao/listings/apertamento-t3-rua-cabral-montcada-maianga/
Apartamento T3 maianga
1 suíte com varanda
2 quartos com varanda
1 wc
1 sala comum grande
1 cozinha
Tanque de agua
Predio limpo
Arrendamento 350.000 akz Negociável
--------------------------------------------------------------------------------
...and so on.
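
In the question's code, only the first request passes headers; the per-listing call is requests.get(link) with no headers at all, which is probably why the detail pages come back without the expected content. One way to avoid forgetting the header is a requests.Session, which sends the same headers on every request. The following is only a rough sketch under a few assumptions: the helper names (parse_listing_links, parse_listing_content, scrape) and the 30-second timeout are mine, not part of the site or either library, and the selectors are the ones from the question, which I have not re-checked against the live site.

```python
import requests
from bs4 import BeautifulSoup

MAIN_URL = "https://timetochoose.co.ao/?search-listings=true"
# Same browser-like User-Agent as in the example above.
HEADERS = {
    "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:77.0) "
                  "Gecko/20100101 Firefox/77.0"
}


def parse_listing_links(html):
    """Return the href of every listing link on a search-results page."""
    soup = BeautifulSoup(html, "html.parser")
    # Selector taken from the question's code (div.grid-listing-info / a.listing-link).
    return [a["href"] for a in soup.select("div.grid-listing-info a.listing-link[href]")]


def parse_listing_content(html):
    """Return the text of #listing-content, or None if the element is missing."""
    soup = BeautifulSoup(html, "html.parser")
    node = soup.select_one("#listing-content")
    # Checking for None avoids an AttributeError when the page is not what we expected.
    return node.get_text(separator="\n", strip=True) if node else None


def scrape(main_url=MAIN_URL):
    """Yield (link, content) pairs, sending the same headers on every request."""
    with requests.Session() as session:
        session.headers.update(HEADERS)  # applies to the list page and each detail page
        listing_page = session.get(main_url, timeout=30)
        listing_page.raise_for_status()
        for link in parse_listing_links(listing_page.text):
            detail_page = session.get(link, timeout=30)
            detail_page.raise_for_status()
            yield link, parse_listing_content(detail_page.text)
```

To use it, loop over the generator, e.g. for link, content in scrape(): print(link, content, sep="\n"). Keeping the parsing in separate functions that take an HTML string also makes it easy to try the selectors on a saved copy of a page without hitting the site.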