BeautifulSoup returns None

Time: 2018-10-13 07:23:54

Tags: python selenium beautifulsoup

I am trying to get the titles of the listings at this URL, but this code returns None.

import requests 
from bs4 import BeautifulSoup  

# get the data 
data = requests.get('https://www.lamudi.com.ph/metro-manila/makati/condominium/buy/')

# Update Header
headers = requests.utils.default_headers()
headers.update({
    'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Firefox/31.0',
})
# load data into bs4
soup = BeautifulSoup(data.text, 'html.parser')

# We need to extract all the data in this div: <div class="ListingCell-KeyInfo-title" ..>

listingsTitle = soup.find('div', { 'class': 'ListingCell-KeyInfo-title'})
print(listingsTitle)

Does anyone know why?

Thanks.

2 answers:

Answer 0 (score: 0)

The site you are requesting is treating you as a bot.

Response to the request:

<h1>Pardon Our Interruption...</h1>
<p>
      As you were browsing <strong>www.lamudi.com.ph</strong> something about your browser made us think you were a bot. There are a few reasons this might happen:
        </p>
<ul>

Before parsing anything in the response, print its content first to make sure you actually reached the page you expected.
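One way to act on this advice is to inspect the response body before handing it to BeautifulSoup. The sketch below is only an illustration: the helper name `looks_like_bot_block` and the marker list are assumptions (not part of any library), and the markers are simply the phrases visible in the response quoted above.

```python
# Phrases taken from the block page quoted above. The list is an assumption
# and may need adjusting if the site changes its wording.
BLOCK_MARKERS = (
    "Pardon Our Interruption",
    "made us think you were a bot",
)


def looks_like_bot_block(html):
    """Return True if the HTML contains one of the known block-page phrases."""
    lowered = html.lower()
    return any(marker.lower() in lowered for marker in BLOCK_MARKERS)


# Hypothetical sample standing in for `data.text` from the question.
sample = (
    "<h1>Pardon Our Interruption...</h1>"
    "<p>something about your browser made us think you were a bot.</p>"
)
print(sample[:60])                   # look at the start of the body first
print(looks_like_bot_block(sample))  # then decide whether parsing makes sense
```

With the real response, the same check would be applied to `data.text` before building the soup.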

You have to send a User-Agent (or possibly other headers) so that the request looks like it comes from a real user.

Try adding this to your request headers:

USER_AGENT_FIREFOX = 'Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Firefox/31.0'
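Note that the question's code builds its headers after `requests.get` has already run and never passes them to the request, so the site still sees the default `python-requests` User-Agent. A minimal sketch of sending the header with the request itself follows. The helper names `build_headers` and `fetch_page` and the timeout value are assumptions made for illustration, and a User-Agent alone may not be enough to satisfy this site's bot detection.

```python
import requests

URL = 'https://www.lamudi.com.ph/metro-manila/makati/condominium/buy/'
USER_AGENT_FIREFOX = 'Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Firefox/31.0'


def build_headers(user_agent=USER_AGENT_FIREFOX):
    """Start from the requests defaults and override only the User-Agent."""
    headers = requests.utils.default_headers()
    headers.update({'User-Agent': user_agent})
    return headers


def fetch_page(url=URL, timeout=10):
    """Send the request with the custom headers and return the response.

    The response is returned as-is (no raise_for_status) so that the caller
    can still print the body of a block page for inspection.
    """
    return requests.get(url, headers=build_headers(), timeout=timeout)
```

Calling `fetch_page()` and printing `response.status_code` and the first few hundred characters of `response.text` shows whether the listing page or the block page came back.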

Answer 1 (score: 0)

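I tried it with Selenium and with explicit waits, but that did not work. If you print the soup, you can see the error. The page actually returns the following: "As you were browsing www.lamudi.com.ph something about your browser made us think you were a bot. There are a few reasons this might happen: ..."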

The site recognizes that you are not a human.

import requests 
from bs4 import BeautifulSoup  

# get the data 
data = requests.get('https://www.lamudi.com.ph/metro-manila/makati/condominium/buy/')

# load data into bs4
soup = BeautifulSoup(data.text, 'html.parser')

# We need to extract all the data in this div: <div class="ListingCell-KeyInfo-title" ..>
print(soup)  # --> this print shows the block page instead of the listings

listingsTitle = soup.find('div', class_='ListingCell-KeyInfo-title')
print(listingsTitle)
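Once the real listing page comes back, `soup.find` still returns only the first match, or `None` when nothing matches. Since the goal is the titles of all listings, a sketch using `find_all` may be closer to what is needed. The helper name `extract_titles` and the sample markup are made up for illustration; only the class name comes from the question, and the real page structure may differ.

```python
from bs4 import BeautifulSoup


def extract_titles(html):
    """Return the stripped text of every div with the listing-title class."""
    soup = BeautifulSoup(html, 'html.parser')
    cells = soup.find_all('div', class_='ListingCell-KeyInfo-title')
    return [cell.get_text(strip=True) for cell in cells]


# Made-up markup for illustration only.
sample = '''
<div class="ListingCell-KeyInfo-title"> Condo A </div>
<div class="ListingCell-KeyInfo-title">Condo B</div>
<p>not a listing</p>
'''
print(extract_titles(sample))                                 # ['Condo A', 'Condo B']
print(extract_titles('<h1>Pardon Our Interruption...</h1>'))  # []
```

An empty list here suggests that either the block page was returned or the class name no longer matches the page.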