Beautiful Soup returns nothing for an element that exists

Date: 2019-12-17 15:57:04

Tags: python beautifulsoup

I am trying to scrape the price of a product. Here is my code:

from bs4 import BeautifulSoup as soup
import requests

page_url = "https://www.falabella.com/falabella-cl/product/5311682/Smartphone-iPhone-7-PLUS-32GB/5311682/"
headers={
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/35.0.1916.47 Safari/537.36'
}
uClient = requests.get(page_url, headers=headers)
print(uClient)
page_soup = soup(uClient.content, "html.parser") #requests
test = page_soup.find("p", {"class":"fb-price"})
print(test)

But instead of the expected price, I get the following output:

<Response [200]>
None

I have already used the Chrome developer tools to check that the element exists. URL: https://www.falabella.com/falabella-cl/product/5311682/Smartphone-iPhone-7-PLUS-32GB/5311682/

3 Answers:

Answer 0 (score: 3)

If you go to the network tab, you will find the following link, which retrieves the data in JSON format. You can do this without Selenium or BeautifulSoup.

Url = "https://www.falabella.com/rest/model/falabella/rest/browse/BrowseActor/fetch-item-details?{%22products%22:[{%22productId%22:%225311634%22},{%22productId%22:%225311597%22},{%22productId%22:%225311505%22},{%22productId%22:%226009874%22},{%22productId%22:%225311494%22},{%22productId%22:%225311510%22},{%22productId%22:%226009845%22},{%22productId%22:%226009871%22},{%22productId%22:%226009868%22},{%22productId%22:%226009774%22},{%22productId%22:%226782957%22},{%22productId%22:%226009783%22},{%22productId%22:%226782958%22},{%22productId%22:%228107608%22},{%22productId%22:%228107640%22},{%22productId%22:%226009875%22},{%22productId%22:%226782967%22},{%22productId%22:%226782922%22}]}"

Try the following code.

import requests

page_url = "https://www.falabella.com/rest/model/falabella/rest/browse/BrowseActor/fetch-item-details?{%22products%22:[{%22productId%22:%225311634%22},{%22productId%22:%225311597%22},{%22productId%22:%225311505%22},{%22productId%22:%226009874%22},{%22productId%22:%225311494%22},{%22productId%22:%225311510%22},{%22productId%22:%226009845%22},{%22productId%22:%226009871%22},{%22productId%22:%226009868%22},{%22productId%22:%226009774%22},{%22productId%22:%226782957%22},{%22productId%22:%226009783%22},{%22productId%22:%226782958%22},{%22productId%22:%228107608%22},{%22productId%22:%228107640%22},{%22productId%22:%226009875%22},{%22productId%22:%226782967%22},{%22productId%22:%226782922%22}]}"
headers={
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/35.0.1916.47 Safari/537.36'
}
response=requests.get(page_url, headers=headers)
res=response.json()
for item in res['products'][0]['product']['prices']:
    print(item['symbol'] + item['originalPrice'])

Output:

$ 379.990
$ 569.990

To get the product name:

print(res['products'][0]['product']['displayName'])

Output:

Smartphone iPhone 7 PLUS 32GB

If you are only looking for the value $ 379.990, print:

print(res['products'][0]['product']['prices'][0]['symbol'] +res['products'][0]['product']['prices'][0]['originalPrice'] )
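As a side note, that long URL is just a URL-encoded JSON payload of product IDs, so it can be built programmatically instead of copied by hand. A minimal sketch (`build_fetch_url` is a hypothetical helper, and whether the endpoint accepts an arbitrary subset of IDs is an assumption):

```python
import json
from urllib.parse import quote

def build_fetch_url(product_ids):
    # hypothetical helper: the endpoint takes a URL-encoded JSON object
    # of the form {"products":[{"productId":"..."}]} after the "?"
    payload = {"products": [{"productId": pid} for pid in product_ids]}
    # leave {}[]:, unencoded to match the URL seen in the network tab
    encoded = quote(json.dumps(payload, separators=(",", ":")), safe="{}[]:,")
    return ("https://www.falabella.com/rest/model/falabella/rest/"
            "browse/BrowseActor/fetch-item-details?" + encoded)

print(build_fetch_url(["5311682"]))
```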

Answer 1 (score: 2)

The problem is that a JS script inserts this HTML node dynamically after the page loads. The request only retrieves the raw HTML and does not wait for the script to run.

You can use a headless browser such as Chrome WebDriver, which waits for the page to load in real time and can then access the DOM dynamically. Here is an example of how to use it after installing it:

from selenium import webdriver
from selenium.webdriver.chrome.options import Options

url = "https://www.falabella.com/falabella-cl/product/5311682/Smartphone-iPhone-7-PLUS-32GB/5311682/"
opts = Options()  
opts.add_argument("--headless")  
opts.add_argument("log-level=3") # suppress console noise
driver = webdriver.Chrome(options=opts)
driver.get(url)

print(driver.find_element_by_class_name("fb-price").text) # => $ 379.990

As pointed out in the other answer, another good option is to make the same API call that the page's script uses to access the data. That approach requires no extra installs or imports, so it is very lightweight, and the API may prove less brittle than the class name (or vice versa).

Answer 2 (score: 1)

This is very hacky; for an actual use case I would recommend: Web-scraping JavaScript page with Python

By downloading the raw HTML via cURL and grepping it (in your case, you can just search the page source in your browser's Sources tab), I found that the price is stored in the fbra_browseMainProductConfig variable. Using BeautifulSoup, I was able to extract the script containing it:

import requests, re
from bs4 import BeautifulSoup

soup = BeautifulSoup(requests.get("https://www.falabella.com/falabella-cl/product/5311682/Smartphone-iPhone-7-PLUS-32GB/5311682/").content, "html.parser")
# grab the text where it has `fbra_browseMainProductConfig` in it, and strip the extra whitespace
script_contents = soup(text=re.compile("fbra_browseMainProductConfig"))[0].strip()

From there, I inspected the output and found that the first line is the fbra_browseMainProductConfig declaration.

Hope this helps!
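As a possible next step (not part of the original answer), the JSON assigned to fbra_browseMainProductConfig can be cut out of the script text with a regular expression and parsed. A sketch run against a made-up stand-in for the real script contents, since the page's exact formatting may differ:

```python
import json
import re

# made-up stand-in for what script_contents might look like on the page
script_contents = (
    'var fbra_browseMainProductConfig = '
    '{"state": {"product": {"displayName": "Smartphone iPhone 7 PLUS 32GB"}}};'
)

# capture the object literal between the "=" and the trailing ";"
match = re.search(r"fbra_browseMainProductConfig\s*=\s*(\{.*\});", script_contents)
config = json.loads(match.group(1))
print(config["state"]["product"]["displayName"])  # Smartphone iPhone 7 PLUS 32GB
```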