Python 3: BeautifulSoup4 not returning the expected value

Posted: 2018-06-10 10:16:35

Tags: python python-3.x web-scraping beautifulsoup

I am trying to scrape some data from a website with BS4 under Python 3.6.4, but the value returned is not what I expect:

import requests
from bs4 import BeautifulSoup

link = "https://www.lacentrale.fr/listing?makesModelsCommercialNames=FERRARI&sortBy=priceAsc"
request = requests.get(link)
page = request.content
soup = BeautifulSoup(page, "html5lib")

price = soup.find("div", {"class" : "fieldPrice sizeC"}).text

print(price)

I should get "39 900€", but the code returns "47,880".

Note: even without JS, the data should be "39 900€".

Thanks for your help!

2 answers:

Answer 0 (score: 1)

The encoding declared on this page is wrong, so BeautifulSoup is told to use the wrong encoding. You can force it to use the correct one like this:

import requests
from bs4 import BeautifulSoup

link = "https://www.lacentrale.fr/listing?makesModelsCommercialNames=FERRARI&sortBy=priceAsc"
request = requests.get(link)
page = request.content
soup = BeautifulSoup(page.decode('utf-8','ignore'), "html5lib")

price = soup.find("div", {"class": "fieldPrice sizeC"}).text

print(price)

Output:

49 070 €
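To see the effect without hitting the live site, here is a minimal offline sketch. It assumes a hypothetical page whose bytes are UTF-8 but whose <meta> tag declares ISO-8859-1; the HTML snippet and the price are made up for illustration. It contrasts trusting the wrong declaration with two ways of forcing UTF-8: decoding the bytes yourself (as in this answer, here without 'ignore'), or passing the bytes with BeautifulSoup's from_encoding argument. It uses the built-in "html.parser" so it needs no extra parser installed.

```python
from bs4 import BeautifulSoup

# Hypothetical page: the body is UTF-8, but the <meta> tag claims ISO-8859-1.
raw = (
    '<html><head><meta charset="iso-8859-1"></head>'
    '<body><div class="fieldPrice sizeC">39 900 €</div></body></html>'
).encode("utf-8")

# Trusting the wrong declaration: "€" (UTF-8 bytes E2 82 AC) becomes mojibake.
wrong = raw.decode("iso-8859-1")
print("€" in wrong)

# Option 1: decode the bytes yourself, then hand BeautifulSoup a str.
soup = BeautifulSoup(raw.decode("utf-8"), "html.parser")
price = soup.find("div", {"class": "fieldPrice sizeC"}).text

# Option 2: keep the bytes and tell BeautifulSoup which encoding to try first.
soup_bytes = BeautifulSoup(raw, "html.parser", from_encoding="utf-8")
price_bytes = soup_bytes.find("div", {"class": "fieldPrice sizeC"}).text

print(price)
print(price_bytes, soup_bytes.original_encoding)
```

Passing a str skips BeautifulSoup's encoding detection entirely, while from_encoding keeps the bytes and only changes which codec is tried first; soup.original_encoding reports which one was actually used.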

Answer 1 (score: 1)

Use request.text instead of request.content.

Example:

import requests
from bs4 import BeautifulSoup

link = "https://www.lacentrale.fr/listing?makesModelsCommercialNames=FERRARI&sortBy=priceAsc"
request = requests.get(link)
page = request.text
soup = BeautifulSoup(page, "html.parser")

price = soup.find("div", {"class" : "fieldPrice sizeC"}).text

print(price)
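  • .text automatically decodes the raw bytes received from the server into a string, using the encoding requests determines for the response.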
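As far as I know, requests fills in Response.encoding from the charset parameter of the Content-Type response header, and .text decodes .content with that encoding (falling back to a guess, apparent_encoding, only when the encoding is None). A header such as "text/html" with no charset makes requests assume ISO-8859-1, so .text is not a universal fix. The sketch below shows that header lookup with requests.utils.get_encoding_from_headers, plus a hypothetical fetch_html helper that overrides the codec by setting response.encoding before reading .text; the helper names and the timeout value are assumptions, not part of the original answer.

```python
import requests
from requests.utils import get_encoding_from_headers


def encoding_for(content_type):
    """Hypothetical helper: the codec requests would pick for this Content-Type."""
    return get_encoding_from_headers({"content-type": content_type})


def fetch_html(url, encoding=None):
    """Hypothetical helper: fetch a page and return its decoded text."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    if encoding is not None:
        # Setting .encoding changes how .text decodes the raw bytes.
        response.encoding = encoding
    return response.text


print(encoding_for("text/html; charset=utf-8"))
print(encoding_for("text/html"))  # requests' default for text/* without a charset
```

With network access, fetch_html(link, encoding="utf-8") would return a str you can pass straight to BeautifulSoup, which is equivalent to the first answer's explicit decode.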