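I am sending a GET request to this site, https://www.everything5pounds.com/en/Womens/c/womens#/?q=&sort=newArrivals, and the response I get is the page source (the same content the browser renders).
But when I look at the Network tab in Chrome, I see a URL whose response is JSON. Oddly, even though I set 'accept': 'application/json', I still cannot get a JSON response.
Below is the code I am using.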
import requests
from bs4 import BeautifulSoup
headers = requests.utils.default_headers()
headers.update({
'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:52.0) Gecko/20100101 Firefox/52.0',
'accept':'application/json'
})
url = 'https://www.everything5pounds.com/en/Womens/c/womens#/?q=&sort=newArrivals'
response = requests.get(url, headers=headers)
content = BeautifulSoup(response.content,'lxml')
print(content)
If I am doing something wrong, please correct me; otherwise, please explain why this happens.
Answer 0 (score: 1)
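Your URL is incorrect. Everything after the # is a URL fragment, which the client never sends to the server, so your request only reaches the HTML page. The JSON is served from the /results/ path instead: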
import json
import requests
from pprint import pprint
url = 'https://www.everything5pounds.com/en/Womens/c/womens/results/?q=&sort=newArrivals'
data = json.loads(requests.get(url).text)
# You can also get the JSON directly, without importing the json library:
# data = requests.get(url).json()
pprint(data)
Prints:
{'currentQuery': ':newArrivals',
'pagination': {'currentPage': 0,
'numberOfPages': 458,
'pageSize': 24,
'sort': 'newArrivals',
'totalNumberOfResults': 10973},
'results': [{'availableForPickup': None,
'availableInCurrentStore': None,
'averageRating': 5.0,
'badgeCode': None,
'badgeUrl': None,
'baseOptions': None,
'baseProduct': None,
'baseProductUrl': None,
'categories': None,
'categoryUrl': None,
'classifications': None,
'cleanUrl': '/Tie-Up-Cold-Shoulder-Dip-Hem-Dress/p/659773',
'code': '659773',
...and so on.
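To see why the original URL returned HTML, it may help to look at what an HTTP client actually sends for each URL. The following is a minimal offline sketch using only the standard library; request_target is a hypothetical helper name introduced here for illustration, not part of requests or of the answer above. It assumes the usual behaviour that the fragment (everything after #) is dropped before the request is sent.

```python
from urllib.parse import urlsplit


def request_target(url: str) -> str:
    """Return the path and query string that would be sent to the server.

    Hypothetical helper for illustration: the fragment is deliberately
    ignored, because clients do not include it in the HTTP request.
    """
    parts = urlsplit(url)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    return target


# URL from the question: the query-like text sits after '#', so it is a fragment.
page_url = "https://www.everything5pounds.com/en/Womens/c/womens#/?q=&sort=newArrivals"
# URL from the answer: the query sits in the real query string of the /results/ path.
json_url = "https://www.everything5pounds.com/en/Womens/c/womens/results/?q=&sort=newArrivals"

print(request_target(page_url))      # path only, no sort parameter reaches the server
print(request_target(json_url))      # path plus ?q=&sort=newArrivals
print(urlsplit(page_url).fragment)   # the part that stays on the client
```

In the browser, the page's JavaScript reads that fragment and then issues its own request to the /results/ URL, which is the JSON call visible in the Network tab.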