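I am trying to parse the following page: https://www.amazon.de/s?k=lego+7134&__mk_nl_NL=amazon&ref=nb_sb_noss_1.
requests.get retrieves the full HTML, but when I try to parse it with Beautiful Soup, I get back an empty list [].
I have tried changing the encoding, using Chrome, requests-html, different parsers, replacing the beginning of the code, and so on. I am sad to say that nothing seems to work.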
import random

import requests
from bs4 import BeautifulSoup as soup
from fake_useragent import UserAgent  # unused below
from lxml import html  # unused below

url = "https://www.amazon.de/s?k=lego+7134&__mk_nl_NL=amazon&ref=nb_sb_noss_1"

userAgentList = [
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36',
    'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/42.0.2311.90 Safari/537.36',
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/66.0.3359.181 Safari/537.36',
    'Mozilla/5.0 (Windows NT 5.1; rv:7.0.1) Gecko/20100101 Firefox/7.0.1',
    'Mozilla/5.0 (Windows NT 5.1; rv:36.0) Gecko/20100101 Firefox/36.0',
]

proxyList = [
    'xxx.x.x.xxx:8080',
    'xx.xx.xx.xx:3128',
]


def make_soup_am(url):
    print(url)
    # requests expects proxies as a scheme-to-URL mapping, not a list.
    # The "http://" prefix assumes these are plain HTTP proxies.
    proxy = random.choice(proxyList)
    s = requests.Session()
    s.proxies = {'http': f'http://{proxy}', 'https': f'http://{proxy}'}
    headers = {'User-Agent': random.choice(userAgentList)}
    pageHTML = s.get(url, headers=headers).text
    pageSoup = soup(pageHTML, features='lxml')
    return pageSoup


# The function takes the URL as its argument.
make_soup_am(url)
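
One check that might help narrow this down is to look at the raw pageHTML string before it reaches Beautiful Soup, since an empty result can also mean the server returned a different page than expected. The sketch below is only a rough diagnostic using the standard library. The helper name describe_response and both marker strings are assumptions for illustration; they do not come from my original code, and the site may use different markup.

```python
import re

# Assumed markers: neither is taken from the original code and both may change.
CAPTCHA_HINT = "captcha"
RESULT_MARKER = 'data-component-type="s-search-result"'


def describe_response(page_html):
    """Summarise raw HTML before it is handed to a parser."""
    title = re.search(r"<title[^>]*>(.*?)</title>", page_html, re.I | re.S)
    return {
        "length": len(page_html),
        "title": title.group(1).strip() if title else None,
        # True if the page mentions a captcha anywhere (case-insensitive).
        "looks_like_captcha": CAPTCHA_HINT in page_html.lower(),
        # How often the assumed search-result marker occurs in the text.
        "result_markers": page_html.count(RESULT_MARKER),
    }


if __name__ == "__main__":
    # Hypothetical sample standing in for the real response text.
    sample = (
        "<html><head><title>Bot Check</title></head>"
        "<body><form action='/errors/validateCaptcha'></form></body></html>"
    )
    print(describe_response(sample))
```

Calling describe_response(pageHTML) inside make_soup_am would show whether the downloaded text contains any result markers at all. If it does not, the parser is probably not the part at fault.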
Does anyone have an idea?
Thanks in advance,
Tom