Requests.get returns the page source, but bs4 returns an empty list

Asked: 2019-07-31 23:03:20

Tags: python web-scraping beautifulsoup amazon

I am trying to parse the following page: https://www.amazon.de/s?k=lego+7134&__mk_nl_NL=amazon&ref=nb_sb_noss_1

Requests.get fetches the full page source, but when I try to parse it with Beautiful Soup, it returns an empty list [].

I have already tried different encodings, Chrome (via Selenium), requests-HTML, different parsers, alternative boilerplate code, and so on. I'm sad to say nothing seems to work.

import random

from fake_useragent import UserAgent
import requests
from bs4 import BeautifulSoup as soup

url = "https://www.amazon.de/s?k=lego+7134&__mk_nl_NL=amazon&ref=nb_sb_noss_1"

userAgentList = [
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36',
    'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/42.0.2311.90 Safari/537.36',
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/66.0.3359.181 Safari/537.36',
    'Mozilla/5.0 (Windows NT 5.1; rv:7.0.1) Gecko/20100101 Firefox/7.0.1',
    'Mozilla/5.0 (Windows NT 5.1; rv:36.0) Gecko/20100101 Firefox/36.0',
]

proxyList = [
    'xxx.x.x.xxx:8080',
    'xx.xx.xx.xx:3128',
]

def make_soup_am(url):
    print(url)
    s = requests.Session()
    # requests expects proxies as a mapping of scheme -> proxy URL, not a list
    proxy = random.choice(proxyList)
    s.proxies = {'http': proxy, 'https': proxy}
    headers = {'User-Agent': random.choice(userAgentList)}
    pageHTML = s.get(url, headers=headers).text
    pageSoup = soup(pageHTML, features='lxml')
    return pageSoup

make_soup_am(url)
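One thing worth checking is that `find_all()` quietly returns `[]` whenever the selector does not match the HTML that was actually served; Amazon in particular often returns a CAPTCHA/robot-check page to scripted clients, in which case none of the usual result selectors exist in the document at all. The sketch below demonstrates this behaviour on a tiny inline snippet; the `data-component-type="s-search-result"` attribute is an assumption about Amazon's search-result markup at the time of writing and may have changed, and `html.parser` is used so the example has no lxml dependency:

```python
from bs4 import BeautifulSoup

# Tiny stand-in for a fetched search page. In the real script you would
# first print (a slice of) pageHTML and look for text like "Robot Check"
# to confirm you received results at all.
html = """
<html><body>
  <div data-component-type="s-search-result"><span>LEGO 7134</span></div>
  <div class="something-else">not a result</div>
</body></html>
"""

page = BeautifulSoup(html, 'html.parser')

# A selector that does not match the served markup -> empty list, no error
wrong = page.find_all('div', class_='s-result-item')
print(wrong)  # []

# A selector matching what the server actually sent -> results found
results = page.find_all('div', attrs={'data-component-type': 's-search-result'})
print(len(results))  # 1
```

So before blaming the parser, dump `pageHTML` to a file and inspect it: if the expected tags are missing there, the problem is the response (blocking, CAPTCHA, JavaScript-rendered content), not Beautiful Soup.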

Does anyone have an idea?

Thanks in advance

Tom

0 answers:

There are no answers yet.