Session.get does not open the correct page?

Time: 2016-06-29 15:42:11

Tags: python web-scraping python-requests

I am trying to open a link from a Python script.

https://www.amazon.com/Best-Sellers-Automotive-Transmission-Fluid-Additives/zgbs/automotive/15718891/ref=zg_bs_nav_auto_4_15718881#2

When I paste it into my browser, the correct page is displayed. However, when I open the link from Python, it just goes to

https://www.amazon.com/Best-Sellers-Automotive-Transmission-Fluid-Additives/zgbs/automotive/15718891/ref=zg_bs_nav_auto_4_15718881

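I have tried using a session with several different headers, and I have also tried requests.get. Am I just using the wrong headers? Looking at the page source, the button that takes you from the first page to the second has an ajax URL in addition to the href, so I think that may be where I am going wrong.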

CODE:

import requests
from bs4 import BeautifulSoup

session = requests.Session()

group_link = 'https://www.amazon.com/Best-Sellers-Automotive-Transmission-Fluid-Additives/zgbs/automotive/15718891/ref=zg_bs_nav_auto_4_15718881'

session.headers.update({'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1664.3 Safari/537.36',
'Accept':'text/html,application/json, text/javascript, application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
'Accept-Language':'en-US,en;q=0.8,es;q=0.6'})

# Append the "#2" fragment that the browser URL uses for the second page
link_2 = group_link + "#2"
page_2 = session.get(link_2)
soup_2 = BeautifulSoup(page_2.text, "html.parser")

1 Answer:

Answer 0 (score: 0)

You need to pass a few query parameters, because the content is retrieved with an ajax request:

import requests

# Query parameters sent by the page's ajax request; "pg" selects the page
params = {"_encoding": "UTF8",
          "pg": "2",
          "ajax": "1"}

url = "https://www.amazon.com/Best-Sellers-Automotive-Transmission-Fluid-Additives/zgbs/automotive/15718891"

r = requests.get(url, params=params)
print(r.text)

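Once you do this, you will see that you get the correct source. All you need is Best-Sellers-Automotive-Transmission-Fluid-Additives/zgbs/automotive/15718891 appended to the base Amazon URL. The URL does not display correctly in the post because of this; if you click edit, you can see the correct URL.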
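For context on why appending "#2" has no effect: the part of a URL after "#" is a fragment, which the browser handles itself and does not include in the HTTP request, so the server receives the same request with or without it. The sketch below illustrates that with the standard library and wraps the parameters from the answer in a small helper. The names strip_fragment, page_params and fetch_page are hypothetical, and it assumes the endpoint still accepts the "pg" and "ajax" parameters shown above.

```python
from urllib.parse import urlsplit, urlunsplit

# Base URL taken from the answer above.
BASE_URL = "https://www.amazon.com/Best-Sellers-Automotive-Transmission-Fluid-Additives/zgbs/automotive/15718891"


def strip_fragment(url):
    """Return (url_without_fragment, fragment).

    The fragment (everything after '#') stays on the client side,
    so this first value is what the server is actually asked for.
    """
    parts = urlsplit(url)
    return urlunsplit(parts._replace(fragment="")), parts.fragment


def page_params(page):
    """Query parameters from the answer, with the page number filled in."""
    return {"_encoding": "UTF8", "pg": str(page), "ajax": "1"}


def fetch_page(session, page, base_url=BASE_URL):
    """Hypothetical helper: request one page through an existing session.

    `session` is assumed to be a requests.Session, or any object with a
    compatible .get(url, params=...) method.
    """
    return session.get(base_url, params=page_params(page))
```

With requests installed, fetch_page(requests.Session(), 2) should issue the same request as the answer's code, and the returned r.text can then be passed to BeautifulSoup as in the question.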