Question

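I'm learning web scraping, and as an exercise I've been working on a program that extracts information from Steam's website.

I want to write a program that visits only the pages of the top 10 best-selling games and extracts some content from each, but my program gets redirected to the age check page whenever it tries to access an M-rated game.

My program looks like this: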

from urllib.request import urlopen
from bs4 import BeautifulSoup

front_page = urlopen('http://store.steampowered.com/').read()
bs = BeautifulSoup(front_page, 'html.parser')
top_sellers = bs.select('#tab_topsellers_content a.tab_item_overlay')

for item in top_sellers:
    game_page = urlopen(item.get('href'))
    bs = BeautifulSoup(game_page.read(), 'html.parser')
    # Now I'm on the age check page :(

I can't figure out how to get past the age check. I've already tried filling it in by sending a POST request:

from urllib.parse import urlencode

post_params = urlencode({'ageDay': '1', 'ageMonth': 'January', 'ageYear': '1988', 'snr': '1_agecheck_agecheck__age-gate'}).encode('utf-8')
page = urlopen(agecheckurl, post_params)

But it doesn't work; I'm still on the age check page. Can anyone help me get past it?

Answer 1

Well, it seems Steam uses cookies to save the age check result. The cookie is named birthtime.

Since I don't know how to set cookies with urllib, here is an example using requests:

import requests
cookies = {'birthtime': '568022401'}
r = requests.get('http://store.steampowered.com/', cookies=cookies)

With that cookie set, there is no age check anymore.
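
For anyone who would rather stay with urllib, the same cookie can probably be sent as a plain Cookie header on a Request object. This is only a minimal sketch and has not been checked against Steam's current site. The helper names birthtime_value and age_checked_request are made up for illustration, and reading birthtime as a Unix timestamp of the birth date is an assumption based on the value 568022401, which decodes to January 1, 1988 (the same year used in the question).

```python
from datetime import datetime, timezone
from urllib.request import Request


def birthtime_value(year, month, day):
    """Return a birth date as a Unix timestamp string.

    Assumption: Steam reads birthtime as seconds since the epoch.
    """
    birth = datetime(year, month, day, tzinfo=timezone.utc)
    return str(int(birth.timestamp()))


def age_checked_request(url, birthtime='568022401'):
    """Build a urllib Request that carries the birthtime cookie."""
    return Request(url, headers={'Cookie': 'birthtime=' + birthtime})
```

A Request built this way can be passed straight to urlopen, for example urlopen(age_checked_request(item.get('href'))) inside the loop from the question, so the cookie goes out with every game page request.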

Answer 2

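I like to use Selenium WebDriver for form input, since it is a simple solution for clicks and keystrokes. You can check the documentation, or see the example under "Filling Out and Submitting Forms" at the link below.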

https://automatetheboringstuff.com/chapter11/

Python Beautiful Soup - Getting past Steam's age check
