Question

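I used this code to fetch a web page, and it worked well.

But now it doesn't work.

I tried many headers but still get a 403 error. This code works for most sites, but I can't give an example.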

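import urllib.request

def get_page(addr):
    # Send a browser-like User-Agent instead of urllib's default one
    headers = {}
    headers['User-Agent'] = "Mozilla/5.0 (X11; Ubuntu; Linux i686; rv:48.0) Gecko/20100101 Firefox/48.0"
    req = urllib.request.Request(addr, headers=headers)
    html = urllib.request.urlopen(req).read()
    # Decode the bytes; str(html) would return the "b'...'" repr instead of the text
    return html.decode('utf-8', errors='replace')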

Answer 1

Try Selenium:

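from selenium import webdriver
from selenium.webdriver.chrome.service import Service
import os

# initialise browser (Selenium 4 style; older versions took the driver path as the first argument)
# assumes a chromedriver binary sits in the current working directory
browser = webdriver.Chrome(service=Service(os.path.join(os.getcwd(), 'chromedriver')))
browser.get('https://www.fragrantica.com/perfume/Victorio-Lucchino/No-4-Evasion-Exotica-50418.html')

# get page html
html = browser.page_source

# close the browser when done
browser.quit()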
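
If you would rather stay with urllib, here is a minimal sketch that sends a fuller set of browser-like headers and reports the HTTP status instead of raising. The `BROWSER_HEADERS` values and the `build_request` helper are my own assumptions for illustration, not something from the question, and a site that blocks non-browser clients may still answer 403 no matter which headers are sent.

```python
import urllib.error
import urllib.request

# Hypothetical header set resembling what a desktop browser sends; adjust as needed.
BROWSER_HEADERS = {
    "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux i686; rv:48.0) Gecko/20100101 Firefox/48.0",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.5",
}


def build_request(addr):
    """Build a Request object carrying the browser-like headers."""
    return urllib.request.Request(addr, headers=BROWSER_HEADERS)


def get_page(addr, timeout=10):
    """Return the page text, or None if the server answers with an HTTP error."""
    try:
        with urllib.request.urlopen(build_request(addr), timeout=timeout) as resp:
            # Use the charset declared by the server, falling back to UTF-8
            charset = resp.headers.get_content_charset() or "utf-8"
            return resp.read().decode(charset, errors="replace")
    except urllib.error.HTTPError as err:
        # A 403 here means the server still refuses the request
        print(f"HTTP {err.code}: {err.reason}")
        return None
```

Call `get_page(url)` and check for `None`; if it keeps printing `HTTP 403`, the site is probably filtering on more than headers, and the Selenium approach above is the more likely fix.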

Web scraping with urllib and fixing 403: Forbidden
