Question

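I want to monitor a specific URL with Python requests and wait until it redirects me internally. The site redirects me at a random point after some time. However, I'm running into some problems. So far, the strategy I've used is this: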

import time

import requests

# `url` (the page to watch) and `keyword` (text expected only after the
# redirect) are assumed to be defined earlier.
headers = {
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8',
    'Accept-Encoding': 'gzip, deflate, br',
    'Accept-Language': 'en-US,en;q=0.9',
    'Cache-Control': 'no-cache',
    'Connection': 'keep-alive',
    'Pragma': 'no-cache',
    'Upgrade-Insecure-Requests': '1',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.99 Safari/537.36'
}

session = requests.Session()

success = False
while not success:
    # Re-request the page and look for the keyword in the response body
    r = session.get(url, headers=headers, allow_redirects=True)
    if keyword in r.text:
        success = True
    else:
        time.sleep(30)

print("Success.")

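It seems that every time I send a GET request the timer resets, so I never get redirected. I thought a session would solve this, but perhaps it doesn't. Then again, how would I check for a change in the page without sending a new request every 30 seconds? In the Network tab of Chrome, the status code appears to be 307.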

If anyone knows how to solve this, it would be very helpful. Thanks.

Answer 1

Selenium is a quick and dirty answer:

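import time

from selenium import webdriver
from selenium.webdriver.firefox.options import Options

# `url` and `keyword` are assumed to be defined earlier, as in the question.
# Selenium 4 takes preferences through Options rather than a FirefoxProfile argument.
options = Options()
options.set_preference("general.useragent.override", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.87 Safari/537.36")

browser = webdriver.Firefox(options=options)
browser.get(url)  # load the page once and let the browser handle any redirect

success = False
while not success:
    # page_source reflects whatever page the browser is currently showing
    if keyword in browser.page_source:
        success = True
    else:
        time.sleep(30)

print("Success.")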
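
If no reliable keyword is available, a variant of the loop above could watch the browser's address instead. The following is only a sketch: `wait_for_url_change` is a hypothetical helper, not part of Selenium, and it assumes the redirect actually changes `browser.current_url` (a real WebDriver property). The `sleep` parameter exists only so the wait can be swapped out.

```python
import time


def wait_for_url_change(browser, interval=30, max_attempts=20, sleep=time.sleep):
    """Return the new URL once the browser leaves its starting page, else None.

    `browser` is any object exposing a `current_url` attribute,
    such as a Selenium WebDriver instance.
    """
    start_url = browser.current_url
    for attempt in range(max_attempts):
        current = browser.current_url
        if current != start_url:
            return current
        # Do not sleep after the final check
        if attempt < max_attempts - 1:
            sleep(interval)
    return None
```

It would be called after `browser.get(url)`, for example `new_url = wait_for_url_change(browser)`. Because the page is loaded only once, no new request is sent from Python on each check.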

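As for using requests, my best guess is that your web browser is being asked to reload the page. Do any of the requests in the Network tab look different from the initial request? browsermob-proxy is a good tool for digging into this kind of problem; it is essentially the Network tab on steroids.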
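
To see what the server itself returns on the requests side, one option is to disable automatic redirect following and inspect the status code directly. This is a minimal sketch, not a confirmed fix: `wait_for_redirect` is a hypothetical helper, and it assumes the redirect is a plain HTTP one (such as the 307 mentioned in the question) rather than something triggered by JavaScript. If the server really does reset its timer on every request, polling like this will not help.

```python
import time

# HTTP status codes that signal a redirect
REDIRECT_CODES = {301, 302, 303, 307, 308}


def wait_for_redirect(session, url, interval=30, max_attempts=10, sleep=time.sleep):
    """Poll `url` without following redirects.

    `session` is any object with a requests-style `get` method,
    such as `requests.Session()`. Returns the value of the Location
    header once a redirect status is seen, or None if none is seen.
    """
    for attempt in range(max_attempts):
        response = session.get(url, allow_redirects=False)
        if response.status_code in REDIRECT_CODES:
            return response.headers.get("Location")
        # Do not sleep after the final attempt
        if attempt < max_attempts - 1:
            sleep(interval)
    return None
```

With requests installed, this could be called as `wait_for_redirect(requests.Session(), url)`. Comparing the returned Location with what the Network tab shows may indicate whether the browser is doing something the script is not.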

Apologies for the vagueness of the first half, but it's hard to say more without looking at the site.

Monitoring a website for an internal redirect
