Question

I'm using the requests module to check whether the items in a list of words are articles on https://www.britannica.com. My current code is:

import requests

words = ['no', 'yes', 'thermodynamics', 'london', 'Max-Factor', 'be']

for word in words:
    request = requests.head('https://www.britannica.com/topic/' + word.lower())
    if request.status_code == 200:
        print(">EXISTS")
        print('https://www.britannica.com/topic/' + word.lower())
        print("<")
    else:
        print(">DOESNT EXIST")
        print('https://www.britannica.com/topic/' + word.lower())
        print("<")

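'be' is the only string that prints 'EXISTS', but 'thermodynamics', 'london' and 'Max-Factor' also exist, and the program prints 'DOESNT EXIST' for them. If I run it on thermodynamics alone, it correctly prints 'EXISTS'. What causes the difference, and what is a possible fix? Could it be the load time of the various pages ('be' being the smallest)?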

Answer 1

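Apparently britannica.com uses redirects, possibly for load balancing, so you will often get status 301 instead of 200. requests.head does not follow redirects by default, so the 301 is what your status check sees. The requests module can follow the redirects if you use the following: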

request = requests.head('https://www.britannica.com/topic/' + word.lower(),
                        allow_redirects=True)
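
To make this easier to reuse, the check can be wrapped in a small helper. This is only a sketch under a few assumptions: the names article_exists and BASE_URL, the injectable head parameter, and the 10-second timeout are illustrative choices and not part of the original code. It returns the final URL as well, because after a redirect the article may live under a different path than /topic/.

```python
import requests

# Assumed base path, taken from the question's code.
BASE_URL = 'https://www.britannica.com/topic/'


def article_exists(word, head=requests.head):
    """Return (exists, final_url) for a word, following any redirects.

    `head` is injectable so the function can be exercised without
    network access; by default it is requests.head.
    """
    url = BASE_URL + word.lower()
    # allow_redirects=True makes requests follow 301/302 responses, so
    # status_code describes the final page, not the redirect itself.
    response = head(url, allow_redirects=True, timeout=10)
    # response.url is the URL reached after all redirects were followed.
    return response.status_code == 200, response.url


def check_words(words, head=requests.head):
    """Print the same EXISTS / DOESNT EXIST report as the question."""
    for word in words:
        exists, final_url = article_exists(word, head=head)
        print(">EXISTS" if exists else ">DOESNT EXIST")
        print(final_url)
        print("<")
```

Calling check_words(words) with the list from the question should then report on the page each word finally resolves to. The exact results depend on how the site responds at the time of the request.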
