Question

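I want to use Python to check whether a file or web page exists based on its HTTP response code, and act accordingly. However, I have to use HTTPS and supply a username and password. I couldn't get this working with curl (it doesn't like HTTPS), but I did succeed with wget (using --spider, --user and --password). I suppose I could call wget from the script via os.system, but it prints a lot of output that would be very tricky to parse, and if the URI doesn't exist (i.e. a 404) I think it gets stuck at "awaiting response...".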

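I have looked at urllib2 examples on the web and seen what people have done, but I'm not sure whether they cover my situation, and the solutions always seem very convoluted. In any case, I'd appreciate some guidance on the easiest way to do this in Python.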

Edit: Using the os.system approach (and passing "-q" to wget) seems to return a different number depending on whether the URI exists, so that gives me something to work with.
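
A minimal sketch of the exit-code idea from the edit above, assuming wget is on the PATH and exits with 0 on success and a non-zero status otherwise. The helper names, the --tries and --timeout values, and the injectable run parameter are illustrative assumptions; subprocess.call is used here in place of os.system so that the arguments need no shell quoting.

```python
import subprocess


def build_wget_check(url, username, password):
    """Build a quiet wget command that only checks the URL (no download)."""
    return [
        "wget",
        "--spider",        # check that the resource exists, do not fetch it
        "-q",              # suppress wget's normal output
        "--tries=1",       # assumption: give up after a single attempt
        "--timeout=10",    # assumption: do not wait forever on a stalled server
        "--user=" + username,
        "--password=" + password,
        url,
    ]


def exists_via_wget(url, username, password, run=subprocess.call):
    """Return True when wget exits with status 0, False for any other status."""
    return run(build_wget_check(url, username, password)) == 0
```

Calling exists_via_wget('https://example.com/file', 'user', 'pass') would run wget once and map its exit status to a boolean. The URL and credentials here are placeholders.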

Answer 1

You can make a HEAD request with Python requests.

import requests
r = requests.head('http://google.com/sjklfsjd', allow_redirects=True, auth=('user', 'pass'))
assert r.status_code != 404

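If the request fails with a ConnectionError, the site does not exist or cannot be reached. If the site is reachable but the page is missing, the request itself succeeds and the response has status code 404.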
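
The two cases above can be told apart with a small wrapper. This is only a sketch: the function name check_page, its return strings, and the 10-second timeout are assumptions made for illustration, not part of requests.

```python
import requests


def check_page(url, username, password, timeout=10):
    """Classify a URL as 'exists', 'missing', 'unreachable' or 'other:<code>'."""
    try:
        response = requests.head(
            url,
            allow_redirects=True,
            auth=(username, password),  # sent as HTTP basic auth
            timeout=timeout,
        )
    except requests.exceptions.ConnectionError:
        # No response at all: the host does not resolve or cannot be reached.
        return "unreachable"
    if response.status_code == 404:
        return "missing"
    if response.status_code < 400:
        return "exists"
    # Any other error status (401, 403, 500, ...) is reported as-is.
    return "other:%d" % response.status_code
```

Returning a label instead of asserting lets the caller decide what to do in each case, for example treating a 401 as a credentials problem rather than a missing page.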

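Requests has a very nice interface, so I recommend checking it out. You will probably like it: it is intuitive and powerful while staying lightweight.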

Answer 2

urllib2 is the way to open any web page:

import urllib2
urllib2.urlopen('http://google.com')

For more functionality you need an opener with handlers. I assume you only need the HTTPS handler, since you are barely extracting any information:

opener = urllib2.build_opener(
    urllib2.HTTPSHandler())
opener.open('https://google.com')

Add data and it automatically becomes a POST request, or so I believe:

opener.open('https://google.com',data="username=bla&password=da")

The object you get back has a code attribute.

That is the basic gist. Add as many handlers as you like; I believe they do no harm. Source: https://docs.python.org/2.7/library/urllib2.html#httpbasicauthhandler-objects
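
The source link points at the basic-auth handler, which the snippets above do not show. Here is a hedged sketch of how it might be wired up, written for Python 3, where urllib2 became urllib.request and urllib.error. The function name status_with_basic_auth and the timeout value are assumptions; the opener built by build_opener already includes an HTTPS handler when SSL support is available.

```python
import urllib.error
import urllib.request


def status_with_basic_auth(url, username, password, timeout=10):
    """Return the HTTP status code for a HEAD request, or None if unreachable."""
    # Register the credentials; they are sent once the server answers
    # with a 401 basic-auth challenge.
    password_mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    password_mgr.add_password(None, url, username, password)
    opener = urllib.request.build_opener(
        urllib.request.HTTPBasicAuthHandler(password_mgr)
    )
    # HEAD asks for the headers only, which is enough to read the status.
    request = urllib.request.Request(url, method="HEAD")
    try:
        with opener.open(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # Error statuses such as 401 or 404 arrive as exceptions.
        return err.code
    except urllib.error.URLError:
        # No HTTP response at all (DNS failure, refused connection, ...).
        return None
```

A result of 200 would mean the page exists, 404 that it does not, and 401 that the credentials were rejected.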

Answer 3

You should use urllib2 to check:

import urllib2, getpass
url = raw_input('Enter the url to search: ')
username = raw_input('Enter your username: ')
password = getpass.getpass('Enter your password: ')
# Prepend a scheme only when the URL has neither http:// nor https://
if not url.startswith(('http://', 'https://')):
    url = 'http://' + url

def check(url):
    # urlopen raises HTTPError for error statuses such as 404
    try:
        urllib2.urlopen(url)
        return True
    except urllib2.HTTPError:
        return False

if check(url):
    print 'The webpage exists!'
else:
    print 'The webpage does not exist!'

# Send the credentials as POST form data through an opener with an HTTPS handler
opener = urllib2.build_opener(urllib2.HTTPSHandler())
opener.open(url, data="username=%s&password=%s" % (username, password))

Run it like this:

bash-3.2$ python url.py
Enter the url to search: gmail.com
Enter your username: aj8uppal
Enter your password: 
The webpage exists!

Python - Check whether a file/web page exists
