I'm trying to write a script that checks whether each of several URLs exists:
import httplib

with open('urls.txt') as urls:
    for url in urls:
        connection = httplib.HTTPConnection(url)
        connection.request("GET")
        response = connection.getresponse()
        if response.status == 200:
            print '[{}]: '.format(url), "Up!"
But I get this error:
Traceback (most recent call last):
  File "test.py", line 5, in <module>
    connection = httplib.HTTPConnection(url)
  File "/usr/lib/python2.7/httplib.py", line 693, in __init__
    self._set_hostport(host, port)
  File "/usr/lib/python2.7/httplib.py", line 721, in _set_hostport
    raise InvalidURL("nonnumeric port: '%s'" % host[i+1:])
httplib.InvalidURL: nonnumeric port: '//globo.com/galeria/amazonas/a.html
What's wrong?
Answer 0 (score: 17)
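This probably has a simple fix. Here:
connection = httplib.HTTPConnection(url)
you are using HTTPConnection, which expects a host name rather than a full URL. Pass iGyan.org, not http://iGyan.org.
In short, remove http:// and https:// from your URLs: httplib treats whatever follows the : as the port number, and a port number must be numeric.
Hope this helps!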
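One way to apply this without editing urls.txt by hand is to split each URL into its host and path first, then pass only the host to the connection. The following is a minimal sketch, written for Python 3, where httplib is named http.client and urlparse lives in urllib.parse. The helper names split_url and check are hypothetical, and treating an entry without a scheme as plain HTTP is an assumption.

```python
import http.client
from urllib.parse import urlsplit


def split_url(url):
    """Return (scheme, host, port, path) for a line read from urls.txt."""
    url = url.strip()  # lines read from a file end with a newline
    if "://" not in url:
        # assumption: a bare "example.com/page" entry means plain HTTP
        url = "http://" + url
    parts = urlsplit(url)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    return parts.scheme, parts.hostname, parts.port, path


def check(url, timeout=10):
    """Send a GET request for url and return the HTTP status code."""
    scheme, host, port, path = split_url(url)
    if scheme == "https":
        connection = http.client.HTTPSConnection(host, port, timeout=timeout)
    else:
        connection = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        # the path goes to request(), not to the constructor
        connection.request("GET", path)
        return connection.getresponse().status
    finally:
        connection.close()
```

With these helpers, the loop body in the question reduces to comparing check(url) against 200. The path is passed to request() because the constructor accepts only the host and an optional port.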
Answer 1 (score: 6)
The httplib.HTTPConnection constructor takes the host and port of the remote server, not the whole URL.
For your use case, urllib2.urlopen is easier.
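import urllib2

with open('urls.txt') as urls:
    for url in urls:
        url = url.strip()  # drop the trailing newline read from the file
        try:
            r = urllib2.urlopen(url)
        except urllib2.HTTPError as e:
            # HTTPError carries the status code of the failed response
            r = e
        except urllib2.URLError:
            # no HTTP response at all (DNS failure, refused connection, ...)
            print '[{}]: '.format(url), "Unreachable!"
            continue
        if r.code in (200, 401):
            print '[{}]: '.format(url), "Up!"
        elif r.code == 404:
            print '[{}]: '.format(url), "Not Found!"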
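On Python 3, urllib2 was split into urllib.request and urllib.error, so the same idea might look like the sketch below. The function names describe, probe and main are hypothetical, and returning None when no HTTP response arrives is an assumption of this sketch, not something the answer specifies.

```python
import urllib.error
import urllib.request


def describe(code):
    """Map an HTTP status code to the labels used in the loop above."""
    if code in (200, 401):
        return "Up!"
    if code == 404:
        return "Not Found!"
    return None  # other codes are not reported


def probe(url, timeout=10):
    """Return the HTTP status code for url, or None if no response arrived."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # 4xx/5xx responses raise HTTPError, which still carries the code
        return error.code
    except urllib.error.URLError:
        # DNS failure, refused connection and similar: no status code
        return None


def main(path="urls.txt"):
    """Print a label for every URL listed in the file at path."""
    with open(path) as urls:
        for line in urls:
            url = line.strip()
            if not url:
                continue  # skip blank lines
            label = describe(probe(url))
            if label:
                print("[{}]: {}".format(url, label))
```

HTTPError is caught before URLError because it is a subclass of URLError; with the order reversed, the status code of a 404 response would never be read.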
Answer 2 (score: 0)
Nonnumeric port:
Solution:
http.client.HTTPSConnection("api.cognitive.microsofttranslator.com")
Remove "https://" from the service URL or endpoint and it will work.
https://appdotpy.wordpress.com/2020/07/04/errorsolved-nonnumeric-port/
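The difference can be checked without any network access, because http.client parses the host string in the constructor and opens no connection until a request is sent. This is a small sketch for Python 3; parse_endpoint is a hypothetical helper name introduced only for illustration.

```python
import http.client


def parse_endpoint(endpoint):
    """Return (host, port) as http.client parses them, or the error text."""
    try:
        # the constructor only parses its argument; no connection is opened
        connection = http.client.HTTPSConnection(endpoint)
    except http.client.InvalidURL as error:
        return str(error)
    return connection.host, connection.port


# with the scheme: everything after "https:" is taken for the port
print(parse_endpoint("https://api.cognitive.microsofttranslator.com"))
# bare host name: parsed as the host, with the default HTTPS port
print(parse_endpoint("api.cognitive.microsofttranslator.com"))
```

The first call should return the "nonnumeric port" message, while the second returns the host paired with the default HTTPS port, 443.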