Checking multiple statuses of 500+ websites with Python

Time: 2016-03-22 05:04:25

Tags: python python-2.7 python-3.x url http-headers

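  • I know there are already several questions about checking URLs. I'm quite new to Python, so I'm trying to piece things together from multiple posts and to look into new libraries as well. I'm trying to get the following for both internal and external websites: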

       Status Code
       Status Description
       Response Length
       Time Taken 
    The websites look like www.xyz.com, www.abc.log, www.abc.com/xxx/login.html, and many other combinations. Below is the initial code:

    import socket
    from urllib2 import urlopen, URLError, HTTPError

    socket.setdefaulttimeout(23)  # timeout in seconds
    url = 'https://www.google.com'

    try:
        response = urlopen(url)
    except HTTPError as e:
        # HTTPError (a subclass of URLError, so it must be caught first) means the
        # server answered with a 4xx/5xx status; e.code holds that status code
        print 'The server couldn\'t fulfill the request. Code:', e.code
    except URLError as e:
        # URLError means no HTTP response arrived at all (DNS failure, refused
        # connection, ...), so there is no status code to show, only a reason
        print 'We failed to reach a server. Reason:', e.reason
    else:
        # reuse the response already received instead of opening the URL a second time
        code = response.getcode()
        print url, '-------->', code
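The four values listed at the top of the question (status code, status description, response length, time taken) can be gathered in one function. The sketch below is Python 3 (`urllib.request` instead of `urllib2`) and is only one possible approach: `check_url` and its result keys are hypothetical names, the 23-second timeout is copied from the question, and "response length" is assumed to mean the number of body bytes actually read, not the Content-Length header.

```python
import http.client
import time
import urllib.error
import urllib.request


def check_url(url, timeout=23):
    """Return status code, status description, response length and time taken for an http(s) URL.

    Sketch only: urlopen follows redirects, so a redirected URL reports the
    status of the page it finally lands on.
    """
    result = {"url": url, "status_code": None, "description": None,
              "length": None, "seconds": None, "error": None}
    start = time.perf_counter()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            body = response.read()
            result["status_code"] = response.status
            result["description"] = response.reason
            result["length"] = len(body)
    except urllib.error.HTTPError as e:
        # 4xx/5xx: the server did answer, so a status code and reason exist
        result["status_code"] = e.code
        result["description"] = e.reason
        try:
            result["length"] = len(e.read())
        except (OSError, http.client.HTTPException):
            pass  # error body could not be read; length stays None
    except urllib.error.URLError as e:
        # no HTTP response at all (DNS failure, refused connection, TLS error, ...)
        result["error"] = str(e.reason)
    except (OSError, ValueError, http.client.HTTPException) as e:
        # read timeouts and dropped connections are OSErrors; ValueError means a
        # malformed URL such as "www.xyz.com" without a scheme
        result["error"] = str(e)
    result["seconds"] = time.perf_counter() - start
    return result
```

A URL without a scheme, such as `www.xyz.com`, is rejected by `urlopen` with a `ValueError`; the sketch records that as an error instead of crashing, which matters when one bad line should not stop a run over hundreds of URLs.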
    
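  • I want to first check whether a website exists, and then run the other checks listed above. How should I organize this so that all of the points above work for 500+ URLs? Do I need to import them from a txt file? One more thing: I noticed that if www.xyz.com works but www.xyz.com/lmn.html does not exist, it still shows 200.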
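For 500+ URLs, a plain text file with one URL per line is a simple way to feed the checks, as the question suggests. A minimal Python 3 sketch, assuming that file layout: `normalize`, `load_urls`, `status_of`, `check_all` and `write_csv` are hypothetical helpers, bare host names are assumed to be reachable over plain `http://`, and the pool size of 20 threads is an arbitrary choice.

```python
import concurrent.futures
import csv
import http.client
import urllib.error
import urllib.request


def normalize(url):
    """Add a scheme to bare entries such as 'www.xyz.com' (assumes plain http is acceptable)."""
    url = url.strip()
    if "://" not in url:
        url = "http://" + url
    return url


def load_urls(path):
    """Read one URL per line from a text file, skipping blank lines and '#' comments."""
    with open(path, encoding="utf-8") as f:
        return [normalize(line) for line in f
                if line.strip() and not line.lstrip().startswith("#")]


def status_of(url, timeout=23):
    """Minimal check: the HTTP status code, or None if no response was received."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as e:  # must come before OSError (its base class)
        return e.code
    except (OSError, ValueError, http.client.HTTPException):
        return None


def check_all(urls, check=status_of, workers=20):
    """Run check(url) for every URL in a thread pool; results keep the input order."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=workers) as pool:
        return list(zip(urls, pool.map(check, urls)))


def write_csv(rows, path):
    """Write (url, result) pairs to a CSV report."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["url", "result"])
        writer.writerows(rows)
```

For example `write_csv(check_all(load_urls("urls.txt")), "report.csv")`, where both file names are placeholders. Any other per-URL function can be passed as `check`, as long as it takes a URL and returns the value to record. Threads suit this job because each check spends most of its time waiting on the network rather than using the CPU.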

1 answer:

Answer 0 (score: 1)

I think you can use this code to check whether a page exists:

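    import httplib
    from urlparse import urlparse

    def chkUrl(url):
        """Return True if a HEAD request for url gets a status code below 400."""
        p = urlparse(url)
        # pick the connection class that matches the scheme (HTTPConnection cannot talk to https URLs)
        conn_cls = httplib.HTTPSConnection if p.scheme == 'https' else httplib.HTTPConnection
        conn = conn_cls(p.netloc, timeout=10)
        # an empty path is not a valid request target, so fall back to '/'; keep any query string
        path = p.path or '/'
        if p.query:
            path += '?' + p.query
        try:
            conn.request('HEAD', path)
            resp = conn.getresponse()
        finally:
            conn.close()
        # HEAD requests are not redirected, so 3xx responses also count as "exists" here
        return resp.status < 400

    if __name__ == '__main__':
        print chkUrl('https://www.stackoverflow.com')  # expected: True
        print chkUrl('https://stackoverflow.com/notarealpage.html')  # expected: False (404)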
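The answer's code targets Python 2 (`httplib`, `urlparse`), while the question is also tagged python-3.x. A sketch of the same HEAD-request idea for Python 3, using `http.client` and `urllib.parse`, follows; `head_check` and `request_target` are hypothetical names. It also returns the `Location` header, on the assumption that a missing page reported as 200 (the question's www.xyz.com/lmn.html case) may be a redirect that `urlopen` silently followed. That is one possible cause, not a confirmed one.

```python
import http.client
from urllib.parse import urlsplit


def request_target(parts):
    """Path plus query string for the request line; '/' when the URL has no path."""
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    return target


def head_check(url, timeout=10):
    """Send one HEAD request and return (status, reason, location).

    http.client never follows redirects, so a missing page that the server
    redirects elsewhere shows up as 3xx plus its Location header, not as 200.
    """
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.hostname, parts.port, timeout=timeout)
    elif parts.scheme == "http":
        conn = http.client.HTTPConnection(parts.hostname, parts.port, timeout=timeout)
    else:
        raise ValueError("expected an http:// or https:// URL, got %r" % url)
    try:
        conn.request("HEAD", request_target(parts))
        resp = conn.getresponse()
        return resp.status, resp.reason, resp.getheader("Location")
    finally:
        conn.close()
```

As in the answer, a status below 400 can be treated as "exists". Some servers handle HEAD differently from GET (for example by answering 405 Method Not Allowed), so a GET retry may be needed for those; servers that return 200 together with an error page cannot be detected from the status code at all.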