Checking whether a URL entered in a Python script is live

Time: 2013-12-10 17:19:55

Tags: python validation url

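I have a Python web-scraping script, and I need to verify that a URL points to an existing website by testing the connection to it. Can anyone help me implement this in my code?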

Here is my code:

import sys, urllib

while True:
    try:
        url= raw_input('Please input address: ')
        webpage=urllib.urlopen(url)
        print 'Web address is valid'
        break
    except:
        print 'No input or wrong url format usage: http://www.domainname.com/ '
        print 'Please try again'
def wget(webpage):
        print '[*] Fetching webpage...\n'
        page = webpage.read()
        return page      
def main():
    sys.argv.append(webpage)
    if len(sys.argv) != 2:
        print '[-] Usage: webpage_get URL'
        return
    print wget(sys.argv[1])

if __name__ == '__main__':
    main()

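Edit: I have some code here that I took from another Stack Overflow post. It works, and I just want to integrate it into my own code. I tried to integrate it myself but got errors. Can anyone help me do this? Here is the code: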

from urllib2 import Request, urlopen, URLError
req = Request('http://jfvbhsjdfvbs.com')
try:
    response = urlopen(req)
except URLError, e:
    if hasattr(e, 'reason'):
        print 'We failed to reach a server.'
        print 'Reason: ', e.reason
    elif hasattr(e, 'code'):
        print 'The server couldn\'t fulfill the request.'
        print 'Error code: ', e.code
else:
    print 'URL is good!'

2 answers:

Answer 0 (score: 1)

Maybe this code will help you understand why the while loop runs before main():

print 'Checkpoint Alpha'

while True:
    print 'Checkpoint Bravo'
    if raw_input ('x for break: ') == 'x': break

print 'Checkpoint Charlie'

def main():
    print 'Checkpoint Foxtrot'

print 'Checkpoint Delta'

if __name__ == '__main__':
    print 'Checkpoint Echo'
    main()
    print 'Checkpoint Golf'

print 'Checkpoint Hotel'
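
Statements at the top level of a file run in the order they appear as soon as the file is loaded, while the body of a def only runs when the function is called. That is why the loop above finishes before main() starts. One way to get the order you probably want is to move the loop into a function and call it from main(). Below is a minimal sketch of that idea, assuming Python 3; prompt_loop, answers and trace are hypothetical names used only for illustration, and a list of canned answers stands in for raw_input so the order is easy to see.

```python
def prompt_loop(answers, trace):
    """Stand-in for the while/raw_input loop: consume answers until 'x'."""
    for answer in answers:
        trace.append('loop saw ' + answer)
        if answer == 'x':
            break


def main(answers):
    # The loop now runs only because main() calls it,
    # not because the file happened to be loaded.
    trace = ['main start']
    prompt_loop(answers, trace)
    trace.append('main end')
    return trace


# In a real script this call would sit under `if __name__ == '__main__':`.
print(main(['a', 'x', 'never reached']))
```

With this layout nothing prompts the user at import time, and main() decides when the loop happens.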

Answer 1 (score: 0)

The following should help you:

visited = []

# inside the while loop, in the try block:
while True:
    try:
        url = raw_input('Please input address: ')
        if url in visited:
            print "Already visited. Continue"
            continue  # ask again instead of fetching the same URL twice
        visited.append(url)
        webpage = urllib.urlopen(url)
        [...]
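
To tie this together with the urllib2 snippet from the question, here is a rough sketch of how the check and the prompt loop might be combined. It assumes Python 3, where urllib2's urlopen and URLError live in urllib.request and urllib.error. The names check_url and prompt_for_url, and the opener and read parameters, are hypothetical helpers I made up so the logic can be tried without a network connection; they are not part of any library.

```python
from urllib.error import HTTPError, URLError
from urllib.request import urlopen


def check_url(url, opener=urlopen):
    """Try to open `url`; return (ok, message)."""
    try:
        response = opener(url)
    except HTTPError as e:
        # HTTPError is a subclass of URLError, so catch it first.
        return False, 'The server could not fulfill the request. Error code: %s' % e.code
    except URLError as e:
        return False, 'We failed to reach a server. Reason: %s' % e.reason
    except ValueError as e:
        # urlopen raises ValueError for input without a scheme, e.g. 'example'.
        return False, 'Wrong URL format: %s' % e
    response.close()
    return True, 'URL is good!'


def prompt_for_url(read=input, opener=urlopen):
    """Keep asking until a reachable URL is entered, then return it."""
    visited = set()
    while True:
        url = read('Please input address: ').strip()
        if url in visited:
            print('Already tried. Enter a different address.')
            continue
        visited.add(url)
        ok, message = check_url(url, opener)
        print(message)
        if ok:
            return url
```

In the original script, main() could call prompt_for_url() and then fetch the page from the returned URL, instead of pushing the opened page object into sys.argv.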