Question

I wrote a quick Python program that returns the title of a URL's final destination.

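# Python 2: urllib2 became urllib.request / urllib.error in Python 3
import traceback
import urllib2
from bs4 import BeautifulSoup

def get_title(url):
    try:
        req = urllib2.Request(url)
        soup = BeautifulSoup(urllib2.urlopen(req), 'html.parser')
        return soup.title.string.encode('ascii', 'ignore').strip().replace('\n', '')
    except Exception:
        print('Generic Exception for ' + url + ', ' + traceback.format_exc())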

This code works fine, but one of the URLs redirects via window.location, so my script can't follow it. Is there a simple way to make it follow window.location redirects?
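urllib2 follows HTTP 3xx redirects on its own, but a window.location assignment is JavaScript that only a browser executes, so the server hands back the intermediate page with the script still in it. Without running JavaScript, one workable approach is to scan the returned HTML for common redirect patterns. The following is a minimal sketch in Python 3 using only the standard library; the function name find_js_redirect and the list of patterns are assumptions for illustration, and a redirect whose target is computed at runtime will not be matched.

```python
import re

# Assumed (not exhaustive) patterns for client-side redirects. Targets built
# dynamically in JavaScript will not be caught by simple regexes like these.
_JS_REDIRECT_PATTERNS = [
    # window.location = "..."  or  window.location.href = '...'
    re.compile(r"""window\.location(?:\.href)?\s*=\s*(["'])(.+?)\1"""),
    # window.location.replace("...")  or  window.location.assign('...')
    re.compile(r"""window\.location\.(?:replace|assign)\(\s*(["'])(.+?)\1\s*\)"""),
]


def find_js_redirect(html):
    """Return the redirect target from html, or None if no pattern matches.

    Patterns are tried in list order; the first one that matches wins.
    """
    for pattern in _JS_REDIRECT_PATTERNS:
        match = pattern.search(html)
        if match:
            # group(1) is the quote character, group(2) the URL between quotes
            return match.group(2)
    return None
```

Because the quote character is captured and back-referenced with \1, the same pattern accepts both single- and double-quoted targets, which a regex limited to double quotes would miss.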

Answer 1

I ended up using a regex to match window.location and extract the URL:

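import re
import traceback
import urllib2
from bs4 import BeautifulSoup

def get_title(url):
    try:
        req = urllib2.Request(url)
        soup = BeautifulSoup(urllib2.urlopen(req), 'html.parser')
        # Look for a JavaScript redirect of the form window.location = "..."
        redirMatch = re.search(r'window\.location\s*=\s*"([^"]+)"', str(soup))
        if redirMatch and "http" in redirMatch.group(1):
            # Follow the redirect target and return that page's title instead
            return get_title(redirMatch.group(1))
        return soup.title.string.encode('ascii', 'ignore').strip().replace('\n', '')
    except Exception:
        print('Generic Exception for ' + url + ', ' + traceback.format_exc())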
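The recursive approach above has two limits worth noting: a page that redirects to itself (or two pages that point at each other) recurses until Python's recursion limit is hit, and the "http" in check skips relative targets such as "/next". Below is a minimal iterative sketch under stated assumptions: Python 3, urllib.request for fetching, and a plain regex for the title instead of BeautifulSoup so that it runs with the standard library alone (a simplification; BeautifulSoup is more robust on messy HTML). The names get_final_title, fetch_html and max_hops are hypothetical, not part of the answer.

```python
import re
from urllib.parse import urljoin
from urllib.request import urlopen

# Same idea as the answer's regex, extended to single quotes and .href
# (an assumption about what redirect pages look like; not exhaustive).
REDIRECT_RE = re.compile(r"""window\.location(?:\.href)?\s*=\s*(["'])(.+?)\1""")
# Simplified title extraction; stands in for soup.title.string.
TITLE_RE = re.compile(r"<title[^>]*>(.*?)</title>", re.IGNORECASE | re.DOTALL)


def fetch_html(url):
    """Download a page; urllib follows HTTP 3xx redirects by itself."""
    with urlopen(url) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


def get_final_title(url, fetch=fetch_html, max_hops=5):
    """Follow window.location redirects, then return the final page title.

    Raises RuntimeError on a redirect loop or after more than max_hops
    redirects. Returns None if the final page has no <title>.
    """
    seen = set()
    for _ in range(max_hops + 1):
        if url in seen:
            raise RuntimeError("redirect loop at " + url)
        seen.add(url)
        html = fetch(url)
        match = REDIRECT_RE.search(html)
        if match is None:
            title = TITLE_RE.search(html)
            # Collapse newlines and runs of whitespace into single spaces
            return " ".join(title.group(1).split()) if title else None
        # Resolve relative targets such as "/next" against the current URL
        url = urljoin(url, match.group(2))
    raise RuntimeError("too many redirects")
```

Passing fetch as a parameter keeps the redirect logic testable without network access (for example with a dictionary of canned pages); in normal use the default fetch_html is used, as in get_final_title("http://example.com/").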

Python: follow a window.location redirect
