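My code currently parses a website and checks whether any link on it points back to another website that the user has entered. On some sites, however, the links only reach the user-entered website indirectly, through a redirect.
For example, if I am looking for McDonald's, the link on Yelp that points to the McDonald's website is an indirect redirect link rather than the site's own address, whereas my program is looking for www.mcdonalds.com.
Another example is bit.ly links, which also redirect to the website indirectly.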
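To make the mismatch concrete, here is a minimal sketch of the kind of host comparison that fails on such links. The redirect-style URL is a made-up placeholder (not the actual Yelp or bit.ly link), and `TARGET` and `links_directly_to` are names used only for this illustration, not part of my real script.

```python
from urllib.parse import urlparse

TARGET = "www.mcdonalds.com"  # the site the program is looking for


def links_directly_to(href, target=TARGET):
    """Return True only if the host in href is exactly the target host."""
    host = urlparse(href).netloc.lower()
    return host == target.lower()


# Hypothetical placeholder for a redirect-style link (assumption, not a real URL)
indirect = "https://redirector.example/out?url=https%3A%2F%2Fwww.mcdonalds.com"
direct = "https://www.mcdonalds.com/us/en-us.html"

print(links_directly_to(direct))    # True
print(links_directly_to(indirect))  # False
```

The second link would eventually land on the target site, but comparing only the host of the `href` cannot see that.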
Here is my code for reference:
from datetime import datetime
from urllib.parse import urlparse

from selenium.common.exceptions import TimeoutException
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

# driver, URL, strip() and tooLong() are not shown here; they are defined elsewhere in my script

def search(web):
    # Open the linked site in a new tab
    site = web.get_attribute("href")
    driver.execute_script("window.open('');")
    driver.switch_to.window(driver.window_handles[-1])
    driver.get(site)
    # Wait until the webpage has loaded
    try:
        start = datetime.now()
        WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.TAG_NAME, "body")))
        # Get the scheme and host of the page that actually loaded
        parsed_uri = urlparse(driver.current_url)
        domain = strip('{uri.scheme}://{uri.netloc}/'.format(uri=parsed_uri))
        # If it's taking too long, say it didn't load
        if tooLong(start, datetime.now()):
            driver.close()
            driver.switch_to.window(driver.window_handles[0])
            return site + " could not load"
        # If it's the same website, return nothing to add to the list
        if domain == URL:
            driver.close()
            driver.switch_to.window(driver.window_handles[0])
            return ''
        # Else, search all links for the desired website
        else:
            elems = driver.find_elements(By.TAG_NAME, 'a')
            for i in elems:
                newURI = urlparse(i.get_attribute("href"))
                check = strip('{uri.scheme}://{uri.netloc}/'.format(uri=newURI))
                print(check)
                # If it links back to the website, return nothing to add to the list
                if check == URL:
                    driver.close()
                    driver.switch_to.window(driver.window_handles[0])
                    return ''
                # If it's taking too long, say it couldn't load
                if tooLong(start, datetime.now()):
                    driver.close()
                    driver.switch_to.window(driver.window_handles[0])
                    return site + " could not load"
            # If nothing is found, return the name of the website
            driver.close()
            driver.switch_to.window(driver.window_handles[0])
            return site
    # If the website doesn't load, flag it
    except TimeoutException as ex:
        driver.close()
        driver.switch_to.window(driver.window_handles[0])
        return site + " could not load"
Thanks for your help!