I have some code that returns the titles for a list of URLs. I would like to extend it in a few ways.
Here is the code:
from pyvirtualdisplay import Display
from time import sleep
import sys
reload(sys)  # Python 2 only
sys.setdefaultencoding('utf-8')  # Python 2 only
from selenium import webdriver
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.firefox.options import Options
display = Display(visible=0, size=(800, 600))
display.start()
urls = ["https://google.com", "https://youtube.com"]
driver = webdriver.Firefox(executable_path='/usr/local/lib/geckodriver/geckodriver')
driver.set_page_load_timeout(60)
for url in urls:
    try:
        driver.get(url)
        print(driver.title)
    except TimeoutException as e:
        print("Timeout")
driver.quit()
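With that in place, I would like to do the following. First, instead of hard-coding the list of URLs like this, I want to read them from a .txt file. Second, when the script checks a single URL, I want it to wait for the title to change from "Loading..." to something else, and then print the new title. For this I have already tried: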
while driver.title == 'Loading...':
    pass
print(driver.title)
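The problem is that sometimes the title never changes from "Loading...", so the program stays stuck there forever. If the title still has not changed after 10 seconds, I would like it to print "Title didn't load" and then move on to the next URL in the list.
There is one more thing I am not sure how to do. I print the title with print(driver.title), and I would like to add a number after the title, as in print(driver.title, "number"). The number is there to show how many URLs have been processed so far, but it should not start from 1. It should start from a larger number, for example 50, so that the 5th URL prints "URL title, 55". How can I do this?
Thanks.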
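Answer 0 (score: 1)
To time out after 10 seconds when the title has not changed, I can give you a routine that I use with Java (the syntax below is Groovy). I know you are using Python, but this is what I have on hand. You should be able to swap in the equivalent Python syntax.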
import groovy.time.TimeCategory
import groovy.time.TimeDuration

def timeExpired = false
def timeoutPeriod = new TimeDuration(0, 0, 10, 0) // hours, minutes, seconds, millis: 10 seconds
def timeStart = new Date()
def titleFound = false
def title
while (!titleFound && !timeExpired) { // Loop while the title is not found AND time has not expired
    try {
        title = driver.title
        titleFound = title != "Loading..."
        if (!titleFound) { // Only check the clock if the title has not been found yet
            timeExpired = TimeCategory.minus(new Date(), timeStart) > timeoutPeriod
            if (timeExpired) {
                title = "Title didn't load"
            }
        }
    }
    catch (Exception e) {
        // Handle the exception
    }
}
print(title)
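For reference, the same polling idea could look roughly like this in Python 3. This is a minimal sketch, not a tested Selenium script: wait_for_title and its parameters are hypothetical names, and get_title is assumed to be any zero-argument callable, for example lambda: driver.title.

```python
import time


def wait_for_title(get_title, loading_title="Loading...", timeout=10.0, interval=0.5):
    """Poll get_title() until it differs from loading_title or timeout seconds pass.

    Returns the new title, or None if the title never changed in time.
    """
    deadline = time.monotonic() + timeout
    while True:
        title = get_title()
        if title != loading_title:
            return title
        if time.monotonic() >= deadline:
            return None
        time.sleep(interval)  # pause between checks instead of spinning the CPU


if __name__ == "__main__":
    # Stand-in for driver.title: reports "Loading..." twice, then a real title.
    fake_titles = iter(["Loading...", "Loading...", "Example Domain"])
    title = wait_for_title(lambda: next(fake_titles), timeout=2.0, interval=0.01)
    print(title if title is not None else "Title didn't load")
```

Inside the URL loop this would be called as wait_for_title(lambda: driver.title), printing "Title didn't load" whenever it returns None. Sleeping between checks keeps the loop from pinning a CPU core the way a bare "pass" loop does.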
To read the URLs from a text file, separate them with commas and read the contents:
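with open("filename.txt", "r") as text_file:
    # split on commas, trim whitespace and newlines, and skip empty entries
    lines = [url.strip() for url in text_file.read().split(',') if url.strip()]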
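I do not have Python up and running to confirm this is exactly right, but you can loop over the entries in lines, pass each URL to the driver, and proceed as you already do.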
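If the file might hold one URL per line rather than a comma-separated list, a slightly more forgiving reader could look like the sketch below. read_urls is a hypothetical helper name, and the approach assumes the URLs themselves contain no commas.

```python
import os
import tempfile


def read_urls(path):
    """Return the URLs in a text file, accepting commas and/or newlines as separators."""
    with open(path, "r") as text_file:
        content = text_file.read()
    # Treat newlines as commas so both layouts parse the same way.
    parts = content.replace("\n", ",").split(",")
    # Strip surrounding whitespace and skip empty entries (e.g. a trailing comma).
    return [part.strip() for part in parts if part.strip()]


if __name__ == "__main__":
    # Write a throwaway file so the sketch runs on its own.
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as handle:
        handle.write("https://google.com, https://youtube.com\n")
    try:
        print(read_urls(handle.name))
    finally:
        os.remove(handle.name)
```

Stripping each entry matters because a URL that still carries a trailing newline or space is an easy source of confusing driver errors.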
Finally, to add a counter to the printed output, set a counter variable to whatever starting number you want before you begin looping over the URLs.
counter = 50
Then, inside the loop, increment it by 1 on each pass:
counter += 1
To add it to the printed output, you can do the following:
print(title + " " + str(counter))
The syntax may not be perfect, but it should be close.
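An equivalent way to keep the count in Python is enumerate with a start value, which avoids maintaining a separate counter variable. A small sketch: numbered_titles is a hypothetical helper, and the default start of 51 assumes the fifth URL should show 55, as described in the question.

```python
def numbered_titles(titles, first_number=51):
    """Pair each title with a running number, starting at first_number."""
    return ["{} {}".format(title, number)
            for number, title in enumerate(titles, start=first_number)]


if __name__ == "__main__":
    for line in numbered_titles(["Google", "YouTube"]):
        print(line)
```

In the real loop the same idea would read "for number, url in enumerate(urls, start=51):", with number appended to whatever title is printed for that URL.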
Answer 1 (score: 1)
Here is an updated script that covers your requirements.
from pyvirtualdisplay import Display
import time
import sys
reload(sys)  # Python 2 only
sys.setdefaultencoding('utf-8')  # Python 2 only
from selenium import webdriver
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.firefox.options import Options
from datetime import datetime
# Checks driver.title every `interval` seconds for at most `maxTime` seconds
# and returns as soon as the title is no longer 'Loading...'.
def wait_until_browser_loaded(interval, maxTime):
    start_time = datetime.now()
    while (datetime.now() - start_time).total_seconds() < maxTime:
        time.sleep(interval)
        if driver.title != 'Loading...':
            return
display = Display(visible=0, size=(800, 600))
display.start()
# open the external input file and read the URLs from it
with open("urls_file_path_goes_here", "r") as urlsFile:
    # use this if the URLs are on separate lines
    urls = [line.strip() for line in urlsFile if line.strip()]
    # use this instead if the URLs are comma separated
    #urls = [u.strip() for u in urlsFile.read().split(",") if u.strip()]
driver = webdriver.Firefox(executable_path='/usr/local/lib/geckodriver/geckodriver')
driver.set_page_load_timeout(60)
titleAppendNumber = 50
for url in urls:
    titleAppendNumber += 1  # count every URL, so the 5th one prints 55
    try:
        driver.get(url)
        if driver.title == "Loading...":
            wait_until_browser_loaded(5, 10)
        title = driver.title  # re-read the title after waiting
        if title == "Loading...":
            print("Title didn't load" + " - " + str(titleAppendNumber))
        else:
            print(title + " - " + str(titleAppendNumber))
    except TimeoutException as e:
        print("Timeout")
driver.quit()