I'm currently using Python with Selenium to run a never-ending loop that monitors some content I'm interested in. Here is the code snippet:
    records = set()
    fileHandle = open('d:/seizeFloorRec.txt', 'a')
    fileHandle.write('\ncur time: '+time.strftime('%Y-%m-%d %H:%M:%S',time.localtime(time.time()))+'\n')
    driver = webdriver.Chrome()
    while(True):
        try:
            print "time: ", time.strftime('%Y-%m-%d %H:%M:%S',time.localtime(time.time()))
            subUrls = aMethod(driver) # an irrelevant function which returns a list
            time.sleep(2)
            for i in range(0, len(subUrls)):
                print "cur_idx=["+str(i)+"], max_cnt=["+str(len(subUrls))+"]"
                try:
                    rtn = monitorFloorKeyword(subUrls[i])
                    time.sleep(1.5)
                    if(rtn[0] == True):
                        if(rtn[1] not in records):
                            print "hit!"
                            records.add(rtn[1])
                            fileHandle.write(rtn[1]+'\t'+rtn[2].encode('utf-8')+'\n')
                            fileHandle.flush()
                        else:
                            print "hit but not write."
                except Exception as e:
                    print "exception when get page: ", subUrls[i]
                    print e.__doc__
                    continue
            print "sleep 5*60 sec..."
            time.sleep(300) # PROBLEM LIES HERE!!!
            print "sleep completes."
        except Exception as e:
            print 'exception!'
            print e.__doc__
            time.sleep(20)
It gets stuck at time.sleep(300) unpredictably: it prints "sleep 5*60 sec..." but never prints "sleep completes."
Can anyone suggest possible causes? Thanks a lot!
Update
I found a similar problem here, but I didn't really understand what its author was trying to say. I hope it can solve my problem.
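Latest test
Since I'm using chromedriver, I added driver.get("about:blank") before every return statement in each function, as shown below, to force any asynchronous loading of the current page to stop. This forced stop sometimes produces "ERROR ipc_channel_win.cc(370)] pipe error: 109", which doesn't seem to affect my program's execution. Could it affect my time.sleep call?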
    def retrieveCurHomePageAllSubjectUrls(driver):
        uri = "http://www.example.com/main.php?page=1"
        driver.get(uri)
        element = driver.find_elements_by_class_name('subject')
        subUrls = []
        for i in range(0, len(element)):
            subUrls.append(element[i].get_attribute('href').encode('utf-8'))
        driver.get("about:blank") #This is what I add
        return subUrls

    def monitorFloorKeyword(subUrl):
        driver.get(subUrl)
        title = driver.find_element_by_id('subject_tpc').text
        content = driver.find_element_by_id('read_tpc').text
        if(title.find(u'keyword') >= 0 or content.find(u'keyword') >= 0):
            driver.get("about:blank") #This is what I add
            return (True,subUrl,title,content)
        driver.get("about:blank") #This is what I add
        return (False,)
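Final thoughts
As I said above, a pipe error sometimes appears after driver.get("about:blank"), but the good news is that everything works fine this time. If anyone knows anything about Selenium that relates to this post, please let me know. I'd really appreciate it.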
Answer 0 (score: 0)
I took the time to simplify and clean up the code.
    previously_seen_sub_urls = set()
    with open('d:/seizeFloorRec.txt', 'a') as outfile:
        outfile.write(
            '\ncur time: ' +
            time.strftime('%Y-%m-%d %H:%M:%S',time.localtime(time.time())) +
            '\n')
        driver = webdriver.Chrome()
        while True:
            try:
                print "time: ", time.strftime('%Y-%m-%d %H:%M:%S',
                                              time.localtime(time.time()))
                sub_urls = aMethod(driver) # an irrelevant function which returns a list
                time.sleep(2) # Why sleep here?
                print "max_cnt=[%d]" % len(sub_urls)
                for i, sub_url in enumerate(sub_urls):
                    print "cur_idx=[%s]" % i
                    try:
                        rtn = monitorFloorKeyword(sub_url)
                        # rtn is either a length 1 tuple, first value False
                        # or a length 4 tuple, (True, sub_url, title, content)
                        time.sleep(1.5)
                        if rtn[0]:
                            if rtn[1] not in previously_seen_sub_urls:
                                print "hit!"
                                previously_seen_sub_urls.add(rtn[1])
                                outfile.write(rtn[1]+'\t'+rtn[2].encode('utf-8')+'\n')
                                outfile.flush()
                            else:
                                print "hit but not write."
                    except Exception as e: # Should catch specific subclass of Exception
                        print "exception when get page: ", sub_url
                        print e
                        # Continues
                print "sleep 5*60 sec..."
                time.sleep(300) # PROBLEM POSSIBLY DOESN'T LIE HERE!!!
                print "sleep completes."
            except Exception as e: # Should catch specific subclass of Exception
                print 'exception!'
                print e
                time.sleep(20)
                # Continues
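One further simplification that the cleaned-up code above does not make: time.strftime uses the current local time when no time tuple is passed, so the time.localtime(time.time()) argument can be dropped. A minimal sketch, written for Python 3, where timestamp and TIME_FORMAT are hypothetical names introduced only for illustration:

```python
import time

# Same format string as in the question's code.
TIME_FORMAT = '%Y-%m-%d %H:%M:%S'


def timestamp():
    """Format the current local time; strftime defaults to localtime()."""
    return time.strftime(TIME_FORMAT)


stamp = timestamp()
print("time: " + stamp)
```

Pulling the format into one helper also keeps the log lines and the file header consistent if the format ever changes.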
I didn't spot the problem, but I'm suspicious of your exception handlers.
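When writing exception handlers, it's best to avoid "except Exception" outside of very limited circumstances (such as the outermost loop of your code), because it shows that you don't know which exception (or at least which subclass of Exception) you expect to get, so it's unclear whether the action you take in response is correct.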
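A minimal sketch of the narrower style, written for Python 3 and using only the standard library. PageLoadError, fetch_page and fetch_all are hypothetical stand-ins for the Selenium calls in the question, not part of any library. The handler names the one failure it expects, and anything unexpected still propagates:

```python
class PageLoadError(Exception):
    """Hypothetical error standing in for a page that failed to load."""


def fetch_page(url):
    # Hypothetical stand-in for driver.get(): fails only for URLs containing "bad".
    if "bad" in url:
        raise PageLoadError("could not load %s" % url)
    return "content of %s" % url


def fetch_all(urls):
    """Fetch every URL, skipping only the failures we know how to handle."""
    pages, failed = [], []
    for url in urls:
        try:
            pages.append(fetch_page(url))
        except PageLoadError as e:  # expected failure: record it and move on
            failed.append((url, str(e)))
        # Any other exception (for example a TypeError caused by a bug) is
        # not caught here, so it surfaces instead of being silently swallowed.
    return pages, failed


pages, failed = fetch_all(["http://a.example/1", "http://a.example/bad"])
print(pages)
print(failed)
```

With real Selenium code the caught class would be whichever exception the driver call is actually observed to raise. That choice is left open here because the question does not show one.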
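The second problem is that you aren't printing the exception; you're printing the exception's docstring. For Python's built-in exceptions these strings may be useful, but they aren't guaranteed to be set for custom exceptions. You may find that nothing about the exception is displayed at all.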
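This doesn't explain your problem, but I'd be interested to see whether changing the code to print the exception directly, rather than e.__doc__, helps. (Also see the traceback module for more information about where an exception came from.)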
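To illustrate the difference, here is a small sketch written for Python 3. ScrapeError and describe are hypothetical names used only for this example. Because the class is deliberately defined without a docstring, e.__doc__ is None, while str(e) carries the message and traceback.format_exc() returns the full traceback as text:

```python
import traceback


class ScrapeError(Exception):
    # Hypothetical custom exception, deliberately defined without a docstring.
    pass


def describe(exc):
    """Return three different views of an exception object."""
    return {
        "doc": exc.__doc__,   # class docstring; None for ScrapeError
        "str": str(exc),      # the message passed when raising
        "repr": repr(exc),    # class name plus message
    }


try:
    raise ScrapeError("element 'subject_tpc' not found")
except ScrapeError as e:
    info = describe(e)
    trace_text = traceback.format_exc()  # full traceback as a string

print(info["doc"])
print(info["str"])
print(trace_text)
```

In a long-running loop like the one in the question, writing trace_text to the log file would record where each failure originated.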
Answer 1 (score: 0)
So get rid of time.sleep and try using implicitly_wait:
    from selenium import webdriver

    ff = webdriver.Firefox()
    ff.implicitly_wait(30)
Or try using WebDriverWait:
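    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    ff = webdriver.Firefox()
    ff.get("http://somedomain/url_that_delays_loading")
    try:
        # Wait up to 10 seconds for the element to appear in the DOM.
        element = WebDriverWait(ff, 10).until(
            EC.presence_of_element_located((By.ID, "myDynamicElement")))
    finally:
        ff.quit()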
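The idea behind an explicit wait is to poll a condition until it holds or a timeout expires, rather than sleeping for a fixed interval. It can be sketched without Selenium as follows. This is a simplified illustration written for Python 3, not Selenium's actual implementation; wait_until and ready are hypothetical names:

```python
import time


def wait_until(condition, timeout, poll_interval=0.5):
    """Call condition() until it returns a truthy value or timeout expires.

    Returns the truthy value, or raises RuntimeError on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise RuntimeError("condition not met within %.1f s" % timeout)
        time.sleep(poll_interval)


# Usage: a hypothetical condition that becomes true on the third check.
calls = []


def ready():
    calls.append(1)
    return len(calls) >= 3


print(wait_until(ready, timeout=2, poll_interval=0.01))
```

The wait ends as soon as the condition holds, so the loop never sleeps longer than one poll interval past the moment the page is ready.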