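I am trying to scrape a website with about 800 pages, but the code stops at a random page number on each run. (For example, the first time I got the error on page 5, and on the next run I got it on page 500.)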
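I initialize the Selenium driver and extract the table from each page of the site. For every row, and for every column in that row, I extract the data and append it to a separate list so that I can later group the lists into a data frame. Every time I rerun the program, however, the error below breaks the loop at a random page number.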
The error is: StaleElementReferenceException: Message: {"errorMessage":"Element does not exist in cache","request":{"headers":{"Accept":"application/json","Accept-Encoding":"identity","Content-Length":"117",........"chunks":["elements"]},"urlOriginal":"/session/7cff3fd0-fd4e-11e9-b0e3-c9f679f964da/element/:wdc:1572684982199/elements"}} Screenshot: available via screen
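For context, a StaleElementReferenceException generally means that a previously located element is no longer attached to the page's DOM, for example because the table was re-rendered after moving to the next page. One common mitigation is to repeat the lookup when that happens. The following is only a minimal sketch of such a retry wrapper, not a confirmed fix for this site: `retry`, `make_flaky`, and `FakeStale` are hypothetical names, and `FakeStale` stands in for Selenium's exception so the example runs without a browser.

```python
import time


class FakeStale(Exception):
    """Stand-in for selenium.common.exceptions.StaleElementReferenceException."""


def retry(action, retry_on, attempts=3, delay=0.5):
    """Call action() until it succeeds or `attempts` calls have failed.

    `action` should perform the whole lookup from scratch (find the table,
    then the row, then the cell), so that each attempt uses fresh elements.
    """
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except retry_on:
            if attempt == attempts:
                raise  # give up and surface the original exception
            time.sleep(delay)  # give the page time to finish re-rendering


def make_flaky(failures):
    """Build a function that raises FakeStale `failures` times, then returns 'ok'."""
    state = {"left": failures}

    def action():
        if state["left"] > 0:
            state["left"] -= 1
            raise FakeStale("element does not exist in cache")
        return "ok"

    return action


# Two simulated stale lookups, then success on the third attempt.
result = retry(make_flaky(2), FakeStale, attempts=3, delay=0)
```

With Selenium, `retry_on` would be the real `StaleElementReferenceException`, and `action` would be a small function or lambda that repeats the full `find_element...` chain for a single cell.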
import re
import time

import pandas as pd

# `driver` is an already-initialized Selenium WebDriver (setup not shown).
url = "https://auditoria.cgu.gov.br/"
driver.get(url)

# The code treats the last token of the 'lista_info' text as the page count.
page = driver.find_element_by_id('lista_info').text
last_page = int(re.split(r'\s', page)[-1])

# Collect the table headings to use as data frame columns.
for_heading = driver.find_element_by_id('lista_wrapper').find_elements_by_css_selector('th')
heading_list = []
for element in for_heading:
    heading_list.append(element.text)
df1 = pd.DataFrame(columns=heading_list)

# One list per table column; they are grouped into the data frame later.
listn_0 = []
listn_1 = []
listn_2 = []
listn_3 = []
listn_4 = []
listn_5 = []
listn_6 = []

c = 0
page = 0
counter = 0
while page != last_page:
    # Every lookup below queries the live page again from the table down.
    for_row = len(driver.find_element_by_id('lista').find_elements_by_tag_name('tr'))
    for row in range(1, for_row):  # row 0 is the header row
        i = 0
        if driver.find_element_by_id('lista').find_elements_by_tag_name('tr')[row].find_elements_by_tag_name('td'):
            col_list = len(driver.find_element_by_id('lista').find_elements_by_tag_name('tr')[row].find_elements_by_tag_name('td'))
            c = 0
            for cols in range(col_list):
                if c == 1:
                    # The second column holds a link, so store its href.
                    locals()['listn_%d' % i].append(driver.find_element_by_id('lista').find_elements_by_tag_name('tr')[row].find_elements_by_tag_name('td')[cols].find_element_by_css_selector('a').get_attribute('href'))
                else:
                    locals()['listn_%d' % i].append(driver.find_element_by_id('lista').find_elements_by_tag_name('tr')[row].find_elements_by_tag_name('td')[cols].get_property('text'))
                i = i + 1
                c = c + 1
        else:
            continue
    counter = counter + 1
    page = page + 1
    print(counter)
    # Go to the next page and wait briefly for the table to reload.
    driver.find_element_by_id('lista_next').click()
    time.sleep(1)
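
One possible way to reduce the number of live element lookups in the loop above is to take a single HTML snapshot of the table per page and parse it offline, so that no element references are used after the page changes. This is a sketch under assumptions, not a verified fix for this site: it assumes the table markup can be read with something like `driver.find_element_by_id('lista').get_attribute('outerHTML')`, and `TableSnapshotParser`, `parse_table`, `row_values`, and the sample HTML are hypothetical names and data made up for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class TableSnapshotParser(HTMLParser):
    """Collect (text, href) pairs for every <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row currently being read
        self._cell = None  # data of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = {"text": [], "href": None}
        elif tag == "a" and self._cell is not None and self._cell["href"] is None:
            self._cell["href"] = dict(attrs).get("href")  # first link in the cell

    def handle_data(self, data):
        if self._cell is not None:
            self._cell["text"].append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            text = " ".join("".join(self._cell["text"]).split())
            self._row.append((text, self._cell["href"]))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:  # header rows contain only <th>, so they stay empty
                self.rows.append(self._row)
            self._row = None


def parse_table(html):
    """Return a list of rows, each a list of (text, href) tuples."""
    parser = TableSnapshotParser()
    parser.feed(html)
    return parser.rows


def row_values(row, base_url, link_col=1):
    """Use the absolute link for column `link_col` and the text elsewhere."""
    return [urljoin(base_url, href) if i == link_col and href else text
            for i, (text, href) in enumerate(row)]


# Made-up sample markup standing in for one page of the real table.
SAMPLE = """
<table id="lista">
  <tr><th>Id</th><th>Report</th></tr>
  <tr><td>1</td><td><a href="/r/1">Report 1</a></td></tr>
  <tr><td>2</td><td><a href="/r/2">Report 2</a></td></tr>
</table>
"""
rows = parse_table(SAMPLE)
values = [row_values(r, "https://auditoria.cgu.gov.br/") for r in rows]
```

In the scraping loop, the snapshot string would replace `SAMPLE`, and each list in `values` could be appended to a running list of rows that is turned into a data frame once all pages are done. The snapshot can still be taken while the table is mid-update, so a short wait after clicking `lista_next` may still be needed.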