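I have a scraper that uses Selenium.
I have hundreds of thousands of links that my web scraper opens to extract certain data. However, some of the links have no data. In those cases, the scraper keeps trying for a long time to find the data before it gives up and moves on to the next link. I would like to shorten the time it spends searching before it moves on to the next iteration. Here is my code so far.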
import time

# `driver` (a Selenium WebDriver) and `links` (a list of URLs) are assumed to be defined earlier.
for link in links:
    try:
        driver.get(link)
        locater = ('//tr[@data-bid="18"]' + '//span[@class="table-main__detail-odds--hasarchive"]')
        pin = driver.find_elements_by_xpath(locater)
        match = driver.find_elements_by_xpath('//span[@class="list-breadcrumb__item__in"]')[0].text
        date = driver.find_elements_by_xpath('//p[@class="list-details__item__date"]')[0].text
        score = driver.find_elements_by_xpath('//p[@class="list-details__item__score"]')[0].text
    except Exception:
        # Skip this link; otherwise `pin` from the previous link would be reused below.
        continue
    for cell in pin:
        try:
            cell.click()
            time.sleep(3)  # fixed pause after each click
            f = driver.find_elements_by_xpath('//td[@class="bold"]')
            d = driver.find_elements_by_xpath('//td[@class="date"]')
            # Start a new CSV row with the match details.
            with open("t14.csv", "a") as r:
                r.write("\n")
                r.write(match + "," + date + "," + score + ",")
            # Append each bold value together with each date to the same row.
            for date_cell in d:
                b = date_cell.text
                for bold_cell in f:
                    a = bold_cell.text
                    with open("t14.csv", "a") as r:
                        r.write(a + "," + b + ",")
        except Exception:
            pass
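
For reference, here is a minimal sketch of one way the wait per link could be bounded. It is not a confirmed fix, only an illustration under assumptions: the helper name `find_or_skip` and its parameters `page_timeout` and `element_wait` are hypothetical, and the default values are arbitrary. It relies on the WebDriver methods `set_page_load_timeout`, `implicitly_wait`, `get`, and `find_elements`, and passes the string `"xpath"`, which is the value of Selenium's `By.XPATH` constant.

```python
def find_or_skip(driver, url, locator, page_timeout=10, element_wait=2):
    """Return the elements matching `locator` on `url`, or [] if the page is too slow or has no match.

    Hypothetical helper: `driver` is expected to be a Selenium WebDriver.
    """
    # Cap how long driver.get() may block before it raises.
    driver.set_page_load_timeout(page_timeout)
    # Cap how long element lookups keep retrying before they give up.
    driver.implicitly_wait(element_wait)
    try:
        driver.get(url)
    except Exception:
        # Selenium raises TimeoutException when the page load limit is hit;
        # caught broadly here so the sketch needs no Selenium imports.
        return []
    # find_elements returns an empty list (no exception) when nothing matches.
    return driver.find_elements("xpath", locator)
```

In the loop above, this could be called as `pin = find_or_skip(driver, link, locater)` followed by `if not pin: continue`, so that links without data are abandoned after at most roughly `page_timeout + element_wait` seconds.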