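I'm trying to build a web scraper that collects all the results of a Google search, but it keeps failing with the error "Web element reference not seen before". I think this happens because the code tries to find the element before the URL has loaded, but I'm not sure how to fix it.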
from selenium import webdriver
#number of pages
max_page = 5
#number of digits (ie: 2 is 1 digit, 10 is 2 digits)
max_dig = 1
#Open up firefox browser
driver = webdriver.Firefox()
#inputs search into google
question = input("\n What would you like to google today, but replace every space with a '+' (ie: search+this)\n\n")
search = []
#get multiple pages
for i in range(0, max_page + 1):
    #inserts page number into google search
    page_num = (max_dig - len(str(i))) * "0" + str(i)
    #inserts search input and cycles through pages
    url = "https://www.google.com/search?q="+ question +"&ei=LV-uXYrpNoj0rAGC8KSYCg&start="+ page_num +"0&sa=N&ved=0ahUKEwjKs8ie367lAhUIOisKHQI4CaM4ChDy0wMIiQE&biw=1356&bih=946"
    #finds element in every search page
    search+=(driver.find_elements_by_class_name('LC20lb'))
    driver.get(url)
#print results
search_items = len(search)
for a in range(search_items):
    #print the page number
    print(type(search[a].text))
Traceback (most recent call last):
  File "screwdriver.py", line 32, in <module>
    print(type(search[b].text))
selenium.common.exceptions.NoSuchElementException: Message: Web element reference not seen before: 6187cf00-39c8-c14b-a2de-b1d24e965b65
Answer 0 (score: 1)
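The problem is that Selenium doesn't keep the HTML you found; it only holds references to elements on the current page. When you load a new page with get(), each reference tries to find its element on the new page and can't. You should read the text (and any other information) from the items before you load the next page.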
from selenium import webdriver
from selenium.webdriver.common.by import By
max_page = 5
driver = webdriver.Firefox()
question = input("\n What would you like to google today, but replace every space with a '+' (ie: search+this)\n\n")
search = []
for i in range(max_page+1):
    page_num = str(i)
    url = "https://www.google.com/search?q="+ question +"&ei=LV-uXYrpNoj0rAGC8KSYCg&start="+ page_num +"0&sa=N&ved=0ahUKEwjKs8ie367lAhUIOisKHQI4CaM4ChDy0wMIiQE&biw=1356&bih=946"
    # load the page first, then look for elements on it
    driver.get(url)
    items = driver.find_elements(By.CLASS_NAME, 'LC20lb')
    # copy the text now; the references stop working after the next get()
    for item in items:
        search.append(item.text)
for item in search:
    print(item)
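
If you'd rather not ask the user to type '+' for spaces, something like the following might work. It's only a sketch: `build_search_urls` and `collect_titles` are names I made up, the 10-results-per-page step for `start` is an assumption, and I dropped the extra tracking parameters from the URL. `collect_titles` accepts any object with `get()` and `find_elements()`, so a Selenium driver should fit.

```python
from urllib.parse import urlencode

# Assumption: the `start` parameter advances by 10 results per page.
RESULTS_PER_PAGE = 10


def build_search_urls(query, pages):
    """Return one search URL per page; urlencode escapes spaces as '+'."""
    urls = []
    for page in range(pages):
        params = {"q": query, "start": page * RESULTS_PER_PAGE}
        urls.append("https://www.google.com/search?" + urlencode(params))
    return urls


def collect_titles(driver, urls, class_name="LC20lb"):
    """Visit each URL and copy the element text before navigating away."""
    titles = []
    for url in urls:
        driver.get(url)
        # "class name" is the string value of selenium's By.CLASS_NAME
        for element in driver.find_elements("class name", class_name):
            # element.text is a plain str, so it stays usable after the next get()
            titles.append(element.text)
    return titles
```

With a real browser you would call it roughly as `collect_titles(webdriver.Firefox(), build_search_urls("search this", 5))`. The list only ever holds strings, never element references, so nothing in it can go stale.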