On the album page (https://pinterest.com/user/album/) you can see the elements laid out in a grid. I am using the Firefox WebDriver with the default window size.
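All of the elements have the same class (`class="YlMIw Hb7"`), and they load about 20 at a time. What I'm doing is iterating over the elements: for each one I open its view page, get the data, go back to the main page, and so on.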
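Since the elements arrive in batches of about 20 as you scroll, one way to get all of them is to keep scrolling until a few consecutive scrolls reveal nothing new, recording a stable key for each element as you go. This is a minimal sketch that assumes each element exposes such a key (for example a link URL; the question does not say which attribute is stable). `collect_keys` is a hypothetical helper, and its two callables are stand-ins for the Selenium calls (a `find_elements` lookup that reads the key, and the `scrollIntoView` script), so the loop can be tried without a browser.

```python
from typing import Callable, List


def collect_keys(
    get_visible_keys: Callable[[], List[str]],
    scroll: Callable[[], None],
    max_idle_rounds: int = 3,
) -> List[str]:
    """Scroll until `max_idle_rounds` scrolls in a row add no new keys.

    get_visible_keys: returns a stable key for every element currently
        in the DOM (hypothetical stand-in for a Selenium lookup).
    scroll: scrolls the page once (e.g. scrollIntoView on the last element).
    """
    seen = set()             # keys already recorded
    ordered: List[str] = []  # keys in the order they first appeared
    idle = 0                 # consecutive rounds that found nothing new
    while idle < max_idle_rounds:
        new_found = False
        for key in get_visible_keys():
            if key not in seen:
                seen.add(key)
                ordered.append(key)
                new_found = True
        idle = 0 if new_found else idle + 1
        scroll()
    return ordered
```

Because new keys are checked against a set rather than by position, the result has no duplicates even if the page removes earlier items from the DOM while scrolling. Once every key is collected, each item can be processed in a second pass instead of clicking and navigating back inside the scrolling loop.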
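When it reaches the last element of the array, it scrolls down to that element, and the iterator starts again from the first element. There are two problems with this approach: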
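Because the index is reset to 0 after each reload, elements that were already opened come up again. One way around that, sketched here under the same assumption of a stable per-element key, is to remember processed keys in a set and always pick the first visible element that is not yet in it. `next_unvisited` is a hypothetical helper name.

```python
from typing import Iterable, Optional, Set


def next_unvisited(visible_keys: Iterable[str], visited: Set[str]) -> Optional[str]:
    """Return the first key that has not been processed yet, or None.

    Unlike resetting an index to 0 after each reload, this does not depend
    on where an element sits in the freshly loaded list.
    """
    for key in visible_keys:
        if key not in visited:
            return key
    return None
```

In the scraping loop, a `None` result means every loaded element has been handled, which is the moment to scroll and reload; after downloading an item, its key is added to `visited`.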
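I have commented almost every line, but if you have any questions, please ask. I would be glad to get some suggestions, or even a completely new strategy, for the main goal: getting all of the data from the page. Thanks in advance!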
I'm using Python 3.7 with Selenium WebDriver.
```python
import time
import urllib.error
import urllib.request

from selenium.webdriver.common.by import By

# `driver` is the Firefox WebDriver instance, already showing the album page.
# The files/ directory must exist before this runs.


def getData():
    i = 0  # Index of the next element to open on the album page
    j = 0  # Counter used to name the downloaded files
    while True:
        time.sleep(1.5)
        elems = driver.find_elements(By.CLASS_NAME, "Yl-")  # Load array of elements in page
        # See if we are at the limit
        if len(elems) == i:
            # Old way: go to the bottom of the page
            # driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
            # New way: go to the last element (elems[i] would be out of range here)
            driver.execute_script("arguments[0].scrollIntoView();", elems[-1])
            time.sleep(2)  # Wait for new elements to load
            elems = driver.find_elements(By.CLASS_NAME, "Yl-")  # Reload array
            i = 0  # Reset
        elems[i].click()  # Go to element view page
        time.sleep(0.75)
        # Array with some items; the image we want is the third one.
        # find_elements returns an empty list instead of raising NoSuchElementException.
        elems = driver.find_elements(By.CLASS_NAME, "hCL")
        if len(elems) < 3:
            print("...")
            break
        imgsrc = elems[2].get_attribute("src")  # Image source link
        imgsrc2 = imgsrc.replace("564x", "originals", 1)  # Get better quality image
        try:
            urllib.request.urlretrieve(imgsrc2, "files/file" + str(j) + ".jpg")
        except urllib.error.URLError:
            # If the better quality image isn't available, get the normal one
            urllib.request.urlretrieve(imgsrc, "files/file" + str(j) + ".jpg")
        # Array with some items; the back button is the third last
        elems = driver.find_elements(By.CLASS_NAME, "gUZ")
        if len(elems) < 3:
            print("Can't find back button...")
            break
        elems[-3].click()  # Go back to album page
        i = i + 1
        j = j + 1
```
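The `replace("564x", "originals", 1)` trick rewrites the first `564x` anywhere in the URL string. A slightly stricter sketch swaps only a whole path segment, returns the candidate URLs best quality first, and falls back on `urllib.error.URLError` (which also covers `HTTPError`) instead of a bare `except`. The `/564x/` and `/originals/` path layout is taken from the replacement in the code above, not from any documented Pinterest URL scheme, and `candidate_urls` and `download_first` are hypothetical names.

```python
import urllib.error
import urllib.request
from typing import List, Optional
from urllib.parse import urlsplit, urlunsplit


def candidate_urls(src: str, size: str = "564x") -> List[str]:
    """Return image URLs to try, best quality first.

    Only a path segment exactly equal to `size` is swapped for "originals",
    so "564x" inside a file name is left alone.
    """
    parts = urlsplit(src)
    segments = parts.path.split("/")
    if size not in segments:
        return [src]
    segments[segments.index(size)] = "originals"
    better = urlunsplit(parts._replace(path="/".join(segments)))
    return [better, src]


def download_first(urls: List[str], dest: str) -> Optional[str]:
    """Download the first URL that succeeds to `dest`; return it, or None."""
    for url in urls:
        try:
            urllib.request.urlretrieve(url, dest)
            return url
        except urllib.error.URLError:
            continue  # e.g. a 403/404 for the "originals" version
    return None
```

Usage inside the loop would be along the lines of `download_first(candidate_urls(imgsrc), "files/file" + str(j) + ".jpg")`, which replaces the nested `try`/`except`.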