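I am scraping this web page to collect the usernames of users that are loaded as the page is scrolled.
Page: "http://www.quora.com/Kevin-Rose/followers"
I know the number of users on the page (43812 in this case). How do I scroll the page until all users are loaded? I have searched the internet for this, and everywhere I find the same line of code: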
driver.execute_script(" window.scrollTo(0,)")
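How do I determine the vertical position that guarantees all users are loaded? Is there another way to achieve the same result without actually scrolling?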
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import time
import urllib
driver = webdriver.Firefox()
driver.get('http://www.quora.com/')
time.sleep(10)
wait = WebDriverWait(driver, 10)
form = driver.find_element_by_class_name('regular_login')
time.sleep(10)
#add explicit wait
username = form.find_element_by_name('email')
time.sleep(10)
#add explicit wait
username.send_keys('abc@gmail.com')
time.sleep(30)
#add explicit wait
password = form.find_element_by_name('password')
time.sleep(30)
#add explicit wait
password.send_keys('def')
#add explicit wait
password.send_keys(Keys.RETURN)
time.sleep(30)
#search = driver.find_element_by_name('search_input')
search = wait.until(EC.presence_of_element_located((By.XPATH, "//form[@name='search_form']//input[@name='search_input']")))
search.clear()
search.send_keys('Kevin Rose')
search.send_keys(Keys.RETURN)
link = wait.until(EC.presence_of_element_located((By.LINK_TEXT, "Kevin Rose")))
link.click()
#Wait till the element is loaded (asynchronously loaded webpage)
handle = driver.window_handles
driver.switch_to.window(handle[1])
#switch to new window
element = WebDriverWait(driver, 2).until(EC.presence_of_element_located((By.PARTIAL_LINK_TEXT, "Followers")))
element.click()
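Answer 0 (score: 4)
Since nothing special appears after the last follower is loaded, I rely on the fact that you know how many followers the user has and how many followers are loaded on each scroll down (I've checked: 18 per scroll). From these two numbers you can calculate how many times the page needs to be scrolled.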
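The arithmetic can be sketched as follows. This is only an illustration: `batches_needed` is a hypothetical helper name, and the batch size of 18 is the value observed above, which the site may change at any time.

```python
def batches_needed(followers_count, followers_per_scroll=18):
    """Return how many batches of followers the page must load in total.

    Uses integer ceiling division, so a partially filled last batch
    still counts as one batch.
    """
    if followers_count <= 0:
        return 0
    return (followers_count + followers_per_scroll - 1) // followers_per_scroll


print(batches_needed(53))     # 3
print(batches_needed(43812))  # 2434
```

With a 2-second pause per scroll, 2434 scrolls take roughly 80 minutes, so for an account of that size it is worth expecting a long run.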
Here is the implementation (I used a different user with only 53 followers to demonstrate the solution):
import time
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
followers_per_page = 18
driver = webdriver.Chrome() # webdriver.Firefox() in your case
driver.get("http://www.quora.com/Andrew-Delikat/followers")
# get the followers count
element = WebDriverWait(driver, 2).until(EC.presence_of_element_located((By.XPATH, '//li[contains(@class, "FollowersNavItem")]//span[@class="profile_count"]')))
followers_count = int(element.text.replace(',', ''))
print(followers_count)
# scroll down the page iteratively with a delay
for _ in range(followers_count // followers_per_page + 1):
    driver.execute_script("window.scrollTo(0, 10000);")
    time.sleep(2)
Also, if there are a lot of followers, you may need to increase this 10000 Y coordinate value based on the loop variable.
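One way to avoid picking a Y coordinate by hand is to scroll to the current `document.body.scrollHeight` each time and stop once the height no longer grows. This is a different technique from the count-based loop above, shown only as a rough sketch: `scroll_until_stable`, `pause` and `max_scrolls` are hypothetical names, and it assumes the page grows taller whenever a new batch is loaded. A slow response can make the height look stable too early, so the pause may need tuning.

```python
import time


def scroll_until_stable(driver, pause=2.0, max_scrolls=5000):
    """Scroll to the bottom repeatedly until the page height stops growing.

    `driver` is any object exposing Selenium's execute_script(script) method.
    Returns the number of scrolls performed.
    """
    last_height = driver.execute_script("return document.body.scrollHeight")
    for scrolls in range(1, max_scrolls + 1):
        # Jump to whatever the bottom of the page currently is.
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)  # give the next batch time to load
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            return scrolls  # nothing new was appended, assume we are done
        last_height = new_height
    return max_scrolls  # safety cap reached
```

It would be called as `scroll_until_stable(driver)` after the followers page has opened. The known follower count can still be used afterwards as a sanity check on how many usernames were collected.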