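I want to scrape the mbasic.facebook.com version of Facebook. It has a "load more" button for moving down to newer posts. I researched scraping the regular Facebook interface extensively and found this: Scraping infinite scrolling website with Selenium in Python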
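import time
import unittest

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys


class Sel(unittest.TestCase):
    def setUp(self):
        self.driver = webdriver.Chrome()
        self.driver.implicitly_wait(30)
        self.verificationErrors = []
        self.accept_next_alert = True

    def test_sel(self):
        driver = self.driver
        delay = 3
        driver.get("https://www.facebook.com")
        # Fill in the login form (credentials left blank here).
        # find_element(By.NAME, ...) replaces find_element_by_name, which
        # newer Selenium 4 releases no longer provide.
        elem = driver.find_element(By.NAME, "email")
        elem.clear()
        elem.send_keys("")
        elem2 = driver.find_element(By.NAME, "pass")
        elem2.clear()
        elem2.send_keys("")
        elem2.send_keys(Keys.RETURN)
        # Scroll to the bottom 99 times, pausing so new posts can load.
        for i in range(1, 100):
            driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
            time.sleep(4)
        # Dump the page source once, after all scrolling is finished.
        html_source = driver.page_source
        data = html_source.encode("utf-8")
        print(data)

    def tearDown(self):
        # Close the browser after the test.
        self.driver.quit()


if __name__ == "__main__":
    unittest.main()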
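But instead of looping, I want to react to an event: for example, when the user manually presses the "See more posts" button, a new page loads, and I want to get that page's source. Is there any way to do this? Any help would be greatly appreciated.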
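If the goal is to capture the source whenever the button is pressed by hand in the Selenium-controlled browser window, one option is to poll `driver.current_url`. This rests on an assumption worth checking: on mbasic.facebook.com the "more posts" control behaves like an ordinary link, so each click navigates to a new URL. A minimal sketch; `wait_for_next_page`, `timeout` and `poll` are hypothetical names, and Selenium's own `WebDriverWait` with `expected_conditions.url_changes` is an alternative to the hand-written loop.

```python
import time


def wait_for_next_page(driver, timeout=300.0, poll=0.5):
    """Block until the browser's URL changes, then return the new page source.

    Hypothetical helper: assumes each "more posts" click navigates to a new
    URL, as a plain link would. Returns None if nothing happens before timeout.
    """
    start_url = driver.current_url
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        # Wait for a new URL and a fully loaded document before reading it.
        if (driver.current_url != start_url
                and driver.execute_script("return document.readyState") == "complete"):
            return driver.page_source
        time.sleep(poll)
    return None
```

Calling it in a loop (`while (html := wait_for_next_page(driver)) is not None: ...`) hands you one page source per manual click until the user stops clicking for `timeout` seconds.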
Answer 0 (score: 1)
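So, do you want to get the page source every time more posts are loaded? Your code doesn't do that: it scrolls 99 times and reads the source only once at the end. Assuming you need the source each time a new batch of posts is loaded, you can locate the "More posts" button with XPath and click it.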
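A minimal sketch of that approach, assuming the driver is already logged in and sitting on an mbasic.facebook.com feed page. The XPath `//a[contains(., 'See More Posts')]` is a guess at the link text, not taken from the real page, so inspect the markup and adjust it. Instead of `.click()`, the sketch reads the link's `href` and passes it to `driver.get()`, which (under the default page load strategy) returns only after the new page has loaded; `collect_pages`, `max_pages` and `MORE_POSTS_XPATH` are hypothetical names.

```python
# Hypothetical XPath: assumes the mbasic "load more" control is a link whose
# text contains "See More Posts". Inspect the live page and adjust.
MORE_POSTS_XPATH = "//a[contains(., 'See More Posts')]"


def collect_pages(driver, max_pages=10, xpath=MORE_POSTS_XPATH):
    """Return the current page's source plus the source of each "more posts" page.

    `driver` is a logged-in Selenium WebDriver already on an mbasic feed page.
    """
    sources = [driver.page_source]
    while len(sources) < max_pages:
        # "xpath" is the string value of selenium's By.XPATH; find_elements
        # returns an empty list (no exception) when nothing matches.
        links = driver.find_elements("xpath", xpath)
        if not links:
            break  # no "more posts" link left: last page reached
        # Follow the link's target; get() blocks until the new page loads.
        driver.get(links[0].get_attribute("href"))
        sources.append(driver.page_source)
    return sources
```

Following the `href` rather than clicking avoids reading `page_source` before the navigation triggered by a click has finished. Using `find_elements` (plural) means the missing link on the last page simply ends the loop.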