Question

I am trying to crawl a website (with python) and get its users info. But when I download the source of the pages, it is different from what I see in inspect element in chrome. I googled and it seems I should use selenium, but I don't know how to use it. This is the code I have and when I see the driver.page_source it is still the source page as in chrome and doesn't look like the source in inspect element. I really appreciate if someone can help me to fix this.

`

import os
from selenium import webdriver

chromedriver = "/Users/adam/Downloads/chromedriver"
os.environ["webdriver.chrome.driver"] = chromedriver
driver = webdriver.Chrome(chromedriver)
driver.get("http://www.tudiabetes.org/forum/users/Bug74/activity")
driver.quit()

`

Answer 1

It's called XHR.
Your page was loaded from another call, (your url only loads the strcuture of the page, and the meat of the page comes from a different source using XHR, json formatted string) not the pageload it self.

You should really consider using requests and bs4 to query this page instead.

Html code in inspect element differs from html source code

1 个答案: