Scraping a JavaScript-rendered page

Time: 2018-01-02 11:17:30

Tags: javascript python-3.x selenium selenium-webdriver

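I want to extract some data from a JavaScript-rendered page using Selenium WebDriver in Python 3. I have tried several drivers (Firefox, Chromedriver, and PhantomJS) but always get the same result: the page source contains only the scripts, not the DOM elements.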

Here is my code snippet:

from selenium import webdriver

url = 'https://www.google.com/flights/explore/#explore;f=BDO;t=r-Asia-0x88d9b427c383bc81%253A0xb947211a2643e5ac;li=0;lx=2;d=2018-01-09'
driver = webdriver.Chrome("/var/chromedriver/chromedriver")  # path to the chromedriver binary
driver.implicitly_wait(20)  # wait up to 20 seconds when locating elements
driver.get(url)

print(driver.page_source)

Am I missing something here?

1 answer:

Answer 0 (score: 1)

I don't see any such problem in your code block. I tried your own script as follows:

from selenium import webdriver

url = 'https://www.google.com/flights/explore/#explore;f=BDO;t=r-Asia-0x88d9b427c383bc81%253A0xb947211a2643e5ac;li=0;lx=2;d=2018-01-09'
driver = webdriver.Chrome()
driver.get(url)
print(driver.page_source)

I get the following console output:

<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml" lang="en-US">

<head>
  <meta http-equiv="content-type" content="text/html; charset=UTF-8" />
  <meta name="deals::gwt:property" content="baseUrl=/flights/explore//static/" />
  <title>Explore flights</title>
  <meta name="description" content="Explore flights" />
  <script src="https://apis.google.com/_/scs/abc-static/_/js/k=gapi.gapi.en.yoTdpQipo6s.O/m=gapi_iframes,googleapis_client,plusone/rt=j/sv=1/d=1/ed=1/am=AAE/rs=AHpOoo9_VhuRoUovwpPPf5LqLZd-dmCnxw/cb=gapi.loaded_0" async=""></script>
  <script language="javascript" type="text/javascript">
    var __JS_ILT__ = new Date();
    .
    .
    .
</div></div>
<div aria-hidden="true" style="display: none;"><div class="CTPFVNB-l-j CTPFVNB-l-h">Displayed currencies may differ from the currencies used to purchase flights.– <a href="https://www.google.com/intl/en/googlefinance/disclaimer/" class="CTPFVNB-l-k">Disclaimer</a></div></div>
<div aria-hidden="true" style="display: none;"><div class="CTPFVNB-l-j CTPFVNB-l-h">Showing licensed rail data. – <a href="https://www.google.com/intl/en/help/legalnotices_maps.html" class="CTPFVNB-l-k">Legal Notice</a></div></div>
<div class="CTPFVNB-l-i"><a class="CTPFVNB-l-k CTPFVNB-l-j" href="https://www.google.com/intl/en/policies/">Privacy &amp; Terms</a><a class="CTPFVNB-l-k CTPFVNB-l-j" href="https://support.google.com/flights/?hl=en">Help Center</a></div></div></div><iframe id="deals" tabindex="-1" style="position: absolute; width: 0px; height: 0px; border: none; left: -1000px; top: -1000px;">
</iframe><input type="text" id="_bgInput" style="display:none;" /></body></html>

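Now you can clearly see that there is an iframe (id="deals") at the very end of page_source. Unless we switch to that iframe, you will not be able to find the DOM element you are looking for.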
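
Switching into the iframe could look roughly like the sketch below. It is only an illustration under a few assumptions: the frame id "deals" is taken from the output above, the helper names frame_page_source and scrape_frame are made up for this example, and the output above does not confirm that the data you want actually lives inside that frame. It relies on the implicit wait from the question rather than an explicit wait.

```python
def frame_page_source(driver, frame_id):
    """Return the page source as seen from inside the iframe `frame_id`.

    `driver` is any Selenium WebDriver instance. The driver is switched
    back to the top-level document afterwards, even if an error occurs.
    """
    # switch_to.frame accepts a frame id/name string, an index, or a WebElement
    driver.switch_to.frame(frame_id)
    try:
        return driver.page_source
    finally:
        driver.switch_to.default_content()


def scrape_frame(url, frame_id="deals"):
    """Open `url` in Chrome and return the source of the given iframe.

    Hypothetical helper: the default frame id is an assumption taken from
    the console output in the answer.
    """
    # Imported here so that frame_page_source can be used (and tried out)
    # without Selenium or a browser being available.
    from selenium import webdriver

    driver = webdriver.Chrome()
    driver.implicitly_wait(20)  # same implicit wait as in the question
    try:
        driver.get(url)
        return frame_page_source(driver, frame_id)
    finally:
        driver.quit()
```

With the url from the question, print(scrape_frame(url)) would print whatever the browser reports as the source of that frame. If the frame is attached late by the page's scripts, an explicit wait such as WebDriverWait combined with expected_conditions.frame_to_be_available_and_switch_to_it may be a more robust alternative to the implicit wait.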