How to get the page source of the next page

Posted: 2018-12-27 15:31:25

Tags: python selenium-webdriver beautifulsoup webdriver

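All I'm trying to do is turn the driver's current page into HTML so I can use Beautiful Soup. The problem is that what prettify() prints (i.e. the source taken from the driver) is the HTML of the login page, not of the page that follows, even though I'm sure the login succeeds and the browser navigates to the next page.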

Is there a reason the driver holds the source of the first page and doesn't update to the page we navigated to?

Here is my code:

import os
import random
import sys

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

url = 'https://www.instagram.com/gelsonfonteles/followers/'
driver = webdriver.Chrome()
driver.implicitly_wait(1)
driver.get(url)


username = driver.find_element_by_xpath('//*[@name="username"]')
password = driver.find_element_by_xpath('//*[@name="password"]')
login_btn = driver.find_element_by_xpath('//*[@class="_0mzm- sqdOP  L3NKy      "]')

username.send_keys("name")
password.send_keys("pass")

#login
login_btn.click()
driver.implicitly_wait(2)

soup = BeautifulSoup(driver.page_source,features="lxml")
print(soup.prettify())

driver.quit()

2 answers:

Answer 0 (score: 3)

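driver.implicitly_wait(2) is useless in this case: an implicit wait only controls how long find_element calls keep retrying when an element is not yet present, so it does not pause anything before driver.page_source is read. You need an explicit wait instead, for example: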

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

login_btn.click()
WebDriverWait(driver, 10).until(EC.url_changes('https://www.instagram.com/accounts/login/?next=/gelsonfonteles/followers/'))  # pass the exact URL of the login page
soup = BeautifulSoup(driver.page_source,features="lxml")

EC.url_changes waits until the current URL differs from the URL you pass in (here, the login page URL).
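The difference can be sketched without a browser. An explicit wait such as WebDriverWait(driver, 10).until(EC.url_changes(...)) repeatedly calls a condition until it returns a truthy value or a timeout expires. The minimal sketch below imitates that loop under stated assumptions: FakeDriver, url_changes_from and wait_until are hypothetical names invented for this illustration (not Selenium APIs), and the fake driver simply pretends that navigation finishes after a few polls. Selenium's real WebDriverWait differs in detail; for instance, it raises selenium.common.exceptions.TimeoutException rather than the built-in TimeoutError.

```python
import time


class FakeDriver:
    """Hypothetical stand-in for a WebDriver (not a Selenium class): the URL
    and page source switch to the post-login page only after a few polls,
    imitating a click() that returns before navigation has finished."""

    def __init__(self, polls_until_navigation=3):
        self._remaining = polls_until_navigation
        self._url = "https://example.com/accounts/login/"
        self._source = "<html><title>Login</title></html>"

    @property
    def current_url(self):
        # Each read counts as one poll; once the polls run out, "navigation" completes.
        if self._remaining > 0:
            self._remaining -= 1
        else:
            self._url = "https://example.com/profile/"
            self._source = "<html><title>Profile</title></html>"
        return self._url

    @property
    def page_source(self):
        return self._source


def url_changes_from(old_url):
    """Hypothetical condition factory in the spirit of EC.url_changes:
    true once the current URL differs from old_url."""
    def condition(driver):
        return driver.current_url != old_url
    return condition


def wait_until(driver, condition, timeout=10.0, poll=0.05):
    """Minimal explicit-wait loop: call condition(driver) until it returns
    a truthy value, or raise TimeoutError when the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition(driver)
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within %.1f s" % timeout)
        time.sleep(poll)


driver = FakeDriver()
login_url = "https://example.com/accounts/login/"
stale = driver.page_source  # read right after the "click": still the login page
wait_until(driver, url_changes_from(login_url))
fresh = driver.page_source  # read after the explicit wait: the new page
print(stale)
print(fresh)
```

Reading page_source immediately after the click returns the old page, which is the symptom described in the question; reading it after the wait returns the new one.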

You can also wait for a specific element to appear on the target page.
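Waiting for an element follows the same pattern, except that the condition returns the element it found, so the wait both blocks and hands the element back (Selenium's until() likewise returns the condition's truthy result). This is again a browser-free sketch: FakePage, elements_matching, element_present and wait_until are hypothetical names, the fake page matches only by tag name and text rather than by a real Selenium locator, and it assumes the target element (an h1 containing the profile name, similar to the check used in Answer 1) appears after a couple of polls.

```python
import time


class FakePage:
    """Hypothetical stand-in for a WebDriver whose DOM gains the profile
    header only after a few polls, as a single-page app might after login."""

    def __init__(self, polls_until_rendered=2):
        self._remaining = polls_until_rendered

    def elements_matching(self, tag, text):
        # Hypothetical lookup returning a list: empty until the page has "rendered".
        if self._remaining > 0:
            self._remaining -= 1
            return []
        return ["<%s>%s</%s>" % (tag, text, tag)]


def element_present(tag, text):
    """Condition in the spirit of EC.presence_of_element_located: returns the
    first matching element, or False so the wait keeps polling."""
    def condition(page):
        found = page.elements_matching(tag, text)
        return found[0] if found else False
    return condition


def wait_until(page, condition, timeout=10.0, poll=0.05):
    """Poll condition(page) until it is truthy; return that value or raise TimeoutError."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition(page)
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("element did not appear within %.1f s" % timeout)
        time.sleep(poll)


page = FakePage()
header = wait_until(page, element_present("h1", "gelsonfonteles"))
print(header)
```

Waiting on an element that only exists on the destination page is often more robust than waiting on the URL, because a single-page app can change the URL before the new content has rendered.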

Answer 1 (score: 1)

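You were pretty close. You just need to induce a WebDriverWait for the visibility of some element on the target page, and then you can parse driver.page_source with features="html.parser", as follows: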

  • Code block:

    # -*- coding: UTF-8 -*-
    from bs4 import BeautifulSoup
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC
    
    url = 'https://www.instagram.com/gelsonfonteles/followers/'
    options = webdriver.ChromeOptions()
    options.add_argument("start-maximized")
    options.add_argument("disable-infobars")
    options.add_argument("--disable-extensions")
    # Current Selenium 4 releases take the chromedriver path via a Service object
    driver = webdriver.Chrome(service=Service(r'C:\Utility\BrowserDrivers\chromedriver.exe'), options=options)
    driver.get(url)
    # Wait until the login form is ready, then fill in the credentials
    WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "input[name='username']"))).send_keys("username")
    driver.find_element(By.CSS_SELECTOR, "input[name='password']").send_keys("password")
    driver.find_element(By.XPATH, "//button[normalize-space()='Log in']").click()
    # Wait for an element that only exists on the post-login profile page
    WebDriverWait(driver, 10).until(EC.visibility_of_element_located((By.XPATH, "//h1[text()='gelsonfonteles']")))
    soup = BeautifulSoup(driver.page_source, features="html.parser")
    print(soup.prettify())
    driver.quit()
    
  • Console output:

    <!DOCTYPE html>
    <html class="js logged-in client-root" lang="en" xmlns="http://www.w3.org/1999/xhtml">
     <head>
      <meta charset="utf-8"/>
      <meta content="IE=edge" http-equiv="X-UA-Compatible"/>
      <title>
       Gelson Fonteles 🖋🔄 (@gelsonfonteles) • Instagram photos and videos
      </title>
      <meta content="noimageindex, noarchive" name="robots"/>
      <meta content="yes" name="mobile-web-app-capable"/>
      <meta content="#000000" name="theme-color"/>
      <meta content="width=device-width, initial-scale=1, minimum-scale=1, maximum-scale=1, viewport-fit=cover" id="viewport" name="viewport"/>
      <link href="/data/manifest.json" rel="manifest"/>
      <link crossorigin="" href="https://graph.instagram.com" rel="preconnect"/>
      <link as="script" crossorigin="anonymous" href="/static/bundles/metro/ProfilePageContainer.js/68f09467caf1.js" rel="preload" type="text/javascript"/>
      <script async="" src="https://connect.facebook.net/signals/config/1425767024389221?v=2.8.35&amp;r=stable">
      </script>
      <script async="" src="//connect.facebook.net/en_US/fbevents.js">
      </script>
      <script id="facebook-jssdk" src="https://connect.facebook.net/en_US/sdk.js">
      </script>
      <script type="text/javascript">
       (function() {
      var docElement = document.documentElement;
      var classRE = new RegExp('(^|\\s)no-js(\\s|$)');
      var className = docElement.className;
      docElement.className = className.replace(classRE, '$1js$2');
    })();
      </script>
      <script type="text/javascript">
       /*
     Copyright 2018 Google Inc. All Rights Reserved.
     Licensed under the Apache License, Version 2.0 (the "License");
     you may not use this file except in compliance with the License.
     You may obtain a copy of the License at
    
         http://www.apache.org/licenses/LICENSE-2.0
    
     Unless required by applicable law or agreed to in writing, software
     distributed under the License is distributed on an "AS IS" BASIS,
     WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
     See the License for the specific language governing permissions and
     limitations under the License.
    */
    
    (function(){function g(a,c){b||(b=a,f=c,h.forEach(function(a){removeEventListener(a,l,e)}),m())}function m(){b&amp;&amp;f&amp;&amp;0&lt;d.length&amp;&amp;(d.forEach(function(a){a(b,f)}),d=[])}function n(a,c){function k(){g(a,c);d()}function b(){d()}function d(){removeEventListener("pointerup",k,e);removeEventListener("pointercancel",b,e)}addEventListener("pointerup",k,e);addEventListener("pointercancel",b,e)}function l(a){if(a.cancelable){var c=performance.now(),b=a.timeStamp;b&gt;c&amp;&amp;(c=+new Date);c-=b;"pointerdown"==a.type?n(c,
    a):g(c,a)}}var e={passive:!0,capture:!0},h=["click","mousedown","keydown","touchstart","pointerdown"],b,f,d=[];h.forEach(function(a){addEventListener(a,l,e)});window.perfMetrics=window.perfMetrics||{};window.perfMetrics.onFirstInputDelay=function(a){d.push(a);m()}})();
      </script>
      <script type="text/javascript">
       (function() {
      if ('PerformanceObserver' in window &amp;&amp; 'PerformancePaintTiming' in window) {
        window.__bufferedPerformance = [];
        var ob = new PerformanceObserver(function(e) {
          window.__bufferedPerformance.push.apply(window.__bufferedPerformance,e.getEntries());
        });
        ob.observe({entryTypes:['paint']});
      }
      window.__bufferedErrors = [];
      window.onerror = function(message, url, line, column, error) {
        window.__bufferedErrors.push({
          message: message,
          url: url,
          line: line,
          column: column,
          error: error
        });
        return false;
      };
      window.__initialData = {
        pending: true,
        waiting: []
      };
      function notifyLoaded(item, data) {
        item.pending = false;
        item.data = data;
        for (var i = 0;i &lt; item.waiting.length; ++i) {
          item.waiting[i].resolve(item.data);
        }
        item.waiting = [];
      }
      function notifyError(item, msg) {
        item.pending = false;
        item.error = new Error(msg);
        for (var i = 0;i &lt; item.waiting.length; ++i) {
          item.waiting[i].reject(item.error);
        }
        item.waiting = [];
      }
      window.__initialDataLoaded = function(initialData) {
        notifyLoaded(window.__initialData, initialData);
      };
      window.__initialDataError = function(msg) {
        notifyError(window.__initialData, msg);
      };
      window.__additionalData = {};
      window.__pendingAdditionalData = function(paths) {
        for (var i = 0;i &lt; paths.length; ++i) {
          window.__additionalData[paths[i]] = {
        pending: true,
        waiting: []
          };
        }
      };
      window.__additionalDataLoaded = function(path, data) {
        if (path in window.__additionalData) {
          notifyLoaded(window.__additionalData[path], data);
        } else {
          console.error('Unexpected additional data loaded "' + path + '"');
        }
      };
      window.__additionalDataError = function(path, msg) {
        if (path in window.__additionalData) {
          notifyError(window.__additionalData[path], msg);
        } else {
          console.error('Unexpected additional data encountered an error "' + path + '": ' + msg);
        }
      };
    })();
      </script>
      <link href="/static/images/ico/apple-touch-icon-76x76-precomposed.png/4272e394f5ad.png" rel="apple-touch-icon-precomposed" sizes="76x76"/>
      <link href="/static/images/ico/apple-touch-icon-120x120-precomposed.png/02ba5abf9861.png" rel="apple-touch-icon-precomposed" sizes="120x120"/>
      <link href="/static/images/ico/apple-touch-icon-152x152-precomposed.png/419a6f9c7454.png" rel="apple-touch-icon-precomposed" sizes="152x152"/>
      <link href="/static/images/ico/apple-touch-icon-167x167-precomposed.png/a24e58112f06.png" rel="apple-touch-icon-precomposed" sizes="167x167"/>
      <link href="/static/images/ico/apple-touch-icon-180x180-precomposed.png/85a358fb3b7d.png" rel="apple-touch-icon-precomposed" sizes="180x180"/>
      <link href="/static/images/ico/favicon-192.png/68d99ba29cc8.png" rel="icon" sizes="192x192"/>
      <link color="#262626" href="/static/images/ico/favicon.svg/fc72dd4bfde8.svg" rel="mask-icon"/>
      <link href="/static/images/ico/favicon.ico/36b3ee2d91ed.ico" rel="shortcut icon" type="image/x-icon"/>
      <link href="android-app://com.instagram.android/https/instagram.com/_u/gelsonfonteles/" rel="alternate"/>
      <meta content="Instagram" property="al:ios:app_name"/>
      <meta content="389801252" property="al:ios:app_store_id"/>
      <meta content="instagram://user?username=gelsonfonteles" property="al:ios:url"/>
      <meta content="Instagram" property="al:android:app_name"/>
      <meta content="com.instagram.android" property="al:android:package"/>
      <meta content="https://www.instagram.com/_u/gelsonfonteles/" property="al:android:url"/>
      <link href="https://www.instagram.com/gelsonfonteles/" rel="canonical"/>
      <meta content="94.2k Followers, 323 Following, 620 Posts - See Instagram photos and videos from Gelson Fonteles 🖋🔄 (@gelsonfonteles)" name="description"/>
      <meta content="profile" property="og:type"/>
      <meta content="https://scontent-sin6-2.cdninstagram.com/vp/44c2bf3c9657d797afd661cd7026e189/5C9C5435/t51.2885-19/s150x150/46263173_2475614175787091_1415254353245110272_n.jpg?_nc_ht=scontent-sin6-2.cdninstagram.com" property="og:image"/>
      <meta content="Gelson Fonteles 🖋🔄 (@gelsonfonteles) • Instagram photos and videos" property="og:title"/>
      <meta content="94.2k Followers, 323 Following, 620 Posts - See Instagram photos and videos from Gelson Fonteles 🖋🔄 (@gelsonfonteles)" property="og:description"/>
      <meta content="https://www.instagram.com/gelsonfonteles/" property="og:url"/>
      <script type="application/ld+json">
       {"@context":"http:\/\/schema.org","@type":"Person","name":"Gelson Fonteles \ud83d\udd8b\ud83d\udd04","alternateName":"@gelsonfonteles","description":"Fortaleza - CE , 23 anos!\nENCOMENDAS : Whats App: (85) 99760-7606","url":"http:\/\/www.facebook.com\/gelson.fonteles","mainEntityofPage":{"@type":"ProfilePage","@id":"https:\/\/www.instagram.com\/gelsonfonteles\/","interactionStatistic":{"@type":"InteractionCounter","interactionType":"http:\/\/schema.org\/FollowAction","userInteractionCount":"94237"}},"image":"https:\/\/www.instagram.com\/static\/images\/ico\/favicon-200.png\/ab6eff595bb1.png","email":"gelsonfontelesart@gmail.com"}
      </script>
      <link href="https://www.instagram.com/gelsonfonteles/" hreflang="x-default" rel="alternate"/>
      <link href="https://www.instagram.com/gelsonfonteles/?hl=en" hreflang="en" rel="alternate"/>
      <link href="https://www.instagram.com/gelsonfonteles/?hl=fr" hreflang="fr" rel="alternate"/>
      <link href="https://www.instagram.com/gelsonfonteles/?hl=it" hreflang="it" rel="alternate"/>
      <link href="https://www.instagram.com/gelsonfonteles/?hl=de" hreflang="de" rel="alternate"/>
      <link href="https://www.instagram.com/gelsonfonteles/?hl=es" hreflang="es" rel="alternate"/>
      <link href="https://www.instagram.com/gelsonfonteles/?hl=zh-cn" hreflang="zh-cn" rel="alternate"/>
      <link href="https://www.instagram.com/gelsonfonteles/?hl=zh-tw" hreflang="zh-tw" rel="alternate"/>
      <link href="https://www.instagram.com/gelsonfonteles/?hl=ja" hreflang="ja" rel="alternate"/>
      <link href="https://www.instagram.com/gelsonfonteles/?hl=ko" hreflang="ko" rel="alternate"/>
      <link href="https://www.instagram.com/gelsonfonteles/?hl=pt" hreflang="pt" rel="alternate"/>
      <link href="https://www.instagram.com/gelsonfonteles/?hl=pt-br" hreflang="pt-br" rel="alternate"/>
      <link href="https://www.instagram.com/gelsonfonteles/?hl=af" hreflang="af" rel="alternate"/>
      <link href="https://www.instagram.com/gelsonfonteles/?hl=cs" hreflang="cs" rel="alternate"/>
      <link href="https://www.instagram.com/gelsonfonteles/?hl=da" hreflang="da" rel="alternate"/>
      <link href="https://www.instagram.com/gelsonfonteles/?hl=el" hreflang="el" rel="alternate"/>
      <link href="https://www.instagram.com/gelsonfonteles/?hl=fi" hreflang="fi" rel="alternate"/>
      <link href="https://www.instagram.com/gelsonfonteles/?hl=hr" hreflang="hr" rel="alternate"/>
      <link href="https://www.instagram.com/gelsonfonteles/?hl=hu" hreflang="hu" rel="alternate"/>
      <link href="https://www.instagram.com/gelsonfonteles/?hl=id" hreflang="id" rel="alternate"/>
      <link href="https://www.instagram.com/gelsonfonteles/?hl=ms" hreflang="ms" rel="alternate"/>
      <link href="https://www.instagram.com/gelsonfonteles/?hl=nb" hreflang="nb" rel="alternate"/>
      <link href="https://www.instagram.com/gelsonfonteles/?hl=nl" hreflang="nl" rel="alternate"/>
      <link href="https://www.instagram.com/gelsonfonteles/?hl=pl" hreflang="pl" rel="alternate"/>
      <link href="https://www.instagram.com/gelsonfonteles/?hl=ru" hreflang="ru" rel="alternate"/>
      <link href="https://www.instagram.com/gelsonfonteles/?hl=sk" hreflang="sk" rel="alternate"/>
      <link href="https://www.instagram.com/gelsonfonteles/?hl=sv" hreflang="sv" rel="alternate"/>
      <link href="https://www.instagram.com/gelsonfonteles/?hl=th" hreflang="th" rel="alternate"/>
      <link href="https://www.instagram.com/gelsonfonteles/?hl=tl" hreflang="tl" rel="alternate"/>
      <link href="https://www.instagram.com/gelsonfonteles/?hl=tr" hreflang="tr" rel="alternate"/>
      <link href="https://www.instagram.com/gelsonfonteles/?hl=hi" hreflang="hi" rel="alternate"/>
      <link href="https://www.instagram.com/gelsonfonteles/?hl=bn" hreflang="bn" rel="alternate"/>
      <link href="https://www.instagram.com/gelsonfonteles/?hl=gu" hreflang="gu" rel="alternate"/>
      <link href="https://www.instagram.com/gelsonfonteles/?hl=kn" hreflang="kn" rel="alternate"/>
      <link href="https://www.instagram.com/gelsonfonteles/?hl=ml" hreflang="ml" rel="alternate"/>
      <link href="https://www.instagram.com/gelsonfonteles/?hl=mr" hreflang="mr" rel="alternate"/>
      <link href="https://www.instagram.com/gelsonfonteles/?hl=pa" hreflang="pa" rel="alternate"/>
      <link href="https://www.instagram.com/gelsonfonteles/?hl=ta" hreflang="ta" rel="alternate"/>
      <link href="https://www.instagram.com/gelsonfonteles/?hl=te" hreflang="te" rel="alternate"/>
      <link href="https://www.instagram.com/gelsonfonteles/?hl=ne" hreflang="ne" rel="alternate"/>
      <link href="https://www.instagram.com/gelsonfonteles/?hl=si" hreflang="si" rel="alternate"/>
      <link href="https://www.instagram.com/gelsonfonteles/?hl=ur" hreflang="ur" rel="alternate"/>
      <link href="https://www.instagram.com/gelsonfonteles/?hl=vi" hreflang="vi" rel="alternate"/>
      <link href="https://www.instagram.com/gelsonfonteles/?hl=bg" hreflang="bg" rel="alternate"/>
      <link href="https://www.instagram.com/gelsonfonteles/?hl=fr-ca" hreflang="fr-ca" rel="alternate"/>
      <link href="https://www.instagram.com/gelsonfonteles/?hl=ro" hreflang="ro" rel="alternate"/>
      <link href="https://www.instagram.com/gelsonfonteles/?hl=sr" hreflang="sr" rel="alternate"/>
      <link href="https://www.instagram.com/gelsonfonteles/?hl=uk" hreflang="uk" rel="alternate"/>
      <link href="https://www.instagram.com/gelsonfonteles/?hl=zh-hk" hreflang="zh-hk" rel="alternate"/>
      <link href="https://www.instagram.com/gelsonfonteles/?hl=es-la" hreflang="es-uy" rel="alternate"/>
      <link href="https://www.instagram.com/gelsonfonteles/?hl=es-la" hreflang="es-gt" rel="alternate"/>
      <link href="https://www.instagram.com/gelsonfonteles/?hl=es-la" hreflang="es-pe" rel="alternate"/>
      <link href="https://www.instagram.com/gelsonfonteles/?hl=es-la" hreflang="es-cl" rel="alternate"/>
      <link href="https://www.instagram.com/gelsonfonteles/?hl=es-la" hreflang="es-ar" rel="alternate"/>
      <link href="https://www.instagram.com/gelsonfonteles/?hl=es-la" hreflang="es-mx" rel="alternate"/>
      <link href="https://www.instagram.com/gelsonfonteles/?hl=es-la" hreflang="es-bo" rel="alternate"/>
      <link href="https://www.instagram.com/gelsonfonteles/?hl=es-la" hreflang="es-cu" rel="alternate"/>
      <link href="https://www.instagram.com/gelsonfonteles/?hl=es-la" hreflang="es-pa" rel="alternate"/>
      <link href="https://www.instagram.com/gelsonfonteles/?hl=es-la" hreflang="es-ve" rel="alternate"/>
      <link href="https://www.instagram.com/gelsonfonteles/?hl=es-la" hreflang="es-do" rel="alternate"/>
      <link href="https://www.instagram.com/gelsonfonteles/?hl=es-la" hreflang="es-co" rel="alternate"/>
      <link href="https://www.instagram.com/gelsonfonteles/?hl=es-la" hreflang="es-pr" rel="alternate"/>
      <link href="https://www.instagram.com/gelsonfonteles/?hl=es-la" hreflang="es-cr" rel="alternate"/>
      <link href="https://www.instagram.com/gelsonfonteles/?hl=es-la" hreflang="es-ec" rel="alternate"/>
      <link href="https://www.instagram.com/gelsonfonteles/?hl=es-la" hreflang="es-ni" rel="alternate"/>
      <link href="https://www.instagram.com/gelsonfonteles/?hl=es-la" hreflang="es-hn" rel="alternate"/>
      <link href="https://www.instagram.com/gelsonfonteles/?hl=es-la" hreflang="es-sv" rel="alternate"/>
      <link href="https://www.instagram.com/gelsonfonteles/?hl=es-la" hreflang="es-py" rel="alternate"/>