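Scraping content from a dynamic web page with Selenium returns the wrong content

Time: 2020-07-05 11:52:21

Tags: selenium

I am trying to print the HTML of https://www.dplay.no/kanaler/ (the page is geo-restricted, so you may have to use https://go.discovery.com/tv-shows/ instead, but that does not matter).

Because the page loads its HTML content with JavaScript, I decided to use Selenium with Python 3 to scrape it.

So far I have: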

from selenium import webdriver

driver = webdriver.Chrome()

driver.get('https://www.dplay.no/kanaler')

html = driver.page_source

print(html)

I have also tried:

html = driver.execute_script("return document.documentElement.outerHTML;")

html = driver.execute_script("return document.documentElement.innerHTML;")

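However, this does not seem to work, because the response I get is not the HTML shown on the page.

How can I get the HTML that is actually visible on the page?

1 answer:

Answer 0: (score: 0)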

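What you are seeing is the correct output and the correct behavior.

I took your code, added a few options and a wait, and this is what I observed:

  • Code block: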

    import time

    from selenium import webdriver

    options = webdriver.ChromeOptions()
    options.add_argument("start-maximized")
    # Hide the "controlled by automated test software" infobar and the automation extension
    options.add_experimental_option("excludeSwitches", ["enable-automation"])
    options.add_experimental_option('useAutomationExtension', False)
    driver = webdriver.Chrome(options=options, executable_path=r'C:\WebDrivers\chromedriver.exe')
    driver.get('https://www.dplay.no/kanaler/')
    # Crude fixed wait so the page's JavaScript has time to render the content
    time.sleep(10)
    print(driver.page_source)
    
  • Console output:

      <html lang="no"><head><meta charset="utf-8"><meta name="viewport" content="width=device-width,maximum-scale=10,minimum-scale=1,initial-scale=1"><meta name="google" value="notranslate"><title>Strøm kanaler direkte | Dplay</title><link rel="preconnect" href="https://dplay-static.disco-api.com"><link rel="preconnect" href="https://disco-api.dplay.no"><link rel="preconnect" href="https://eu1-prod-images.disco-api.com"><link rel="preconnect" href="https://connect.facebook.net"><link rel="preconnect" href="https://fonts.googleapis.com"><link rel="preconnect" href="https://assets.adobedtm.com"><link rel="preload" as="script" href="/main-1adbd0ca3d3a7141c1a5.js"><meta name="mobile-web-app-capable" content="yes"><link rel="manifest" href="/manifest.json" crossorigin="use-credentials"><link rel="icon" href="/dplay-logo-180.png"><meta name="apple-mobile-web-app-capable" content="yes"><meta name="apple-mobile-web-app-title" content="Dplay"><meta name="apple-mobile-web-app-status-bar-style" content="white"><link rel="apple-touch-icon" href="/dplay-apple-touch-icon.jpg"><link rel="apple-touch-startup-image" href="/dplay-logo-text-180x75.png"><!-- Facebook App link --><meta property="al:ios:url" content="com.discovery.dplay://facebook"><meta property="al:ios:app_store_id" content="KC4ZD2359Y.com.kanal5.play"><meta property="al:ios:app_name" content="Dplay"><meta property="al:android:url" content="com.discovery.dplay://facebook"><meta property="al:android:package" content="no.dplay"><meta property="al:android:app_name" content="Dplay"><script type="text/javascript" async="" src="https://www.googleadservices.com/pagead/conversion_async.js"></script><script type="text/javascript" async="" src="https://www.googleadservices.com/pagead/conversion_async.js"></script><script src="https://secure.quantserve.com/quant.js" async="" type="text/javascript"></script>
      .
      <script src="https://assets.adobedtm.com/479fbb05b9cf/9fc1a3ab6d1b/76543fb834e9/RCea880b60a90b4cb88872a3ecb52c59e0-source.min.js" async=""></script><script src="https://assets.adobedtm.com/479fbb05b9cf/9fc1a3ab6d1b/76543fb834e9/RC5b307908f85d452bbd1cc58e00201436-source.min.js" async=""></script></head><body><div id="app"><div class="pageContainer-1eCorB4H"><div id="header-wrapper" class="sticky-1FwWG4lU"><header class="header-1l1ildAB"><div class="topHeader-zyhEIsC-"><div class="topContainer-21wWp6Os"><a class="link-_ruDcDB7 logoLink-318yvghE" href="/"><img alt="Dplay" class="logo-3IfpM36Y logo-h00c9h56" src="/a08ed345c0fe04696cf31ab3b87100dc.svg"></a><div class="navWrapper-vwKHbhW_"><div class="nav-10tSiGaY"><a class="link-_ruDcDB7 item-2iwAUPE8 navItem-3wTHBCrm favouritesEnabled-3VQzQJHh" href="/programmer"><div class="navItem-14yB0BB8">Programmer</div></a><a class="link-_ruDcDB7 item-2iwAUPE8 navItem-3wTHBCrm favouritesEnabled-3VQzQJHh" href="/kanaler"><div class="navItem-14yB0BB8">Kanaler</div></a><a class="link-_ruDcDB7 item-2iwAUPE8 navItem-3wTHBCrm favouritesEnabled-3VQzQJHh" href="/tv-guide"><div class="navItem-14yB0BB8">TV-guide</div></a><a class="link-_ruDcDB7 item-2iwAUPE8 navItem-3wTHBCrm favouritesEnabled-3VQzQJHh" href="/sport"><div class="navItem-14yB0BB8">Sport</div></a><a class="link-_ruDcDB7 item-2iwAUPE8 navItem-3wTHBCrm favouritesEnabled-3VQzQJHh" href="/kategorier"><div class="navItem-14yB0BB8">Kategorier</div></a><a class="link-_ruDcDB7 item-2iwAUPE8 navItem-3wTHBCrm favouritesEnabled-3VQzQJHh" href="/gratis"><div class="navItem-14yB0BB8">Gratis</div></a></div><div class="premiumWrapper-3DTdcxSl"><a class="premiumButton-31dbB505" href="/mydplay/products?configName=auth-prod&amp;hostUrl=disco-api.dplay.no&amp;realm=dplayno&amp;returnUrl=https%3A%2F%2Fwww.dplay.no%2Fkanaler%2F" target="_self">Registrer</a></div></div><div class="iconWrapper-3mBB7-5x"><a class="link-ear3kCaw" 
href="/mydplay/entry/login?configName=auth-prod&amp;hostUrl=disco-api.dplay.no&amp;realm=dplayno&amp;returnUrl=https%3A%2F%2Fwww.dplay.no%2Fkanaler%2F" target="_self"><div class="container-2M8eCiLJ favouritesEnabled-3pfkgJ2m"><span class="label-2g_F1Qvf">Logg inn</span><span class="SVGInline icon-1tqhFCqf icon-hn3OCBQP" style="font-size: 0px;"><svg class="SVGInline-svg icon-1tqhFCqf-svg icon-hn3OCBQP-svg" viewBox="0 0 28 28" version="1.1" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink"><title>ic_icon_login_default</title><desc>Created with Sketch.</desc><g id="ic_icon_login_default" stroke="none" stroke-width="1"><g id="Login"><rect id="Rectangle" fill="#D8D8D8" opacity="0" x="0" y="0" width="28" height="28"></rect><g id="Group" transform="translate(3.192000, 3.024000)"><path d="M10.7907276,10.976 C5.48646358,10.976 1.06106358,14.738528 0.0376635838,19.740224 C-0.196360416,20.884192 0.690455584,21.952 1.85816758,21.952 L19.7230636,21.952 C20.9033756,21.952 21.7773676,20.865488 21.5375196,19.70976 C20.5024716,14.72324 16.0841836,10.976 10.7907276,10.976 M10.7907276,13.776 C11.7565596,13.776 12.7005516,13.941984 13.5966076,14.269416 C14.4628716,14.585872 15.2652956,15.045296 15.9817036,15.634864 C17.1155916,16.568104 17.9765916,17.79008 18.4745996,19.152 L3.10685558,19.152 C3.60351958,17.793552 4.46115958,16.574544 5.59112758,15.641976 C6.30820758,15.050224 7.11175158,14.589064 7.97947158,14.27132 C8.87709558,13.942656 9.82299158,13.776 10.7907276,13.776" id="Fill-1"></path><circle id="Oval" fill-rule="nonzero" cx="10.808" cy="4.816" r="4.816"></circle></g></g></g></svg></span></div></a>
      .
      <div class="text-1Ey12L6b"><p class="paragraph-3wtxxPuR size2-34rTNEs0">Dplay bruker cookies på nettsiden for å huske dine innstillinger, lage statistikker for å forbedre nettsiden vår, og å gi deg de mest relevante annonsene. Denne informasjonen kan deles med tredjeparter. Ved å fortsette å bruke nettsiden aksepterer du vår bruk av cookies, men du kan når som helst endre denne godkjenningen ved å følge instruksene på vår <a class="" href="https://dplay.no/cookies" rel="noopener" target="_blank">Cookies-side</a>. Her kan du også lese mer om dette</p></div></div><div class="links-2-4rTI9u"></div><button class="button-b4wYudld round-1Ew9jgjq default-vjGITl8z tertiaryCTA-3nF7cF3Z button-2j5j5ldl" type="button"><div class="content-2CZAzoNK"><p class="paragraph-3wtxxPuR text-2iB55dam size3-3bK_JR3k">Ok, jeg aksepterer</p></div></button></div></div><noscript></noscript><noscript></noscript><noscript></noscript><noscript></noscript></dialog><div class="footer-2i64orTD"><footer class="footer-OP_eHgMZ"><div class="container-1KS4F4y4"><div class="base-1JDWzsKS divider-1J9xjEr7"></div><div class="links-3cRELxmJ"><div class="linkAligner-2mmWPhvh"><p class="paragraph-3wtxxPuR paragraph-jt9VMa_X size1-Aclz5TEc"><a class="link-_ruDcDB7" href="/brukervilkaar">Brukervilkår</a></p></div><div class="linkAligner-2mmWPhvh"><p class="paragraph-3wtxxPuR paragraph-jt9VMa_X size1-Aclz5TEc"><a class="link-_ruDcDB7" href="/personvernpolicy">Personvernpolicy</a></p></div><div class="linkAligner-2mmWPhvh"><p class="paragraph-3wtxxPuR paragraph-jt9VMa_X size1-Aclz5TEc"><a class="" href="https://dplayhelp.zendesk.com/hc/no" rel="noopener" target="_blank">Kundeservice</a></p></div><div class="linkAligner-2mmWPhvh"><p class="paragraph-3wtxxPuR paragraph-jt9VMa_X size1-Aclz5TEc"><a class="link-_ruDcDB7" href="/om-dplay">Om Dplay</a></p></div><div class="linkAligner-2mmWPhvh"><p class="paragraph-3wtxxPuR paragraph-jt9VMa_X size1-Aclz5TEc"><a class="link-_ruDcDB7" 
href="/cookies">Cookies</a></p></div><div class="linkAligner-2mmWPhvh"><p class="paragraph-3wtxxPuR paragraph-jt9VMa_X size1-Aclz5TEc"><a class="link-_ruDcDB7" href="/systemkrav">Systemkrav</a></p></div><div class="linkAligner-2mmWPhvh"><p class="paragraph-3wtxxPuR paragraph-jt9VMa_X size1-Aclz5TEc"><a class="" href="https://presse.discovery.no/" rel="noopener" target="_blank">Presse</a></p></div></div><div class="base-1JDWzsKS divider-1J9xjEr7"></div><div class="logos-2tROKQvT"><a class="link-_ruDcDB7" href="/kanaler/tvnorge"><div class="logoAligner-3Lo3l93o"><img src="https://eu1-prod-images.disco-api.com/2018/11/16/channel-28-11261681250457276.png?w=108" class="logo-1DS_OQCW" alt="TVNorge"></div></a><a class="link-_ruDcDB7" href="/kanaler/fem"><div class="logoAligner-3Lo3l93o"><img src="https://eu1-prod-images.disco-api.com/2018/11/16/channel-29-11262316210706002.png?w=108" class="logo-1DS_OQCW" alt="FEM"></div></a><a class="link-_ruDcDB7" href="/kanaler/max"><div class="logoAligner-3Lo3l93o"><img src="https://eu1-prod-images.disco-api.com/2018/11/16/channel-30-11262268785616804.png?w=108" class="logo-1DS_OQCW" alt="MAX"></div></a><a class="link-_ruDcDB7" href="/kanaler/vox"><div class="logoAligner-3Lo3l93o"><img src="https://eu1-prod-images.disco-api.com/2018/11/16/channel-31-11261733016544693.png?w=108" class="logo-1DS_OQCW" alt="VOX"></div></a><a class="link-_ruDcDB7" href="/kanaler/discovery"><div class="logoAligner-3Lo3l93o"><img src="https://eu1-prod-images.disco-api.com/2019/10/08/channel-45-314717396207329.png?w=108" class="logo-1DS_OQCW" alt="Discovery"></div></a><a class="link-_ruDcDB7" href="/kanaler/animal-planet"><div class="logoAligner-3Lo3l93o"><img src="https://eu1-prod-images.disco-api.com/2019/01/22/channel-35-17020156064294169.PNG?w=108" class="logo-1DS_OQCW" alt="Animal Planet"></div></a><a class="link-_ruDcDB7" href="/kanaler/tlc"><div class="logoAligner-3Lo3l93o"><img 
src="https://eu1-prod-images.disco-api.com/2018/11/19/channel-15-4230971263537569.png?w=108" class="logo-1DS_OQCW" alt="TLC"></div></a><a class="link-_ruDcDB7" href="/kanaler/id"><div class="logoAligner-3Lo3l93o"><img src="https://eu1-prod-images.disco-api.com/2018/11/19/channel-73-4230992926516029.png?w=108" class="logo-1DS_OQCW" alt="Investigation Discovery"></div></a><a class="link-_ruDcDB7" href="/kanaler/discovery-science"><div class="logoAligner-3Lo3l93o"><img src="https://eu1-prod-images.disco-api.com/2019/10/08/channel-71-314744145281602.png?w=108" class="logo-1DS_OQCW" alt="Discovery Science"></div></a></div><section class="AppStoreLogosWrapper"><div class="base-1JDWzsKS divider-1J9xjEr7"></div></section><div class="copyrightContainer-2T6iDmRy"><p class="paragraph-3wtxxPuR copyright-2F2sRiJ4 size4-V7KSEEpz uppercase-IgQ1hyw0">Copyright © 2019 Discovery, Inc. or its subsidiaries and affiliates. All rights reserved.</p><a class="discoveryLogo-2PuZiJgQ" href="https://corporate.discovery.com/" rel="noopener" target="_blank"><img alt="Dplay" class="logo-3IfpM36Y" src="https://eu1-prod-images.disco-api.com/2019/3/26/35fc368d-4fb8-4c39-84a8-62eb61a8aeff.png"></a></div></div></footer></div></div></div><script>_satellite["__runScript1"](function(event, target) {
    
      try {
    
      var _hj_country_ids = {
        se : "767702",
        no : "767794",
        dk : "767799",
        fi : "1018217",
        jp : "1749918",
        nl : "1749920"
      }
      var _hj_ctry = /([a-z]{2})$/.exec(document.location.host)[0];
    
      if (_hj_country_ids.hasOwnProperty(_hj_ctry)){
        (function(h,o,t,j,a,r){
          h.hj=h.hj||function(){(h.hj.q=h.hj.q||[]).push(arguments)};
          h._hjSettings={hjid:_hj_country_ids[_hj_ctry],hjsv:6};
          a=o.getElementsByTagName('head')[0];
          r=o.createElement('script');r.async=1;
          r.src=t+h._hjSettings.hjid+j+h._hjSettings.hjsv;
          a.appendChild(r);
          })(window,document,'https://static.hotjar.com/c/hotjar-','.js?sv=');
      }
    
    
        } catch (e) {}
    
      });</script><script>_satellite["__runScript2"](function(event, target) {
      try{
    
      if(/no/i.test(_satellite.getVar("Environment:CountryCode"))){
      (function(win, doc, sdk_url){
        if(win.snaptr) return;
        var tr=win.snaptr=function(){
        tr.handleRequest? tr.handleRequest.apply(tr, arguments):tr.queue.push(arguments);
      };
        tr.queue = [];
        var s='script';
        var new_script_section=doc.createElement(s);
        new_script_section.async=!0;
        new_script_section.src=sdk_url;
        var insert_pos=doc.getElementsByTagName(s)[0];
        insert_pos.parentNode.insertBefore(new_script_section, insert_pos);
      })(window, document, 'https://sc-static.net/scevent.min.js');
       snaptr('init','d3df95e4-c2a5-49f3-91ea-1b91fb1a53af')
      }
    
      } catch (e) {}
      });</script><script>_satellite["__runScript3"](function(event, target) {
      try {
        window.dataLayer = window.dataLayer || [];
        window.gtag = function() {
              dataLayer.push(arguments);
          }
          var country_id = {
          no: "UA-57600485-7",
          dk: "UA-57600485-4",
          se: "DC-8313372",
          fi: "AW-797670288",
          jp: "AW-714777410"
          }
          //This should be reworked and generalized, not all pages have the countrycode as top level domain, added else on line 24 please refactor (KN 2019-08-01)
          var pos = document.location.hostname.split(".").length - 1;
          var cc = document.location.hostname.split(".")[pos];
          if (country_id.hasOwnProperty(cc)) {
            if (!document.getElementById('google-analytics-gtag-js')) {
          var script = document.createElement('script');
          script.src = "https://www.googletagmanager.com/gtag/js?id="+country_id[cc];
          script.async = true;
          script.id = "google-analytics-gtag-js"
          document.head.appendChild(script);
          }
          }
          else {
            if (country_id.hasOwnProperty(_satellite.getVar("Environment:CountryCode"))) {
          if (!document.getElementById('google-analytics-gtag-js')) {
            var script = document.createElement('script');
            script.src = "https://www.googletagmanager.com/gtag/js?id="+country_id[_satellite.getVar("Environment:CountryCode")];
            script.async = true;
            script.id = "google-analytics-gtag-js"
            document.head.appendChild(script);
          }
            }
          }
      } catch (e) {}
    
      /////////////////////MSA Nordics Google organic 20200602
      try{
          var cc = _satellite.getVar("Environment:CountryCode")
          if (/no|dk|se|fi/i.test(cc)){
    
      window.dataLayer = window.dataLayer || [];
      function gtag(){dataLayer.push(arguments);}
    
          gtag('config', 'DC-9232428', {
          'dc_natural_search': {
          'exclusion_parameters': ['gclid\x3d*'],
    
                  'engines': {
                  'yahoo': '468297265;273992205;x',
                  'google': '468296951;273980697;k',
                  'aol': '468307456;273972811;s',
                  'ask': '468306601;273972808;p',
                  'msn': '468291560;273653897;a'
                  }
    
          }
    
          })
      }
      } catch (e) {}
      });</script><script>_satellite["__runScript4"](function(event, target) {
      //// Script load
    
      if (!document.getElementById("userreport-launcher-script")) {
        var script = document.createElement("script");
       script.id = "userreport-launcher-script";
        script.src = "https://sak.userreport.com/discovery/launcher.js";
        script.async = true;
        document.head.appendChild(script);
      }
      });</script><iframe sandbox="allow-scripts allow-same-origin" title="Adobe ID Syncing iFrame" id="destination_publishing_iframe_discovery_0" name="destination_publishing_iframe_discovery_0_name" src="https://discovery.demdex.net/dest5.html?d_nsid=0#https%3A%2F%2Fwww.dplay.no" class="aamIframeLoaded" style="display: none; width: 0px; height: 0px;"></iframe></body></html>
    

Conclusion

The site is JavaScript-based, so you need to wait for the WebElement to be rendered in the DOM tree before collecting page_source.
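
A fixed time.sleep(10) works, but it always waits the full ten seconds and can still be too short on a slow connection. One way to wait only as long as necessary is to poll for an element that exists only after rendering. The helper below is a minimal sketch of that idea, not part of the original answer: the function name and its parameters are made up for illustration, and it relies only on the driver's find_elements() method and page_source attribute. Selenium's own WebDriverWait combined with expected_conditions.presence_of_element_located should give the same behavior out of the box.

```python
import time


def page_source_when_present(driver, css_selector, timeout=10.0, poll=0.5):
    """Return driver.page_source once an element matching css_selector exists.

    Hypothetical helper for illustration; raises TimeoutError if nothing
    matches before the timeout expires.
    """
    deadline = time.monotonic() + timeout
    while True:
        # "css selector" is the string value behind Selenium's By.CSS_SELECTOR.
        # find_elements returns an empty list (no exception) when nothing matches.
        if driver.find_elements("css selector", css_selector):
            return driver.page_source
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"no element matched {css_selector!r} within {timeout}s"
            )
        time.sleep(poll)
```

With the driver from the code block above, a call such as page_source_when_present(driver, "a[href^='/kanaler/']") could replace the time.sleep(10) and print(driver.page_source) pair. The selector is an assumption based on the channel links visible in the console output; pick whichever element marks the content you actually need.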

