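Sorry for the vague title, but I really couldn't find a better way to describe this. Here's the problem. Edit: browser.get(url) doesn't seem to do anything. This is my environment (output of uname -a):
Linux goorm 4.4.0-116-generic #140-Ubuntu SMP Mon Feb 12 21:23:04 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux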
>>> from selenium import webdriver
>>> browser = webdriver.PhantomJS()
>>> browser.get('https://medium.com/@hoppy/how-to-test-or-scrape-javascript-rendered-websites-with-python-selenium-a-beginner-step-by-c137892216aa')
>>> browser.page_source
u'<html><head></head><body></body></html>'
>>> browser.current_url
u'about:blank'
I suspect the web driver. I'd like to debug it: how can I tell whether the driver is failing to run? And if it isn't the driver, what is the problem?
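A minimal sketch for checking whether the driver actually navigated, assuming a Selenium 3.x install (PhantomJS support was removed in Selenium 4). `diagnose_load` and `make_logging_phantomjs` are hypothetical helpers, not Selenium APIs; they only use `get()`, `current_url` and `page_source`, plus the `service_log_path` and `service_args` parameters of `webdriver.PhantomJS`. The GhostDriver log file is usually the quickest place to see why a request failed. The SSL flags are an assumption: relaxing PhantomJS's SSL settings is a commonly suggested fix when HTTPS pages stay on `about:blank`, but it is not confirmed to be the cause here.

```python
BLANK_URL = "about:blank"
EMPTY_SOURCE = "<html><head></head><body></body></html>"


def diagnose_load(driver, url):
    """Call driver.get(url) and report whether anything was actually loaded.

    `driver` can be any Selenium WebDriver; only get(), current_url and
    page_source are used.
    """
    driver.get(url)
    if driver.current_url == BLANK_URL:
        # get() returned, but the browser never left its blank start page,
        # so the request itself most likely failed (DNS, proxy, SSL, ...).
        return "navigation failed: still on about:blank"
    if driver.page_source.strip() == EMPTY_SOURCE:
        # The URL changed, but nothing was rendered into the document.
        return "navigated, but the document is empty"
    return "ok"


def make_logging_phantomjs(executable_path="phantomjs", log_path="ghostdriver.log"):
    """Start PhantomJS and write GhostDriver's log to log_path (Selenium 3.x only).

    The service_args are an assumption: they relax PhantomJS's SSL checks,
    a commonly suggested workaround for HTTPS pages that stay blank.
    """
    # Imported lazily so diagnose_load can be used without Selenium installed.
    from selenium import webdriver

    return webdriver.PhantomJS(
        executable_path=executable_path,
        service_log_path=log_path,
        service_args=["--ignore-ssl-errors=true", "--ssl-protocol=any"],
    )
```

Usage: `print(diagnose_load(make_logging_phantomjs(), url))`, then read `ghostdriver.log` if the result is not `"ok"`.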
Answer 0 (score: 0)
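Unfortunately I'm away from the machine that has Selenium installed, so I can't test this myself, but I have a couple of suggestions you can try.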
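First, you could try adding a sleep statement between the browser.get() call and the rest of the script. I have some pages that need extra time after the get call to load their content dynamically. Normally I'd rather use Selenium's implicit or explicit waits, but I'm not sure whether those apply to browser.page_source, since it does exist, just not with the right content.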
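Instead of a fixed sleep, a polling wait returns as soon as the page is ready. The sketch below assumes "ready" means the browser has left `about:blank` and `document.readyState` is `"complete"`. `page_ready` and `wait_until` are hypothetical helpers; `wait_until` is a hand-rolled stand-in for Selenium's `WebDriverWait(driver, timeout).until(condition)` so the example runs without a browser.

```python
import time


def page_ready(driver):
    """Wait condition: the browser has left about:blank and finished loading.

    Same shape as a Selenium wait condition: it takes the driver and returns
    a truthy value once the condition holds.
    """
    if driver.current_url == "about:blank":
        return False
    return driver.execute_script("return document.readyState") == "complete"


def wait_until(driver, condition, timeout=10.0, poll=0.5):
    """Poll condition(driver) until it is truthy or timeout seconds pass.

    Returns the condition's result, or raises TimeoutError if it never holds.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition(driver)
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within %.1f s" % timeout)
        time.sleep(poll)
```

With Selenium installed, the condition can be passed straight to the real API: `WebDriverWait(browser, 10).until(page_ready)` (from `selenium.webdriver.support.ui`), followed by `browser.page_source`. For content that JavaScript renders after the load event, waiting for a specific element is more reliable than `readyState`.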
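Second, you could try a different browser driver. Firefox and Chrome both have headless options, so you don't have to deal with a visible browser window popping up (I assume that's why you chose PhantomJS over them).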
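A sketch of starting Chrome or Firefox headless, assuming Selenium 3.8 or later (for the `options=` keyword) and chromedriver or geckodriver on `PATH`. `headless_arguments` and `make_headless_driver` are hypothetical helper names; `--no-sandbox` is an assumption that often matters when Chrome runs as root inside a container.

```python
def headless_arguments(browser):
    """Command-line switches that start Chrome or Firefox without a window."""
    if browser == "chrome":
        # "--no-sandbox" is an assumption: often needed when Chrome runs as
        # root in a container, which may apply to a cloud IDE like goorm.
        return ["--headless", "--no-sandbox"]
    if browser == "firefox":
        return ["-headless"]
    raise ValueError("unsupported browser: %r" % browser)


def make_headless_driver(browser="chrome"):
    """Create a headless Chrome or Firefox WebDriver (Selenium 3.8+ assumed)."""
    args = headless_arguments(browser)  # validates the name before importing Selenium
    from selenium import webdriver

    if browser == "chrome":
        options = webdriver.ChromeOptions()
    else:
        options = webdriver.FirefoxOptions()
    for arg in args:
        options.add_argument(arg)
    if browser == "chrome":
        return webdriver.Chrome(options=options)
    return webdriver.Firefox(options=options)
```

`browser = make_headless_driver("firefox")` could then replace `webdriver.PhantomJS()` in the session from the question, leaving the `get()` and `page_source` calls unchanged.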
Answer 1 (score: 0)
You didn't mention which Selenium Python client version or which PhantomJS executable version you are using.
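A quick way to collect both versions. `selenium_version` and `phantomjs_version` are hypothetical helpers; the second simply runs `phantomjs --version`. The code sticks to calls that exist in both Python 2.7 and Python 3, since the `u'...'` strings in the question suggest Python 2.

```python
import subprocess


def selenium_version():
    """Return the installed Selenium client version, or None if it is missing."""
    try:
        import selenium
    except ImportError:
        return None
    return selenium.__version__


def phantomjs_version(executable="phantomjs"):
    """Run `<executable> --version` and return its output, e.g. '2.1.1'.

    Returns None if the binary cannot be started or exits with an error.
    """
    try:
        out = subprocess.check_output([executable, "--version"], universal_newlines=True)
    except (OSError, subprocess.CalledProcessError):
        return None
    return out.strip() or None


if __name__ == "__main__":
    print("selenium: %s" % selenium_version())
    print("phantomjs: %s" % phantomjs_version())
```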
Using Python v3.6.1, Selenium Python Client v3.10.0 and the phantomjs v2.1.1 binary on my local Windows 8 machine, I was able to retrieve the following content, which looks perfectly fine:
Code:
from selenium import webdriver
# Point executable_path at the local PhantomJS binary
browser = webdriver.PhantomJS(executable_path=r'C:\Utility\phantomjs-2.1.1-windows\bin\phantomjs.exe')
browser.get('https://medium.com/@hoppy/how-to-test-or-scrape-javascript-rendered-websites-with-python-selenium-a-beginner-step-by-c137892216aa')
print(browser.page_source)
print(browser.current_url)
browser.quit()  # shut down the PhantomJS process when done
Console output:
<!DOCTYPE html><html xmlns:cc="http://creativecommons.org/ns#" class="u-overflowHidden"><head prefix="og: http://ogp.me/ns# fb: http://ogp.me/ns/fb# medium-com: http://ogp.me/ns/fb/medium-com#"><meta http-equiv="Content-Type" content="text/html; charset=utf-8"><meta name="viewport" content="width=device-width, initial-scale=1.0, viewport-fit=contain"><title>How to Scrape Javascript Rendered Websites with Python & Selenium</title><link rel="canonical" href="https://medium.com/@hoppy/how-to-test-or-scrape-javascript-rendered-websites-with-python-selenium-a-beginner-step-by-c137892216aa"><meta name="title" content="How to Scrape Javascript Rendered Websites with Python & Selenium"><meta name="referrer" content="always"><meta name="description" content="On my quest to learn, I wanted to eventually be able to write beginner- friendly guides that really help make one feel like they can improve. Normally, we’ll get hit with very long documentations and…"><meta name="theme-color" content="#000000"><meta property="og:title" content="How to Scrape Javascript Rendered Websites with Python & Selenium"><meta property="og:url" content="https://medium.com/@hoppy/how-to-test-or-scrape-javascript-rendered-websites-with-python-selenium-a-beginner-step-by-c137892216aa"><meta property="og:image" content="https://cdn-images-1.medium.com/max/1200/1*-lqb_gM0ai9M4YniJRzWyQ.png"><meta property="fb:app_id" content="542599432471018"><meta property="og:description" content="In this guide:"><meta name="twitter:description" content="In this guide:"><meta name="twitter:image:src" content="https://cdn-images-1.medium.com/max/1200/1*-lqb_gM0ai9M4YniJRzWyQ.png"><link rel="publisher" href="https://plus.google.com/103654360130207659246"><link rel="author" href="https://medium.com/@hoppy"><meta property="author" content="Alex Hop"><meta property="og:type" content="article"><meta name="twitter:card" content="summary_large_image"><meta property="article:publisher" 
content="https://www.facebook.com/medium"><meta property="article:author" content="Alex Hop"><meta name="robots" content="index, follow"><meta property="article:published_time" content="2016-11-11T01:17:38.979Z"><meta name="twitter:site" content="@Medium"><meta property="og:site_name" content="Medium"><meta name="twitter:label1" value="Reading time"><meta name="twitter:data1" value="7 min read"><meta name="twitter:app:name:iphone" content="Medium"><meta name="twitter:app:id:iphone" content="828256236"><meta name="twitter:app:url:iphone" content="medium://p/c137892216aa"><meta property="al:ios:app_name" content="Medium"><meta property="al:ios:app_store_id" content="828256236"><meta property="al:android:package" content="com.medium.reader"><meta property="al:android:app_name" content="Medium"><meta property="al:ios:url" content="medium://p/c137892216aa"><meta property="al:android:url" content="medium://p/c137892216aa"><meta property="al:web:url" content="https://medium.com/@hoppy/how-to-test-or-scrape-javascript-rendered-websites-with-python-selenium-a-beginner-step-by-c137892216aa"><link rel="search" type="application/opensearchdescription+xml" title="Medium" href="/osd.xml"><link rel="alternate" href="android-app://com.medium.reader/https/medium.com/p/c137892216aa"><script type="application/ld+json">{"@context":"http://schema.org","@type":"NewsArticle","image":{"@type":"ImageObject","width":1920,"height":481,"url":"https://cdn-images-1.medium.com/max/1920/1*-lqb_gM0ai9M4YniJRzWyQ.png"},"datePublished":"2016-11-11T01:17:38.979Z","dateModified":"2018-03-05T13:55:30.448Z","headline":"How to Scrape Javascript Rendered Websites with Python & Selenium","name":"How to Scrape Javascript Rendered Websites with Python & Selenium","keywords":["Python","Ubuntu","Selenium","Automated Testing","Web Scraping"],"author":{"@type":"Person","name":"Alex Hop","url":"https://medium.com/@hoppy"},"creator":["Alex 
Hop"],"publisher":{"@type":"Organization","name":"Medium","url":"https://medium.com/","logo":{"@type":"ImageObject","width":308,"height":60,"url":"https://cdn-images-1.medium.com/max/308/1*OMF3fSqH8t4xBJ9-6oZDZw.png"}},"mainEntityOfPage":"https://medium.com/@hoppy/how-to-test-or-scrape-javascript-rendered-websites-with-python-selenium-a-beginner-step-by-c137892216aa"}</script><link rel="stylesheet" href="https://cdn-static-1.medium.com/_/fp/css/main-branding-base.hYiEpYs3x8GgQzREEhW49Q.css"><script>if (window.top !== window.self) window.top.location = window.self.location.href;var OB_startTime = new Date().getTime(); var OB_loadErrors = []; function _onerror(e) { OB_loadErrors.push(e) }; if (document.addEventListener) document.addEventListener("error", _onerror, true); else if (document.attachEvent) document.attachEvent("onerror", _onerror); function _asyncScript(u) {var d = document, f = d.getElementsByTagName("script")[0], s = d.createElement("script"); s.type = "text/javascript"; s.async = true; s.src = u; f.parentNode.insertBefore(s, f);}function _asyncStyles(u) {var d = document, f = d.getElementsByTagName("script")[0], s = d.createElement("link"); s.rel = "stylesheet"; s.href = u; f.parentNode.insertBefore(s, f); return s}(new Image()).src = "/_/stat?event=pixel.load&origin=" + encodeURIComponent(location.origin);</script><script>window.ga=window.ga||function(){(ga.q=ga.q||[]).push(arguments)};ga.l=+new Date; ga("create", "UA-24232453-2", "auto", {"allowLinker": true, "legacyCookieDomain": window.location.hostname}); ga("send", "pageview");</script><script async="" src="https://www.google-analytics.com/analytics.js"></script><script>(function () {var height = window.innerHeight || document.documentElement.clientHeight || document.body.clientHeight; var width = window.innerWidth || document.documentElement.clientWidth || document.body.clientWidth; document.write("<style>section.section-image--fullBleed.is-backgrounded {padding-top: " + Math.round(1.1 * height) 
+ "px;}section.section-image--fullScreen.is-backgrounded, section.section-image--coverFade.is-backgrounded {min-height: " + height + "px; padding-top: " + Math.round(0.5 * height) + "px;}.u-sizeViewHeight100 {height: " + height + "px !important;}.u-sizeViewHeight110 {height: " + Math.round(1.1 * height) + "px !important;}.u-sizeViewHeightMin100 {min-height: " + height + "px !important;}.u-sizeViewHeightMax100 {max-height: " + height + "px !important;}section.section-image--coverFade {height: " + height + "px;}.section-aspectRatioViewportPlaceholder, .section-aspectRatioViewportCropPlaceholder {max-height: " + height + "px;}.section-aspectRatioViewportBottomSpacer, .section-aspectRatioViewportBottomPlaceholder {max-height: " + Math.round(0.5 * height) + "px;}.zoomable:before {top: " + (-1 * height) + "px; left: " + (-1 * width) + "px; padding: " + height + "px " + width + "px;}</style>");})()</script><style>section.section-image--fullBleed.is-backgrounded {padding-top: 330px;}section.section-image--fullScreen.is-backgrounded, section.section-image--coverFade.is-backgrounded {min-height: 300px; padding-top: 150px;}.u-sizeViewHeight100 {height: 300px !important;}.u-sizeViewHeight110 {height: 330px !important;}.u-sizeViewHeightMin100 {min-height: 300px !important;}.u-sizeViewHeightMax100 {max-height: 300px !important;}section.section-image--coverFade {height: 300px;}.section-aspectRatioViewportPlaceholder, .section-aspectRatioViewportCropPlaceholder {max-height: 300px;}.section-aspectRatioViewportBottomSpacer, .section-aspectRatioViewportBottomPlaceholder {max-height: 150px;}.zoomable:before {top: -300px; left: -400px; padding: 300px 400px;}</style>
.
.
.
https://medium.com/@hoppy/how-to-test-or-scrape-javascript-rendered-websites-with-python-selenium-a-beginner-step-by-c137892216aa