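How to extract a value from a dynamic website using Selenium and PhantomJS

Asked: 2018-07-29 07:05:53

Tags: javascript selenium selenium-webdriver web-scraping phantomjs

I am trying to get the value of the timer (http://prntscr.com/kcbwd8) on this website (https://www.whenisthenextsteamsale.com/) and want to store it in a variable.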

import urllib
from bs4 import BeautifulSoup as bs
import time
import requests
from selenium import webdriver
from urllib.request import urlopen, Request
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.3"}

browser = webdriver.PhantomJS()
browser.get('https://www.whenisthenextsteamsale.com/')

soup = bs(browser.page_source, "html.parser")
result = soup.find_all("p",{"id":"subTimer"})

for item in result:
    print(item.text)

browser.quit()

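I tried the code above, but it returns this error:

C:\Users\rober\Anaconda3\lib\site-packages\selenium\webdriver\phantomjs\webdriver.py:49: UserWarning: Selenium support for PhantomJS has been deprecated, please use headless versions of Chrome or Firefox instead
  warnings.warn('Selenium support for PhantomJS has been deprecated, please use headless '
19:59:11

Is there any way to fix this? If not, is there another way to get a dynamic value from the website and store it in a variable?

Thanks.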

2 Answers:

Answer 0 (score: 1)

PhantomJS is no longer maintained. https://blog.keras.io/building-autoencoders-in-keras.html

You should use headless Chrome or Firefox instead.

You will have to replace the following code:

browser = webdriver.PhantomJS()
browser.get('https://www.whenisthenextsteamsale.com/')

Download Geckodriver here: https://groups.google.com/forum/m/#!topic/phantomjs/9aI5d-LDuNE
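
As a rough illustration of what the replacement could look like, here is a minimal sketch that swaps the PhantomJS lines for headless Firefox. It assumes Selenium 4 and a geckodriver binary that Selenium can locate; the helper names make_headless_firefox and read_timer are hypothetical and only serve the example. The element id subTimer is taken from the question.

```python
def make_headless_firefox():
    """Start Firefox without a visible window (needs geckodriver)."""
    # Imported lazily so read_timer() can be tried without a browser set up.
    from selenium import webdriver
    from selenium.webdriver.firefox.options import Options

    options = Options()
    options.add_argument("-headless")  # run Firefox with no GUI
    return webdriver.Firefox(options=options)


def read_timer(browser, url="https://www.whenisthenextsteamsale.com/"):
    """Load the page and return the text of the element with id 'subTimer'."""
    browser.get(url)
    # "id" is the locator strategy string held by selenium's By.ID constant.
    return browser.find_element("id", "subTimer").text


# Usage (starts a real browser, so it is left commented out here):
# browser = make_headless_firefox()
# try:
#     timer_value = read_timer(browser)  # the value is now stored in a variable
#     print(timer_value)
# finally:
#     browser.quit()
```

Reading the element through the driver avoids the extra BeautifulSoup pass, although parsing browser.page_source as in the question should work just as well once the page has rendered.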

Answer 1 (score: 1)

Your code is fine, although you are not using the headers you defined as:

headers = {"User-Agent": "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.3"}

I executed your own script as follows:

import urllib
from bs4 import BeautifulSoup as bs
import time
import requests
from selenium import webdriver
from urllib.request import urlopen, Request
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.3"}
browser = webdriver.PhantomJS(executable_path=r'C:\Utility\phantomjs-2.1.1-windows\bin\phantomjs.exe')
browser.get('https://www.whenisthenextsteamsale.com/')
soup = bs(browser.page_source, "html.parser")
result = soup.find_all("p",{"id":"subTimer"})
for item in result:
    print(item.text)
browser.quit()

I do see the same output on the console as you:

C:\Python\lib\site-packages\selenium\webdriver\phantomjs\webdriver.py:49: UserWarning: Selenium support for PhantomJS has been deprecated, please use headless versions of Chrome or Firefox instead
  warnings.warn('Selenium support for PhantomJS has been deprecated, please use headless '
08:06:16

It is worth mentioning that the Selenium team has already removed default support for PhantomJS from the Selenium Java Client and will follow the same approach for the Selenium Python Client. The warning you are observing is raised in the __init__() method of the PhantomJS WebDriver class.
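
Since the message is a UserWarning rather than an exception, the script keeps running and still prints the timer value. The sketch below illustrates that behaviour with Python's standard warnings module. The function start_legacy_driver is a hypothetical stand-in that only imitates the deprecation message; it is not Selenium's actual code.

```python
import warnings


def start_legacy_driver():
    """Hypothetical stand-in for a constructor that emits a deprecation warning."""
    # warnings.warn defaults to the UserWarning category.
    warnings.warn(
        "Selenium support for PhantomJS has been deprecated, "
        "please use headless versions of Chrome or Firefox instead"
    )
    return "driver started"


# Record the warning instead of printing it: the call still returns normally.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = start_legacy_driver()

print(result)                       # the function completed despite the warning
print(caught[0].category.__name__)  # name of the warning category

# The same call with UserWarning silenced for the duration of the block.
with warnings.catch_warnings():
    warnings.simplefilter("ignore", category=UserWarning)
    silent_result = start_legacy_driver()
```

Silencing the warning only hides the message. Moving to a headless Chrome or Firefox driver remains the longer-term fix.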