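Web scraping dynamic values

Time: 2018-12-11 08:14:30

Tags: python selenium web-crawler

Hi everyone, I have been trying to scrape some pages that contain values which change constantly, but so far I have not been able to get the prices. Can anyone help? Here is what I have so far: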

from bs4 import BeautifulSoup as soup
from selenium import webdriver
from selenium.webdriver.firefox.options import Options
from selenium.webdriver.firefox.service import Service

my_url = 'https://www.cryptocompare.com/'

# Configure one headless Firefox instance (Selenium 4 style API).
options = Options()
options.add_argument("-headless")
options.binary_location = 'C:/Program Files/Mozilla Firefox/firefox.exe'
service = Service(executable_path="C:/Users/Genti/AppData/Local/Programs/Python/Python36-32/Lib/site-packages/selenium/geckodriver.exe")
driver = webdriver.Firefox(service=service, options=options)

# Load the page with that same driver and grab the rendered HTML.
driver.get(my_url)
html = driver.execute_script("return document.documentElement.outerHTML")

# Parse the rendered HTML and look for the price cells.
sel_soup = soup(html, 'html.parser')
prices = sel_soup.find_all("td", class_="price")
print(prices)

3 answers:

Answer 0 (score: 0)

In case you want all 10 prices, collect the matching elements in a list, for example:

from selenium.webdriver.common.by import By
all_prices = driver.find_elements(By.CSS_SELECTOR, "td[class='price'] div")

Then loop over the list to get the values:

for price in all_prices:  
  print(price.text)  

If you run into any difficulty, let me know.

Answer 1 (score: 0)

If you want to use BeautifulSoup instead of Selenium WebDriver to pick out the prices:

prices = sel_soup.select("td[class^='price'] > div")
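To see what that selector picks up, here is a small sketch run against made-up markup rather than the real page. `SAMPLE_HTML` and `extract_prices` are hypothetical names, and the actual structure of the site may differ.

```python
from bs4 import BeautifulSoup

# Made-up markup for illustration; the real page's structure may differ.
SAMPLE_HTML = """
<table>
  <tr><td class="price up"><div>$ 3,400.12</div></td></tr>
  <tr><td class="price"><div>$ 88.10</div></td></tr>
  <tr><td class="volume"><div>1,234</div></td></tr>
</table>
"""


def extract_prices(html):
    """Return the text of each <div> that is a direct child of a price cell."""
    page = BeautifulSoup(html, "html.parser")
    # class^='price' matches when the class attribute starts with "price",
    # so a cell such as class="price up" is included as well.
    divs = page.select("td[class^='price'] > div")
    return [div.get_text(strip=True) for div in divs]


print(extract_prices(SAMPLE_HTML))
```

Note that `select` returns tag objects, so `get_text` is still needed to obtain the price strings themselves.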

Answer 2 (score: 0)

You can try the code below to get the currency names and prices:
