Web scraping real-time data

Time: 2020-04-20 07:14:00

Tags: python html web-scraping beautifulsoup

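I'm currently trying to scrape real-time stock market data from a Yahoo Finance page.

I'm using bs4. My problem is that whenever I run the script, the output does not update to reflect the current price.

If anyone has suggestions on what to change, I'd really appreciate it.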

import requests
from bs4 import BeautifulSoup

while True:
    page = requests.get("https://nz.finance.yahoo.com/quote/NZDUSD=X?p=NZDUSD=X")
    soup = BeautifulSoup(page.text, "html.parser")
    price = soup.find("div", {"class": "My(6px) Pos(r) smartphone_Mt(6px)"}).find("span").text
    print(price)

1 answer:

Answer 0: (score: 0)

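You can't do this with BS4 alone

This site updates the page with JavaScript, and urllib and similar libraries only fetch the page's HTML; they do not run JavaScript or load AJAX content. PhantomJS or a Selenium-driven browser gives you an automated browser that can run the JavaScript code that dynamic sites depend on. Try using that :)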
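To make the point concrete, here is a minimal sketch of what a plain HTML parser sees on a JavaScript-driven page. The HTML snippet, the `price` id, and the `price_seen_by_parser` helper are made up for illustration and are not taken from Yahoo's actual markup.

```python
from bs4 import BeautifulSoup

# Hypothetical page: the server sends a placeholder, and a script fills in
# the real price only when a browser actually executes it.
STATIC_HTML = """
<html><body>
  <div id="quote"><span id="price">loading...</span></div>
  <script>
    document.getElementById("price").textContent = "0.6034";
  </script>
</body></html>
"""


def price_seen_by_parser(html: str) -> str:
    """Return the price text as a plain HTML parser sees it (no JavaScript is run)."""
    soup = BeautifulSoup(html, "html.parser")
    return soup.find("span", {"id": "price"}).text


if __name__ == "__main__":
    # The script is never executed, so only the placeholder comes back.
    print(price_seen_by_parser(STATIC_HTML))  # prints "loading..."
```

BeautifulSoup treats the `<script>` body as inert text, so the value it would have written never shows up. That is why a real browser engine is needed here.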

With Selenium, it can be done like this:

    import time

    from bs4 import BeautifulSoup as soup
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service

    # Configure Chrome and hide the browser window (headless mode)
    chrome_options = webdriver.ChromeOptions()
    chrome_options.add_argument("--headless")

    # Start Chrome once, with the options above (Selenium 4 syntax);
    # the path points to the local chromedriver executable
    service = Service(executable_path='C:/Users/shary/Downloads/chromedriver.exe')
    driver = webdriver.Chrome(service=service, options=chrome_options)

    # URL we need to open
    url = 'https://nz.finance.yahoo.com/quote/NZDUSD=X?p=NZDUSD=X'

    # Load the page just as a regular browser would, JavaScript included
    driver.get(url)

    while True:
        # Re-read the rendered HTML on every pass so the changing price is picked up
        html = driver.page_source
        page_soup = soup(html, features="lxml")
        price = page_soup.find("div", {"class": "D(ib) Mend(20px)"}).text
        print(price)
        time.sleep(5)

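Sorry that the comments aren't the best, but I hope you can follow them :) Otherwise, watch a YouTube tutorial to get a proper understanding of how a Selenium bot works.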
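If you only want output when the price actually moves, one possible variation is to wrap the polling in a small generator. This is just a sketch: `poll_changes`, `read_price`, and `max_polls` are hypothetical names, and in a real script `read_price` would be a function that parses `driver.page_source` the way the loop above does.

```python
import time
from typing import Callable, Iterator, Optional


def poll_changes(read_price: Callable[[], str],
                 interval: float = 5.0,
                 max_polls: Optional[int] = None) -> Iterator[str]:
    """Call read_price repeatedly and yield a value only when it differs from the last one."""
    last = None
    polls = 0
    while max_polls is None or polls < max_polls:
        current = read_price()
        if current != last:
            yield current
            last = current
        polls += 1
        # Skip the final sleep once the poll limit has been reached.
        if max_polls is None or polls < max_polls:
            time.sleep(interval)


if __name__ == "__main__":
    # Stand-in for the browser: three canned readings, the second is a repeat.
    fake_readings = iter(["0.6034", "0.6034", "0.6036"])
    for price in poll_changes(lambda: next(fake_readings), interval=0, max_polls=3):
        print(price)
```

Passing the reader in as a function keeps the loop independent of Selenium, so it can be tried out with canned values before pointing it at the live page.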


Hope this helps. It works perfectly for me :) If it helped you, please accept this answer.