Scraping live values with BeautifulSoup

Time: 2020-07-06 11:45:59

Tags: python beautifulsoup

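I'm trying to scrape currency exchange rates for a personal project, using a CSS selector to find the elements that hold the values. The site fills these values in with JavaScript. I evidently don't know the developer console well: I checked the Network tab and could not see anything updating in real time. Below is the code I have written; so far it prints nothing but dashes. Surprisingly, the dashes match the page source in exactly the places where the rates should be displayed.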

from bs4 import BeautifulSoup
import requests

r = requests.get("https://www.ig.com/en/forex/markets-forex")
soup = BeautifulSoup(r.content, "html.parser")
# find_all is the current name for findAll; both return the same result
results = soup.find_all("span", attrs={"data-field": "CPT"})
for span in results:
    print(span.text)

1 answer:

Answer 0 (score: 0)

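The span elements are populated dynamically by JavaScript. Initially, every span contains "-", which is why requests and BeautifulSoup only ever see dashes. You need a JavaScript-capable driver that waits for the elements to be populated before you read the values from the spans.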
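To illustrate the point above, here is a minimal sketch that parses a static snippet and finds only the placeholder. The `STATIC_HTML` sample and the `FieldCollector` class are hypothetical, made up for illustration; the real page's markup may differ. It uses the standard library's `html.parser` (the same parser BeautifulSoup wraps when given "html.parser") so that it runs without extra installs.

```python
from html.parser import HTMLParser

# Hypothetical sample of what a server might send before any JavaScript runs.
STATIC_HTML = """
<table>
  <tr><td>EUR/USD</td><td><span data-field="CPT">-</span></td></tr>
  <tr><td>GBP/USD</td><td><span data-field="CPT">-</span></td></tr>
</table>
"""


class FieldCollector(HTMLParser):
    """Collect the text of every <span data-field="CPT"> element."""

    def __init__(self):
        super().__init__()
        self.values = []
        self._inside = False

    def handle_starttag(self, tag, attrs):
        # Start a new value when a matching span opens.
        if tag == "span" and dict(attrs).get("data-field") == "CPT":
            self._inside = True
            self.values.append("")

    def handle_data(self, data):
        # Accumulate text only while inside a matching span.
        if self._inside:
            self.values[-1] += data

    def handle_endtag(self, tag):
        if tag == "span":
            self._inside = False


collector = FieldCollector()
collector.feed(STATIC_HTML)
print(collector.values)  # ['-', '-']
```

No parser can recover the rates from markup like this, because they are not in the downloaded document at all; they only appear after a browser executes the page's scripts.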

With Selenium:

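from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By

# Selenium 4 takes the driver path via Service; Selenium 3 accepted
# webdriver.Chrome('./chromedriver') directly.
driver = webdriver.Chrome(service=Service(executable_path='./chromedriver'))
driver.get('https://www.ig.com/en/forex/markets-forex')

# Print each matching element together with its rendered text.
for elm in driver.find_elements(By.CSS_SELECTOR, "span[data-field=CPT]"):
    print(elm, elm.text)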

Download chromedriver from https://sites.google.com/a/chromium.org/chromedriver/home

Alternatively, use dryscrape + bs4, although dryscrape appears to be outdated. Example here

Edited:

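import time
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By

# Selenium 4 style; Selenium 3 accepted webdriver.Chrome('./chromedriver').
driver = webdriver.Chrome(service=Service(executable_path='./chromedriver'))
driver.get('https://www.ig.com/en/forex/markets-forex')

time.sleep(2)  # Adjust up or down depending on how fast the page loads

for elm in driver.find_elements(By.CSS_SELECTOR, "span[data-field=CPT]"):
    if elm.text:  # skip spans that are still empty
        print(elm, elm.text)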

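# Alternative: poll until at least one span holds a real value.
import time
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By

# Selenium 4 style; Selenium 3 accepted webdriver.Chrome('./chromedriver').
driver = webdriver.Chrome(service=Service(executable_path='./chromedriver'))
driver.get('https://www.ig.com/en/forex/markets-forex')

data = []
while not data:
    for elm in driver.find_elements(By.CSS_SELECTOR, "span[data-field=CPT]"):
        if elm.text and elm.text != '-':  # a stricter check could require a digit
            data.append(elm.text)
    time.sleep(1)
print(data)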
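The polling loop above never gives up if the page stays unpopulated. One possible variation, sketched below, adds a deadline and the digit check suggested in the comment. The names `looks_like_rate`, `poll_until_ready`, and `fake_read_texts` are hypothetical helpers, not part of Selenium, and the rate values are invented; a fake reader stands in for the browser so the sketch runs on its own.

```python
import time


def looks_like_rate(text):
    """Return True if the text contains at least one digit ('-' and '' fail)."""
    return any(ch.isdigit() for ch in text)


def poll_until_ready(read_texts, timeout=10.0, interval=0.5):
    """Call read_texts() until it yields at least one populated value.

    Raises TimeoutError if nothing is populated within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        values = [t for t in read_texts() if looks_like_rate(t)]
        if values:
            return values
        if time.monotonic() >= deadline:
            raise TimeoutError("no populated values before the deadline")
        time.sleep(interval)


# Stand-in for the browser: placeholders on the first two reads, rates after.
# The rate values are made up for illustration.
_reads = iter([["-", "-"], ["-", ""], ["1.1284", "-", "0.9021"]])


def fake_read_texts():
    return next(_reads)


result = poll_until_ready(fake_read_texts, timeout=5.0, interval=0.01)
print(result)  # ['1.1284', '0.9021']
```

With a real driver, `read_texts` could be something like `lambda: [e.text for e in driver.find_elements(By.CSS_SELECTOR, "span[data-field=CPT]")]`. Selenium also ships its own explicit-wait helper, `WebDriverWait`, which may be a better fit than a hand-rolled loop.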