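I'm trying to scrape currency exchange rates for a personal project, using CSS selectors to target the class that holds the values. The site uses JavaScript to supply these values. I don't know the developer console very well, but when I checked the Network tab I couldn't see anything updating in real time. Below is the code I've written; so far it only outputs a lot of dashes. Interestingly, the dashes match the page source in the places where the rates should be displayed.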
from bs4 import BeautifulSoup
import requests
r = requests.get("https://www.ig.com/en/forex/markets-forex")
soup = BeautifulSoup(r.content, "html.parser")
results = soup.find_all("span", attrs={"data-field": "CPT"})
for span in results:
    print(span.text)
Answer:
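The span elements are filled in by JavaScript with dynamic values; initially, each span contains "-". You need a JavaScript-capable driver (a real browser) that waits for the elements to be populated before you read the values from the spans.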
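To see why the BeautifulSoup version can only print dashes, here is a minimal sketch that uses the standard library's html.parser on a hypothetical snippet of server-rendered markup. The table layout and currency pairs are invented for illustration; only the data-field="CPT" attribute comes from the question. Because no JavaScript runs when static HTML is parsed, the placeholders are all there is to find. The sketch assumes a matching span contains no nested spans.

```python
from html.parser import HTMLParser


class CPTSpanCollector(HTMLParser):
    """Collect the text of every <span data-field="CPT"> in a static HTML string.

    Simplification: assumes matching spans contain no nested <span> tags.
    """

    def __init__(self):
        super().__init__()
        self._inside = False  # True while we are inside a matching span
        self.values = []

    def handle_starttag(self, tag, attrs):
        if tag == "span" and dict(attrs).get("data-field") == "CPT":
            self._inside = True
            self.values.append("")

    def handle_endtag(self, tag):
        if tag == "span" and self._inside:
            self._inside = False

    def handle_data(self, data):
        if self._inside:
            self.values[-1] += data.strip()


# Hypothetical markup standing in for what requests.get() returns,
# i.e. the page before any JavaScript has filled in the rates.
STATIC_HTML = """
<tr><td>EUR/USD</td><td><span data-field="CPT">-</span></td></tr>
<tr><td>GBP/USD</td><td><span data-field="CPT">-</span></td></tr>
"""

parser = CPTSpanCollector()
parser.feed(STATIC_HTML)
print(parser.values)  # ['-', '-']
```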
With Selenium:
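from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
# Selenium 4 takes the chromedriver path through a Service object
driver = webdriver.Chrome(service=Service('./chromedriver'))
driver.get('https://www.ig.com/en/forex/markets-forex')
for elm in driver.find_elements(By.CSS_SELECTOR, "span[data-field=CPT]"):
    print(elm, elm.text)
driver.quit()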
Download chromedriver from https://sites.google.com/a/chromium.org/chromedriver/home
Alternatively, you could use dryscrape + bs4, but dryscrape appears to be outdated. Example here
Edit:
import time
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
driver = webdriver.Chrome(service=Service('./chromedriver'))
driver.get('https://www.ig.com/en/forex/markets-forex')
time.sleep(2)  # may need more or less, depending on how fast the page loads
for elm in driver.find_elements(By.CSS_SELECTOR, "span[data-field=CPT]"):
    if elm.text:  # skip elements whose visible text is empty
        print(elm, elm.text)
driver.quit()
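Instead of a fixed time.sleep(), Selenium's WebDriverWait can poll until the values appear. WebDriverWait.until() accepts any callable that takes the driver and returns a truthy value once the wait is over, so a custom condition can be a small class. This is a sketch under assumptions: the placeholder is "-", a populated rate contains at least one digit, and the class name rates_populated and its min_count parameter are hypothetical. The string "css selector" is the value of Selenium's By.CSS_SELECTOR, used here so the class itself needs no Selenium import.

```python
import re


class rates_populated:
    """Hypothetical custom wait condition for Selenium's WebDriverWait.

    WebDriverWait.until() calls this object with the driver repeatedly until
    it returns a truthy value or the timeout expires.
    """

    def __init__(self, css_selector, min_count=1):
        self.css_selector = css_selector
        self.min_count = min_count  # how many filled spans count as "loaded"

    def __call__(self, driver):
        # "css selector" is the string value of selenium's By.CSS_SELECTOR
        elements = driver.find_elements("css selector", self.css_selector)
        # Assumption: a span is populated once its text contains a digit,
        # which rules out the "-" placeholder and empty text.
        filled = [el.text for el in elements if re.search(r"\d", el.text)]
        # Returning the texts (truthy) ends the wait; False keeps polling.
        return filled if len(filled) >= self.min_count else False
```

Usage, with driver set up as above: rates = WebDriverWait(driver, 10).until(rates_populated("span[data-field=CPT]")), where WebDriverWait is imported from selenium.webdriver.support.ui. It raises TimeoutException if nothing is populated within 10 seconds. If the page re-renders rows while they are being read, passing ignored_exceptions=[StaleElementReferenceException] (from selenium.common.exceptions) to WebDriverWait may also be needed.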
Or:
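import time
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
driver = webdriver.Chrome(service=Service('./chromedriver'))
driver.get('https://www.ig.com/en/forex/markets-forex')
data = []
deadline = time.time() + 30  # give up after 30 seconds instead of looping forever
# Poll once a second until at least one span holds a real value
while not data and time.time() < deadline:
    for elm in driver.find_elements(By.CSS_SELECTOR, "span[data-field=CPT]"):
        if elm.text and elm.text != '-':  # could also check that it contains a digit
            data.append(elm.text)
    time.sleep(1)
print(data)
driver.quit()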
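The comment in the loop suggests checking for a digit rather than comparing against "-". A minimal sketch of that idea, which also converts the collected strings to floats; parse_rates is a hypothetical helper, and stripping commas assumes some quotes may use a comma as a thousands separator.

```python
import re

# Matches a decimal number such as "1.08234" or "145.12"
_NUMBER = re.compile(r"\d+(?:\.\d+)?")


def parse_rates(texts):
    """Convert scraped span texts to floats, skipping "-" placeholders and blanks."""
    rates = []
    for text in texts:
        # Assumption: commas are only ever thousands separators
        match = _NUMBER.search(text.replace(",", ""))
        if match:  # keep only strings that actually contain a digit
            rates.append(float(match.group()))
    return rates


print(parse_rates(["-", "1.08234", "", "145.12", "1,234.5"]))
# → [1.08234, 145.12, 1234.5]
```

It could be applied to the data list collected by the loop above, for example print(parse_rates(data)).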