Question

我正在尝试从该网站提取两个值：

bizportal.co.il

一个值是右边的美元汇率，左边是下降/上升的百分比。

问题是，在我获得美元汇率值后，出于某种原因四舍五入了数字。（您可以在终端中看到）。我想获得网站上显示的确切数字。

是否有一些友好的Python网页抓取文档？

P.S：在VS中运行代码时，如何摆脱弹出的Python终端窗口？我只希望输出将在VS中-在交互式窗口中。

my_url = "https://www.bizportal.co.il/forex/quote/generalview/22212222" 
                 
uClient = urlopen(my_url) 
                                                        
page_html = uClient.read()  

uClient.close()                                                                      

page_soup = BeautifulSoup(page_html, "html.parser")                                 

div_class = page_soup.findAll("div",{"class":"data-row"})                      

print (div_class)
#print(div_class[0].text)
#print(div_class[1].text)

Answer 1

数据是通过Ajax动态加载的，但是您可以使用requests模块来模拟此请求：

import json
import requests

url = 'https://www.bizportal.co.il/forex/quote/generalview/22212222'
ajax_url = "https://www.bizportal.co.il/forex/quote/AjaxRequests/DailyDeals_Ajax?paperId={paperId}&take=20&skip=0&page=1&pageSize=20"
paper_id = url.rsplit('/')[-1]
headers = {'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:76.0) Gecko/20100101 Firefox/76.0'}

data = requests.get(ajax_url.format(paperId=paper_id), headers=headers).json()

# uncomment this to print all data:
#print(json.dumps(data, indent=4))

# print first one
print(data['Data'][0]['rate'], data['Data'][0]['PrecentageRateChange'])

打印：

3.4823 -0.76%

Answer 2

问题在于该元素正在使用Javascript动态更新。您将无法使用urllib或请求抓取“最新”值。加载页面后，它会填充一个最近值（可能是从数据库中填充），然后通过Javascript将其替换为实时数字。

在这种情况下，最好使用Selenium之类的东西来加载网页-这允许javascript在页面上执行，然后抓取数字。

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
import time

options = Options()
options.add_argument("--headless") # allows you to scrape page without opening the browser window
driver = webdriver.Chrome('./chromedriver', options=options)

driver.get("https://www.bizportal.co.il/forex/quote/generalview/22212222")
time.sleep(1) # put in to allow JS time to load, sometimes works without.
values = driver.find_elements_by_class_name('num')
price = values[0].get_attribute("innerHTML")
change = values[1].find_element_by_css_selector("span").get_attribute("innerHTML")

print(price, "\n", change)

输出：

╰─$ python selenium_scrape.py
3.483 
 -0.74%

您应该熟悉Selenium，了解如何设置和运行它-包括安装浏览器（在这种情况下，我使用的是Chrome，但您可以使用其他浏览器），了解从何处获取浏览器驱动程序（（在这种情况下为Chromedriver），并了解如何解析页面。您可以在这里https://www.selenium.dev/documentation/en/

了解所有信息

用Python抓取网络-从网站中提取价值

2 个答案: