Can a Python-based web scraper get the results of JavaScript functions?

Time: 2019-12-14 08:47:58

Tags: javascript python jquery html web-scraping

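I'm trying to build a Python-based web scraper to get the gold price from https://www.jmbullion.com/charts/gold-price/. However, when I run my code, it returns the span I'm looking for, but the span is empty. I inspected the site, and it seems some JavaScript runs to fill in the value:

jQuery("#oz_display").html("$ " + gold_oz.toString().replace(/(\d)(?=(\d\d\d)+(?!\d))/g, "$1,"))

How can I get this data?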

import re
from bs4 import BeautifulSoup
from urllib.request import urlopen

my_url = "https://www.jmbullion.com/charts/gold-price/"

gold_url = urlopen(my_url)
page_html = gold_url.read()
gold_url.close()

page_soup = BeautifulSoup(page_html, "html.parser")

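# The matching <td> cells are found, but the price <span> inside them is empty:
# its text is only inserted later, by jQuery running in the browser
containers = page_soup.findAll("td", {"class": "td_2"})
print(containers)
input("end?")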

2 Answers:

Answer 0 (score: 0)

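To answer your question: yes, there are many ways to get JavaScript-generated content with Python. I believe most people nowadays use Selenium (https://selenium.dev/), which drives a real browser that runs the page's JavaScript.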

But in your particular case, if you look at the JavaScript code a little more closely, you'll see that it takes the value from the div with the id gounce:

var gold_oz=jQuery("#gounce").html()

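So you only need to read the value from there, and that div is already present in the static HTML. At the time of writing, it was: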

<div id="gounce">1478.12</div>
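A minimal sketch of this approach, assuming the live page still ships `<div id="gounce">` in its static HTML. It uses the standard library's `html.parser` so it runs without third-party packages (with BeautifulSoup the equivalent lookup is `page_soup.find("div", {"id": "gounce"}).text`). `TextById`, `text_by_id`, and `SAMPLE_HTML` are hypothetical names, and the sample markup is an offline stand-in for the real page, not a copy of it:

```python
from html.parser import HTMLParser

# Tags that never have a closing tag, so they must not change the nesting depth
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class TextById(HTMLParser):
    """Collect the text inside the first element whose id equals target_id."""

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.depth = 0      # > 0 while the parser is inside the target element
        self.found = False
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.depth:
            self.depth += 1
        elif not self.found and dict(attrs).get("id") == self.target_id:
            self.found = True
            self.depth = 1

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.parts.append(data)


def text_by_id(html, element_id):
    """Return the stripped text of the element with this id, or None if it is absent."""
    parser = TextById(element_id)
    parser.feed(html)
    parser.close()
    return "".join(parser.parts).strip() if parser.found else None


# Hypothetical markup mimicking what urlopen/requests receive: the display span
# is still empty (jQuery has not run), but the raw price already sits in #gounce.
SAMPLE_HTML = """
<td class="td_2"><span id="oz_display"></span></td>
<div id="gounce">1478.12</div>
"""

print(repr(text_by_id(SAMPLE_HTML, "oz_display")))
print(text_by_id(SAMPLE_HTML, "gounce"))
```

This is the same situation the question describes: the span the scraper finds is empty, while the number it needs is available in a neighbouring element without running any JavaScript.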

Answer 1 (score: 0)

These values are computed and written into the page body by jQuery, so you have two options:

  1. Use Selenium to render the JavaScript for you, then retrieve the data you need from the DOM
  2. Follow the jQuery code and apply the same logic in Python

Method 1:

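from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
try:
    driver.get("https://www.jmbullion.com/charts/gold-price/")
    # The browser runs the page's jQuery, so these spans now hold the formatted prices.
    # find_elements(By.ID, ...) replaces the find_elements_by_id helper removed in newer
    # Selenium 4 releases; it returns an empty list instead of raising if the id is missing.
    gold_value = driver.find_elements(By.ID, 'oz_display')
    if gold_value:
        print('Gold Price Per Ounce ==>', gold_value[0].text)
    gold_per_gram = driver.find_elements(By.ID, 'gr_display')
    if gold_per_gram:
        print('Gold Price Per Gram ==>', gold_per_gram[0].text)
    gold_per_kilo = driver.find_elements(By.ID, 'kl_display')
    if gold_per_kilo:
        print('Gold Price Per Kilo ==>', gold_per_kilo[0].text)
except Exception as e:
    print(e)
finally:
    # quit() ends the whole browser session; close() only closes the current window
    driver.quit()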

Output:

Gold Price Per Ounce ==> $ 1,478.12
Gold Price Per Gram ==> $ 47.52
Gold Price Per Kilo ==> $ 47,522.66

Method 2:

import math
import re

import requests
from bs4 import BeautifulSoup

url = "https://www.jmbullion.com/charts/gold-price/"

res = requests.get(url)
res.raise_for_status()

page_soup = BeautifulSoup(res.text, "html.parser")
# var gold_oz = jQuery("#gounce").html();
# This line reads the raw price from the div with id gounce
gold_ask_value = page_soup.find("div", {"id": "gounce"}).text.strip()


def js_round(x):
    # JavaScript's Math.round rounds halves up; Python's round() rounds halves to even
    return math.floor(x + 0.5)


# Gold Price Per Ounce
# jQuery("#oz_display").html("$ " + gold_oz.toString().replace(/(\d)(?=(\d\d\d)+(?!\d))/g, "$1,"));
# This acts as a thousands formatter. The regex matches every digit that is followed by
# one or more groups of three digits and then a non-digit; "$1," writes that digit
# (capture group 1) back followed by a comma, e.g. 5324 -> 5,324.
# Python spells the group reference \1 instead of $1.
formatted_gold_value = "$ " + re.sub(r"(\d)(?=(\d\d\d)+(?!\d))", r"\1,", gold_ask_value)

# Gold Price Per Gram
# var gold_oz2 = gold_oz.replace(/,/g, "");
# This removes any commas so the string can be treated as a number
# var gold_gr = Math.round((gold_oz2 / 31.1034768) * 100) / 100;
# This divides by 31.1034768 (grams per troy ounce), multiplies by 100, rounds to the
# nearest integer and divides by 100, i.e. rounds to two decimal places
gold_oz2 = float(gold_ask_value.replace(",", ""))
gold_per_gram = js_round((gold_oz2 / 31.1034768) * 100) / 100
formatted_gold_per_gram = f"$ {gold_per_gram}"  # to match the website's display

# Gold Price Per Kilo
# var gold_kl = Math.round((gold_oz2 / 0.0311034768) * 100) / 100;
# Same as the per-gram price except for the divisor (kilograms per troy ounce)
# var gold_kl2 = gold_kl.toFixed(2).replace(/\d(?=(\d{3})+\.)/g, '$&,');
# Another thousands formatter: '$&' is the whole match, which Python spells \g<0>
gold_per_kilo = js_round((gold_oz2 / 0.0311034768) * 100) / 100
formatted_gold_per_kilo = "$ " + re.sub(r"\d(?=(\d{3})+\.)", r"\g<0>,", f"{gold_per_kilo:.2f}")

print('Gold Price Per Ounce ==>', formatted_gold_value)
print('Gold Price Per Gram ==>', formatted_gold_per_gram)
print('Gold Price Per Kilo ==>', formatted_gold_per_kilo)
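To check the ported formatting logic without hitting the network, it can be factored into small pure functions. This is a sketch, not code from the site: `js_round`, `add_commas_oz`, `add_commas_fixed`, and `gold_prices` are hypothetical helper names, and the regexes and conversion constants are taken from the page's JavaScript quoted in the comments of Method 2. Python's `f"{x:.2f}"` stands in for JavaScript's `toFixed(2)`, which agrees here because the value is already rounded to two decimals.

```python
import math
import re


def js_round(x):
    """Mimic JavaScript Math.round for finite numbers: halves round up (toward +inf)."""
    return math.floor(x + 0.5)


def add_commas_oz(raw):
    """Port of raw.replace(/(\\d)(?=(\\d\\d\\d)+(?!\\d))/g, "$1,")."""
    return re.sub(r"(\d)(?=(\d\d\d)+(?!\d))", r"\1,", raw)


def add_commas_fixed(value):
    """Port of value.toFixed(2).replace(/\\d(?=(\\d{3})+\\.)/g, '$&,')."""
    return re.sub(r"\d(?=(\d{3})+\.)", r"\g<0>,", f"{value:.2f}")


def gold_prices(raw_ounce):
    """Return (per_ounce, per_gram, per_kilo) display strings from the raw #gounce text."""
    ounces = float(raw_ounce.replace(",", ""))                    # gold_oz2
    per_gram = js_round((ounces / 31.1034768) * 100) / 100        # gold_gr
    per_kilo = js_round((ounces / 0.0311034768) * 100) / 100      # gold_kl
    return (
        "$ " + add_commas_oz(raw_ounce),
        f"$ {per_gram}",
        "$ " + add_commas_fixed(per_kilo),
    )


print(gold_prices("1478.12"))
```

With the raw value 1478.12 quoted in the first answer, this reproduces the three prices in the sample output. Keeping the two regexes separate mirrors the page: the per-ounce formatter works on the raw string, while the per-kilo one expects a fixed two-decimal string and anchors on the decimal point.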