How to scrape data from a JavaScript chart?

Date: 2019-02-12 07:13:26

Tags: python

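I want to scrape data from a chart. I got as far as the HTML source, which shows the numbers I want to scrape, but I can't get any further from there. What I want are the numbers in data:[....]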

import urllib.request
from bs4 import BeautifulSoup as bs
from selenium import webdriver
from html.parser import HTMLParser

urlpage = 'https://peak.energy.mn/chart.php'
browser = webdriver.Firefox()
browser.get(urlpage)
# Rendered page body as a string; the chart data sits in an inline <script>
innerHTML = browser.execute_script('return document.body.innerHTML')

<canvas height="399" id="myChart" style="display: block; width: 798px; height: 399px;" width="798"></canvas>
<script src="js/chart.min.js"></script>
<script type="text/javascript">


   var ctx = document.getElementById('myChart').getContext('2d');
   var chart = new Chart(ctx, {

    // The type of chart we want to create
    type: 'line',

    // The data for our dataset
    data: {
        labels: [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24],

        datasets: [
         {
            label: "Горим төлөвлөлт",

            fill: false,
            backgroundColor: 'rgb(255, 87, 51)',
            borderColor: 'rgb(255, 87, 51)',

           // pointHitRadius: 50,
            data:["818","789","764","756","755","758","771","813","864","927","962","967","957","947","929","926","929","985","1054","1037","1010","971","926","885"],

        },

        {
            label: "Гүйцэтэл",

            fill: true,
            backgroundColor: 'rgb(25,204,199)',
            borderColor: 'rgb(25,204,199)',
            pointHitRadius: 50,
            data:["789.75","760.88","751.72","744.43","740.64","744.84","754.91","798.03","829.95","866.09","886.45","886.69","870.99","858.99"],

        }

]
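Because the source above assigns the chart to var chart, the Selenium session from the question's code could also read the values straight from the Chart.js object instead of parsing HTML. A minimal sketch, assuming that chart is declared at the top level of the inline script (so it is a global the injected JavaScript can see) and that its datasets are reachable as chart.data.datasets; READ_CHART_JS and read_chart_data are hypothetical names:

```python
# Hypothetical helper: reads the datasets from the page's Chart.js instance.
# Assumption: the inline script declares `var chart = new Chart(...)` at top level,
# so `chart` is a global variable visible to execute_script.
READ_CHART_JS = """
return chart.data.datasets.map(function (ds) {
    return {label: ds.label, data: ds.data};
});
"""


def read_chart_data(driver):
    """Return {label: [float, ...]} for every dataset in the live chart.

    `driver` is any object with a Selenium-style execute_script method,
    e.g. the webdriver.Firefox() instance from the question.
    """
    # Selenium converts the returned JS array of objects into a list of dicts.
    datasets = driver.execute_script(READ_CHART_JS)
    # The page supplies the values as strings ("818", "789.75"); float() also
    # accepts plain numbers, in case the chart holds them that way.
    return {ds["label"]: [float(v) for v in ds["data"]] for ds in datasets}
```

With the browser from the question, calling read_chart_data(browser) after browser.get(urlpage) should return a dict keyed by the two dataset labels, with no HTML parsing involved.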

1 answer:

Answer 0 (score: 0)

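You don't need Selenium to get the data out of the script: the chart's data is written into an inline script in the page's HTML, so a plain HTTP request already returns it. All you need is requests, bs4 and a regular expression that picks out every data:[...] array.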

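#!/usr/bin/env python3
# coding: utf8
import re

import requests
from bs4 import BeautifulSoup as BfS

if __name__ == "__main__":
    url = 'https://peak.energy.mn/chart.php'
    page = requests.get(url)
    page.raise_for_status()
    html = BfS(page.text, "html.parser")
    # Capture everything between "data:" and the next "]" on the same line,
    # i.e. the body of each dataset's data array.
    dataregex = re.findall(r'data:(.*?)]', str(html))
    result = []
    for dr in dataregex:
        # Pull the double-quoted values out of the array body.
        r = re.findall(r'"(.*?)"', dr)
        result.append(r)
    print(result)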
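The pattern data:(.*?)] relies on each data array sitting on a single line. Without re.DOTALL, "." does not match a newline, so the data: { line that opens the chart's data object never reaches a "]" and is skipped. The sketch below checks this against a shortened excerpt of the page source from the question (values trimmed; SCRIPT and the other variable names are illustrative):

```python
import re

# Shortened excerpt of the chart script from the question.
SCRIPT = '''
    data: {
        labels: [1,2,3,4],
        datasets: [
         {
            label: "Горим төлөвлөлт",
            data:["818","789","764","756"],
        },
        {
            label: "Гүйцэтэл",
            data:["789.75","760.88","751.72"],
        }
'''

# Default flags: "." stops at a newline, so only the two one-line arrays match.
default_matches = re.findall(r'data:(.*?)]', SCRIPT)

# With re.DOTALL the same pattern runs from "data: {" across lines to the "]"
# that closes the labels list, adding an extra match with no quoted values.
dotall_matches = re.findall(r'data:(.*?)]', SCRIPT, re.DOTALL)

print(len(default_matches))  # 2
print(len(dotall_matches))   # 3
```

So if the site ever reformats its script across several lines, the answer's regex would need adjusting rather than simply gaining re.DOTALL.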

The result is a list of lists, one per dataset:

[
['818', '789', '764', '756', '755', '758', '771', '813', '864', '927', '962', '967', '957', '947', '929', '926', '929', '985', '1054', '1037', '1010', '971', '926', '885'],
['789.75', '760.88', '751.72', '744.43', '740.64', '744.84', '754.91', '798.03', '829.95', '866.09', '886.45', '886.69', '870.99', '858.99', '856.25', '856.71']
]
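These values are still strings, and each list is not tied to its dataset label. A minimal sketch that keeps every array with the label that precedes it and converts the values to floats, assuming each dataset in the page lists label: before data: as in the source shown in the question (SCRIPT is a shortened excerpt of that source; DATASET_RE and parse_datasets are hypothetical names):

```python
import json
import re

# Shortened excerpt of the chart script from the question.
SCRIPT = '''
            label: "Горим төлөвлөлт",
            fill: false,
            data:["818","789","764","756"],
        },
        {
            label: "Гүйцэтэл",
            fill: true,
            pointHitRadius: 50,
            data:["789.75","760.88","751.72"],
'''

# Capture a dataset's label and the first data array after it.
# re.DOTALL lets ".*?" skip the lines between the two keys.
DATASET_RE = re.compile(r'label:\s*"([^"]*)".*?data:\s*(\[[^\]]*\])', re.DOTALL)


def parse_datasets(script):
    """Return {label: [float, ...]} for every label/data pair in the script."""
    result = {}
    for label, array in DATASET_RE.findall(script):
        # The array uses double-quoted strings, so it is valid JSON.
        result[label] = [float(v) for v in json.loads(array)]
    return result


series = parse_datasets(SCRIPT)
print(series)
```

On the live page, parse_datasets(page.text) should give the same numbers as the answer's two findall passes, already converted and keyed by label.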