How to scrape content from a script tag using Selenium with Python

Time: 2019-01-30 21:05:18

Tags: python html selenium-webdriver

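I'm a beginner with Selenium and need help with my code. I'm trying to extract data from a script tag: the annual price that is shown when you hover over each drug. It would also be great to know how to switch the price chart from the UK to the other options, so that I can extract the data from those charts as well. This is the code I'm trying to extract content from: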

<!-- ========== Block abacus-spend-graph ==========
$(function() {
        ///detect page size to determine if we need to resize
        labelTop = ($(document).width() < 1265) ? 10 : 0;
    $('#spend_graph').css('height', '70px');        // #37828 - reduce to 70

        Highcharts.setOptions({ lang: { numericSymbols: [ 'k' , 'M' , 'B' , 'T' , 'P' , 'E'] } });

        $('#spend_graph').highcharts({
            chart: {
                type: 'bar',
                marginRight: 35,    // #37828
                marginBottom: 5     // #37828
            },
            title: {
                text: '',
                style: { display : 'none' }
            },
            credits: {
                enabled: false
            },  
            legend: {
                y:labelTop,
                enabled: false,
                maxHeight:90,
                itemDistance: 15,
                reversed: true,
                itemStyle:
                {
                    color: '#444',
                    fontWeight: 'normal',
                    padding: '4px'
                }
            },
            xAxis: {
                categories: 
                [
                    'Actual Annual Spending (US Market)', 'Abacus Annual Value (US Market)' // #37828, #58587
                ],
                tickLength: 20,
                labels: {
                    y: 4,
                    style: {
                        fontSize: '12px'
                    }
                }
            },
            yAxis: {
                min: 0,
                title: {
                    enabled: false,     // #37828
                    text: 'Annual Spending'
                },
                gridLineWidth: 0,       // #58587
                stackLabels: {
                    enabled: true,
                    style: {                        
                        color: (Highcharts.theme && Highcharts.theme.textColor) || 'gray'
                    },
                    formatter: function() { return FloatToCurrency(Math.round(this.total/1000000)*1000000, '$,', 'c0'); }, // Round to nearest million
                }
            },
            tooltip: {
                //headerFormat: '<span class="hc-tip-cap">{point.key}</span><table class="hc-tipt">',
                headerFormat: '<span class="hc-tip">',
                footerFormat: '</span>',
                formatter: function() { return '<b>' + this.series.name + '</b>: ' + FloatToCurrency(Math.round(this.y/1000000)*1000000, '$,', 'c0') + '<br />Annual Volume: ' + FloatToString(this.series.options.vol); }, // Round to nearest million
                useHTML: true
            },
            plotOptions: {
                series: {
                    stacking: 'normal',
                    cursor: 'pointer',
                    pointPadding: .07,
                    groupPadding: .07
                }
            },            
            series: 
            [
                { name: 'Velcade', drugNum: 44, stack: 'drugs', data: [{ y: 1262732812.0000, color: '#9b9ea5' }, { y: 242267658.36919075144508670521 }], color: '#db4d72', colorLower: '#db4d72', colorHigher: '#7dbea0', vol: 282238, selected: false },
{ name: 'Pomalyst', drugNum: 32, stack: 'drugs', data: [{ y: 743195055.0000, color: '#bcbec3' }, { y: 52466947.533000000000000 }], color: '#e788a1', colorLower: '#e788a1', colorHigher: '#a8d4c0', vol: 52467, selected: false },
{ name: 'Farydak', drugNum: 14, stack: 'drugs', data: [{ y: 8595625.0000, color: '#dddee1' }, { y: 588196.584924000000000000 }], color: '#f3c3d0', colorLower: '#f3c3d0', colorHigher: '#d3e9df', vol: 809, selected: false }
            ],
            exporting: {
                enabled: false
            }
        });
});



// ============ End abacus-spend-graph ============ -->

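I only want to extract the y values for the drugs near the bottom of the page. However, I'm not sure how to locate that part of the script tag.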
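Once the script text is available as a string, the y values can be pulled out with regular expressions. The following is a minimal sketch, assuming Python 3 and assuming every entry keeps the `name: '...'` then `data: [...]` layout of the series block quoted above. `SCRIPT_TEXT` is a shortened copy of two of those entries, and `extract_y_values` is a hypothetical helper name.

```python
import re

# Shortened copy of two entries from the series block quoted in the question.
SCRIPT_TEXT = """
series:
[
    { name: 'Velcade', drugNum: 44, stack: 'drugs', data: [{ y: 1262732812.0000, color: '#9b9ea5' }, { y: 242267658.36919075144508670521 }], color: '#db4d72', vol: 282238, selected: false },
{ name: 'Farydak', drugNum: 14, stack: 'drugs', data: [{ y: 8595625.0000, color: '#dddee1' }, { y: 588196.584924000000000000 }], color: '#f3c3d0', vol: 809, selected: false }
],
"""

# One series entry: the drug name, then everything inside its data: [...] list.
ENTRY_RE = re.compile(r"name:\s*'([^']+)'.*?data:\s*\[(.*?)\]", re.S)
# Each point inside the data list carries a numeric y value.
Y_RE = re.compile(r"\by:\s*([0-9.]+)")


def extract_y_values(script_text):
    """Map each drug name to its list of y values (hypothetical helper)."""
    result = {}
    for name, data in ENTRY_RE.findall(script_text):
        result[name] = [float(y) for y in Y_RE.findall(data)]
    return result


values = extract_y_values(SCRIPT_TEXT)
for drug, ys in values.items():
    print(drug, ys)
```

Going by the order of the `xAxis` categories in the quoted block, the first y value of each drug should correspond to "Actual Annual Spending (US Market)" and the second to "Abacus Annual Value (US Market)".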

Code I tried:

url = urllib.urlopen('http://abacus.realendpoints.com/ConsoleTemplate.aspx?act=qlrd&req=nav&mop=abacus!main&pk=ed5a81ad-9367-41c8-aa6b-18a08199ddcf&ab-eff=1000&ab-tox=0.1&ab-nov=1&ab-rare=1&ab-pop=1&ab-dev=1&ab-prog=1.0&ab-need=1&ab-time=1543102810')
soup = BeautifulSoup(url.read(), 'lxml')
data = soup.find_all("script").string

Output:

UserWarning: The soupsieve package is not installed. CSS selectors cannot be used.
  'The soupsieve package is not installed. CSS selectors cannot be used.'
Traceback (most recent call last):
  File "other.py", line 12, in <module>
    soup = BeautifulSoup(url.read(), 'lxml')
  File "__init__.py", line 196, in __init__
    % ",".join(features))
bs4.FeatureNotFound: Couldn't find a tree builder with the features you requested: lxml. Do you need to install a parser library?
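The traceback says that BeautifulSoup could not find the lxml parser, so either lxml needs to be installed or the built-in `'html.parser'` can be passed instead. Separately, `find_all` returns a list of tags, so `.string` has to be read from each element rather than from the list itself. As a dependency-free illustration of that step, here is a sketch that collects the text of every script tag with the standard library's `HTMLParser` instead of BeautifulSoup. It assumes Python 3; `ScriptCollector` and the miniature `html` string are hypothetical stand-ins for the real page.

```python
from html.parser import HTMLParser


class ScriptCollector(HTMLParser):
    """Collect the text content of every <script> element."""

    def __init__(self):
        super().__init__()
        self.scripts = []        # one string per <script> element
        self._in_script = False

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self._in_script = True
            self.scripts.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_data(self, data):
        # Script text may arrive in several chunks, so append to the last entry.
        if self._in_script:
            self.scripts[-1] += data


# Hypothetical miniature page standing in for the real response body.
html = (
    "<html><body><script>var a = 1;</script><p>text</p>"
    "<script>series: [{ y: 2 }]</script></body></html>"
)

collector = ScriptCollector()
collector.feed(html)
collector.close()

# Keep only the scripts that mention the chart series.
chart_scripts = [s for s in collector.scripts if "series" in s]
print(chart_scripts)
```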

However, I'd like to solve this with Selenium in Python rather than BeautifulSoup.
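One possible Selenium route is to skip parsing the script text and ask the browser for the chart data directly with `execute_script`. This is only a sketch under several assumptions: the page loads without a login, jQuery and the Highcharts jQuery plugin are present as in the quoted block, the chart lives on `#spend_graph`, and Chrome with a matching driver is available. `read_series` and `fetch_series` are hypothetical helper names.

```python
# JavaScript run inside the page: read the series from the live chart object.
# Assumes jQuery and a Highcharts chart on #spend_graph, as in the quoted block.
READ_SERIES_JS = """
var chart = $('#spend_graph').highcharts();
return chart.series.map(function (s) {
    return { name: s.name, y: s.data.map(function (p) { return p.y; }) };
});
"""

# Truthy once the chart object exists on the page.
CHART_READY_JS = (
    "return !!(window.$ && $('#spend_graph').length "
    "&& $('#spend_graph').highcharts && $('#spend_graph').highcharts());"
)


def read_series(driver):
    """Return {drug name: [y values]} from the chart (hypothetical helper)."""
    rows = driver.execute_script(READ_SERIES_JS)
    return {row["name"]: row["y"] for row in rows}


def fetch_series(url, timeout=15):
    """Open the page in Chrome, wait for the chart, then read its series."""
    # Imported here so read_series can be exercised without a browser.
    from selenium import webdriver
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome()
    try:
        driver.get(url)
        # Poll until the page script has created the chart.
        WebDriverWait(driver, timeout).until(
            lambda d: d.execute_script(CHART_READY_JS)
        )
        return read_series(driver)
    finally:
        driver.quit()
```

Calling `fetch_series(url)` with the page URL would return a dictionary keyed by drug name. If the chart object turns out not to be reachable this way, `driver.page_source` still gives the raw HTML, which can be searched as plain text.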

0 answers:

No answers yet