Downloading a PDF with the Chrome plugin in Python Selenium

Time: 2018-04-12 00:37:41

Tags: python selenium pdf

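I am trying to download a PDF from this site, which opens it in Chrome's built-in PDF viewer; the response has content type application/pdf. The problem is that the URL I get from the site does not point at the PDF itself but at a .zul page, where JavaScript loads or fetches the PDF.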

Here is my download code:

from selenium import webdriver

def download_pdf(url, idx, save_dir):

    options = webdriver.ChromeOptions()
    # Try to disable the built-in PDF viewer and set the download directory.
    profile = {"plugins.plugins_list": [{"enabled":False,"name":"Chrome PDF Viewer"}],
        "download.default_directory" : save_dir}
    options.add_experimental_option("prefs",profile)
    driver = webdriver.Chrome("/usr/lib/chromium-browser/chromedriver", chrome_options=options)
    # Load the .zul page that is supposed to fetch the PDF.
    driver.get(url)

The problem with the code above is that driver.page_source gives me the following:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
    <head>
        <meta http-equiv="Pragma" content="no-cache" />
        <meta http-equiv="Expires" content="-1" />
        <title>Document Viewer</title>
        <link rel="stylesheet" type="text/css" href="/eSMARTContracts/zkau/web/9776a7f0/zul/css/zk.wcs;jsessionid=088DC94ECA6804AF717A0E997E4F1444.node1"/>
        <script type="text/javascript" src="/eSMARTContracts/zkau/web/9776a7f0/js/zk.wpd;jsessionid=088DC94ECA6804AF717A0E997E4F1444.node1" charset="UTF-8">
        </script>
        <script type="text/javascript" src="/eSMARTContracts/zkau/web/9776a7f0/js/zul.lang.wpd;jsessionid=088DC94ECA6804AF717A0E997E4F1444.node1" charset="UTF-8">
        </script>
        <!-- ZK 6.0.2 EE 2012072410 -->
    </head>
    <body>
        <div id="j4AP_" class="z-temp"></div>
        <script class="z-runonce" type="text/javascript">zk.pi=1;zkmx(
        [0,'j4AP_',{dt:'z_2m1',cu:'/eSMARTContracts;jsessionid=088DC94ECA6804AF717A0E997E4F1444.node1',uu:'/eSMARTContracts/zkau;jsessionid=088DC94ECA6804AF717A0E997E4F1444.node1',ru:'/service/dpsweb/ViewDPSWeb.zul'},[
        ['zul.wnd.Window','j4AP0',{$$onSize:false,$$onMaximize:false,$$onOpen:false,$$onMinimize:false,$$onZIndex:false,$onClose:true,$$onMove:false,width:'100%',height:'100%',prolog:'\
        '},[]]]]);
        </script>
        <noscript>
        <div class="noscript"><p>Sorry, JavaScript must be enabled.<br/>Change your browser options, then <a href="">try again</a>.</p></div>
        </noscript>
    </body>
</html>
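
The dump above contains only the ZK bootstrap markup (the `z-temp` placeholder and the `zkmx(...)` call), so it was probably captured before the viewer had rendered. A minimal sketch of one way to look for the real document URL once the page has rendered is shown below. It assumes the viewer eventually inserts an `iframe`, `embed`, or `object` element pointing at the PDF, which nothing in the dump confirms; `EmbeddedSourceFinder` and `find_embedded_sources` are hypothetical helper names.

```python
from html.parser import HTMLParser


class EmbeddedSourceFinder(HTMLParser):
    """Collect URLs from elements that commonly host an embedded document."""

    # Attribute that carries the URL for each tag of interest.
    # Assumption: the PDF is embedded through one of these elements.
    URL_ATTRS = {"iframe": "src", "embed": "src", "object": "data"}

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        wanted = self.URL_ATTRS.get(tag)
        if wanted is None:
            return
        for name, value in attrs:
            if name == wanted and value:
                self.sources.append(value)


def find_embedded_sources(html):
    """Return the candidate document URLs found in an HTML string, in order."""
    finder = EmbeddedSourceFinder()
    finder.feed(html)
    finder.close()
    return finder.sources
```

The helper would be called as `find_embedded_sources(driver.page_source)` after giving the page time to render. Any URL it returns may be relative and may depend on the `jsessionid` of the browser session, so fetching it outside the browser is not guaranteed to work.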

Edit: included the link
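
For reference, a sketch of an alternative preference set. The `plugins.plugins_list` preference used in the question may no longer be honored by newer Chrome builds; `plugins.always_open_pdf_externally` is the Chrome preference that asks the browser to download PDFs instead of opening them in the viewer. Whether this works for the .zul page in question is untested. The helper names (`build_pdf_download_prefs`, `wait_for_pdf`, `make_driver`) are hypothetical, and `make_driver` assumes chromedriver can be found on the PATH.

```python
import os
import time


def build_pdf_download_prefs(save_dir):
    """Chrome prefs asking the browser to save PDFs rather than display them."""
    return {
        "download.default_directory": os.path.abspath(save_dir),
        "download.prompt_for_download": False,
        "plugins.always_open_pdf_externally": True,
    }


def wait_for_pdf(save_dir, timeout=30.0, poll=0.5):
    """Poll save_dir until a .pdf exists and no .crdownload file remains.

    Returns the path of the first PDF (sorted by name), or None on timeout.
    A PDF that was already in save_dir is returned immediately.
    """
    deadline = time.monotonic() + timeout
    while True:
        names = os.listdir(save_dir)
        in_progress = any(n.endswith(".crdownload") for n in names)
        pdfs = sorted(n for n in names if n.lower().endswith(".pdf"))
        if pdfs and not in_progress:
            return os.path.join(save_dir, pdfs[0])
        if time.monotonic() >= deadline:
            return None
        time.sleep(poll)


def make_driver(save_dir):
    """Start Chrome with the prefs above (requires Selenium and chromedriver)."""
    # Imported here so the two helpers above can be used without Selenium.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    options.add_experimental_option("prefs", build_pdf_download_prefs(save_dir))
    return webdriver.Chrome(options=options)
```

A typical call sequence would be `driver = make_driver(save_dir)`, then `driver.get(url)`, then `wait_for_pdf(save_dir)` before quitting the driver, so the browser is not closed while the download is still in progress.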

0 answers:

No answers