我试图从使用原生Google Chrome pdf查看器工具的this site中提取PDF,以便首先打开pdf,其内容类型为/application/pdf
。问题是,我获得的网站网址实际上并未链接到PDF,而是链接到.zul
网站,其中js将加载pdf或获取它。
以下是我的下载代码:
def download_pdf(url, idx, save_dir):
options = webdriver.ChromeOptions()
profile = {"plugins.plugins_list": [{"enabled":False,"name":"Chrome PDF Viewer"}],
"download.default_directory" : save_dir}
options.add_experimental_option("prefs",profile)
driver = webdriver.Chrome("/usr/lib/chromium-browser/chromedriver", chrome_options=options)
driver.get(url)
我遇到上述代码的问题是我从driver.source_page
获得以下读数:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Pragma" content="no-cache" />
<meta http-equiv="Expires" content="-1" />
<title>Document Viewer</title>
<link rel="stylesheet" type="text/css" href="/eSMARTContracts/zkau/web/9776a7f0/zul/css/zk.wcs;jsessionid=088DC94ECA6804AF717A0E997E4F1444.node1"/>
<script type="text/javascript" src="/eSMARTContracts/zkau/web/9776a7f0/js/zk.wpd;jsessionid=088DC94ECA6804AF717A0E997E4F1444.node1" charset="UTF-8">
</script>
<script type="text/javascript" src="/eSMARTContracts/zkau/web/9776a7f0/js/zul.lang.wpd;jsessionid=088DC94ECA6804AF717A0E997E4F1444.node1" charset="UTF-8">
</script>
<!-- ZK 6.0.2 EE 2012072410 -->
</head>
<body>
<div id="j4AP_" class="z-temp"></div>
<script class="z-runonce" type="text/javascript">zk.pi=1;zkmx(
[0,'j4AP_',{dt:'z_2m1',cu:'/eSMARTContracts;jsessionid=088DC94ECA6804AF717A0E997E4F1444.node1',uu:'/eSMARTContracts/zkau;jsessionid=088DC94ECA6804AF717A0E997E4F1444.node1',ru:'/service/dpsweb/ViewDPSWeb.zul'},[
['zul.wnd.Window','j4AP0',{$$onSize:false,$$onMaximize:false,$$onOpen:false,$$onMinimize:false,$$onZIndex:false,$onClose:true,$$onMove:false,width:'100%',height:'100%',prolog:'\
'},[]]]]);
</script>
<noscript>
<div class="noscript"><p>Sorry, JavaScript must be enabled.<br/>Change your browser options, then <a href="">try again</a>.</p></div>
</noscript>
</body>
</html>
编辑:包含链接